Note: this answers the original question and is also for anyone who lands here looking for related help, as I did. ;)

Answer:
You could try Poppler, or pdf2htmlEX, which is built on Poppler.

I'd recommend reading the pdf2htmlEX documentation; it also includes a very good comparison table.
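
As a rough sketch of how either tool might be driven from Node.js, the snippet below shells out to the command-line binaries. It assumes `pdf2htmlEX` and Poppler's `pdftohtml` are installed and on the PATH; the helper names (`pdf2htmlexArgs`, `pdftohtmlArgs`, `runConverter`) are made up for illustration and are not part of either project.

```javascript
const { execFile } = require("child_process");

// Arguments for pdf2htmlEX: options first, then the input PDF.
// --dest-dir sets the directory the generated HTML is written to.
function pdf2htmlexArgs(inputPdf, destDir) {
  return ["--dest-dir", destDir, inputPdf];
}

// Arguments for Poppler's pdftohtml.
// -noframes asks for a single HTML file instead of a frameset.
function pdftohtmlArgs(inputPdf, outputBase) {
  return ["-noframes", inputPdf, outputBase];
}

// Run a converter binary and resolve with its stdout on success.
function runConverter(binary, args) {
  return new Promise((resolve, reject) => {
    execFile(binary, args, (error, stdout, stderr) => {
      if (error) reject(new Error(stderr || error.message));
      else resolve(stdout);
    });
  });
}
```

Usage would look something like `runConverter("pdf2htmlEX", pdf2htmlexArgs("input.pdf", "out"))`. Passing the arguments as an array to `execFile` avoids shell quoting problems with file names that contain spaces.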

Answer from Asad Malik on Stack Overflow
🌐
npm
npmjs.com › package › pdf2html
pdf2html - npm
Convert PDF files to HTML, extract text, generate thumbnails, extract images, and extract metadata using Apache Tika and PDFBox
      » npm install pdf2html
    
Published   Jul 13, 2025
Version   4.4.0
Author   Shebin Leo Vincent
🌐
ConvertApi
convertapi.com › pdf-to-html › javascript
PDF to HTML Conversion JavaScript SDK – Convert PDF to HTML.
Easily convert PDFs to HTML using our PDF to HTML JS module. Enhance accessibility and user interaction by seamlessly transforming PDF content. Our conversion process preserves text, images, and formatting, ensuring accuracy.
🌐
Apryse
docs.apryse.com › core › guides › features › conversion › convert-pdf-to-html › js
Server/Desktop PDF to HTML Converter Library in Node.js (JavaScript) | Apryse documentation
... PDF Converter (SVG, XPS, TIFF, JPG, RTF, TXT, More) Full sample code which shows how to use PDFNet Convert for direct, high-quality conversion between PDF, XPS, EMF, SVG, TIFF, PNG, JPEG, and other image formats.
🌐
PDF.io
pdf.io › pdf2html
Convert PDF to HTML
Convert PDF to HTML without losing text or format · Choose File · Google Drive · Dropbox · URL · or · Drop files here · Loading file: 0% Cancel · Processing... Cancel · Done! Download · Google Drive · Dropbox · Start over · Settings · Embed CSS ·
🌐
GitHub
github.com › pdf2htmlEX › pdf2htmlEX
GitHub - pdf2htmlEX/pdf2htmlEX: Convert PDF to HTML without losing text or format.
Flexible output: all-in-one HTML or on demand page loading (needs JavaScript). Moderate file size, sometimes even smaller than PDF. Supporting links, outlines (bookmarks), printing, SVG background, Type 3 fonts and more... ... pdf2htmlEX, as a whole package, is licensed under GPLv3+. Some resource files are released with relaxed licenses, read LICENSE for more details. pdf2htmlEX is made possible thanks to the following projects:
Starred by 5.3K users
Forked by 497 users
Languages   HTML 83.6% | C++ 10.0% | Shell 2.4% | Python 1.2% | JavaScript 1.0% | C 0.6%
🌐
APITemplate.io
apitemplate.io › home › convert pdf to html instantly – fast, clean & easy
Convert PDF to HTML Instantly - Fast, Clean & Easy - APITemplate.io
December 8, 2024 - HTML to PDF generation: Convert HTML content directly into PDF format. URL to PDF generation: Convert entire web pages to PDFs by simply providing their URLs. The platform also supports custom CSS and JavaScript, allowing for advanced styling and customization. Its powerful template engine ensures that you can create complex documents efficiently, while the API Console enables you to preview and customize JSON data instantly.
🌐
IDRsolutions
idrsolutions.com › online-pdf-to-html-converter
PDF to HTML converter - online, free
5 days ago - Python client example:

from IDRSolutions import IDRCloudClient

client = IDRCloudClient('https://cloud.idrsolutions.com/cloud/' + IDRCloudClient.BUILDVU)
try:
    result = client.convert(
        input=IDRCloudClient.UPLOAD,
        file='path/to/file.pdf',
        token='YOUR_TRIAL_TOKEN'  # Token provided to you via e-mail
    )
    outputURL = result['downloadUrl']
    client.downloadResult(result, 'path/to/outputDir')
    if outputURL is not None:
        print("Download URL: " + outputURL)
except Exception as error:
    print(error)

// To add the client to your project use will need to add the file idrcloudclient.js to your project and include the following line to ac
🌐
GroupDocs Cloud
blog.groupdocs.cloud › groupdocs cloud blog › convert pdf to html using node.js | pdf to webpage via rest api
Convert PDF to HTML using Node.js | PDF to Webpage via REST API
April 18, 2025 - Learn how to convert PDF to HTML using GroupDocs.Conversion Cloud SDK for Node.js. Easily transform PDF files into web-ready HTML with REST API.
🌐
IDRsolutions
idrsolutions.com › online-pdf-to-html5-converter
Online PDF converter - PDF to HTML5
Top answer
1 of 16
325

jsPDF supports plugins. To have it print HTML, you need to include a few of them, as follows:

  1. Go to https://github.com/MrRio/jsPDF and download the latest version.
  2. Include the following scripts in your project:
  • jspdf.js
  • jspdf.plugin.from_html.js
  • jspdf.plugin.split_text_to_size.js
  • jspdf.plugin.standard_fonts_metrics.js

To ignore certain elements, give each one an ID and register that ID in a jsPDF special element handler. Your HTML should then look like this:

<!DOCTYPE html>
<html>
  <body>
    <p id="ignorePDF">don't print this to pdf</p>
    <div>
      <p><font size="3" color="red">print this to pdf</font></p>
    </div>
  </body>
</html>

Then use the following JavaScript code to open the generated PDF in a pop-up window:

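var doc = new jsPDF();

// Handlers keyed by "#id"; returning true skips the matched element.
var elementHandler = {
  '#ignorePDF': function (element, renderer) {
    return true;
  }
};

var source = window.document.getElementsByTagName("body")[0];

// fromHTML(source, x, y, settings): render the HTML at offset (15, 15).
doc.fromHTML(
  source,
  15,
  15,
  {
    'width': 180,
    'elementHandlers': elementHandler
  }
);

// Open the generated PDF in a new window.
doc.output("dataurlnewwindow");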

For me this created a nice and tidy PDF that only included the line 'print this to pdf'.

Please note that the special element handlers only deal with IDs in the current version, which is also stated in a GitHub Issue. It states:

Because the matching is done against every element in the node tree, my desire was to make it as fast as possible. In that case, it meant "Only element IDs are matched" The element IDs are still done in jQuery style "#id", but it does not mean that all jQuery selectors are supported.

Therefore, replacing '#ignorePDF' with a class selector like '.ignorePDF' did not work for me. Instead, you have to add the same handler for every element you want to ignore, like this:

var elementHandler = {
  '#ignoreElement': function (element, renderer) {
    return true;
  },
  '#anotherIdToBeIgnored': function (element, renderer) {
    return true;
  }
};
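
Writing one handler per ID by hand gets repetitive. A small helper could build the object from a list of IDs instead. This is only a sketch: `buildIgnoreHandlers` is a hypothetical name, not part of jsPDF, and it relies on the behavior described above (a handler that returns true causes the element to be skipped).

```javascript
// Hypothetical helper (not part of jsPDF): build an elementHandlers object
// that skips every element whose ID appears in `ids`.
function buildIgnoreHandlers(ids) {
  const handlers = {};
  for (const id of ids) {
    // Keys use the "#id" form that the special element handlers expect.
    handlers["#" + id] = function (element, renderer) {
      return true; // treat the element as handled so it is left out
    };
  }
  return handlers;
}

const ignoreHandlers = buildIgnoreHandlers(["ignoreElement", "anotherIdToBeIgnored"]);
```

The resulting object can be passed as the `elementHandlers` setting in place of the hand-written one.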

The examples also state that you can select by tag name, such as 'a' or 'li'. That may be too unrestrictive for most use cases, though:

We support special element handlers. Register them with jQuery-style ID selector for either ID or node name. ("#iAmID", "div", "span" etc.) There is no support for any other type of selectors (class, of compound) at this time.

One very important thing to add is that you lose all your style information (CSS). Luckily jsPDF is able to nicely format h1, h2, h3 etc., which was enough for my purposes. Additionally it will only print text within text nodes, which means that it will not print the values of textareas and the like. Example:

<body>
  <ul>
    <!-- This is printed as the element contains a textnode -->        
    <li>Print me!</li>
  </ul>
  <div>
    <!-- This is not printed because jsPDF doesn't deal with the value attribute -->
    <input type="text" value="Please print me, too!">
  </div>
</body>
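
One possible workaround for the missing field values, offered as an untested-in-jsPDF sketch, is to replace each text input or textarea with a span holding its current value before converting, so the value becomes an ordinary text node. `inlineFieldValues` is a hypothetical helper, and running it on a copy of the markup rather than the live page is an assumption about how you would wire it in.

```javascript
// Hypothetical helper: replace text inputs and textareas under `root` with
// <span> elements containing their current value.
// `doc` is the owning document (window.document in a browser).
// Returns the number of fields that were replaced.
function inlineFieldValues(root, doc) {
  const fields = root.querySelectorAll('input[type="text"], textarea');
  let replaced = 0;
  for (const field of Array.from(fields)) {
    const span = doc.createElement("span");
    span.textContent = field.value; // the value becomes a plain text node
    field.parentNode.replaceChild(span, field);
    replaced += 1;
  }
  return replaced;
}
```

Because the helper mutates the tree it is given, calling it on a clone keeps the visible form intact.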
2 of 16
94

This is a simple solution that works for me: use the browser's JavaScript print function and simply save the result as a PDF.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title></title>
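    <script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
    <script type="text/javascript">
        // .live() was deprecated in jQuery 1.7 and removed in 1.9; delegated .on() is the replacement.
        $(document).on("click", "#btnPrint", function () {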
            var divContents = $("#dvContainer").html();
            var printWindow = window.open('', '', 'height=400,width=800');
            printWindow.document.write('<html><head><title>DIV Contents</title>');
            printWindow.document.write('</head><body >');
            printWindow.document.write(divContents);
            printWindow.document.write('</body></html>');
            printWindow.document.close();
            printWindow.print();
        });
    </script>
</head>
<body>
    <form id="form1">
    <div id="dvContainer">
        This content needs to be printed.
    </div>
    <input type="button" value="Print Div Contents" id="btnPrint" />
    </form>
</body>
</html>
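
The same print approach can be sketched without jQuery. The version below is a minimal illustration rather than a drop-in replacement: `buildPrintDocument` is a hypothetical helper, the element IDs are the ones from the markup above, and the script is assumed to be placed at the end of the body so the elements already exist when it runs.

```javascript
// Build the HTML for the print window as a single string.
function buildPrintDocument(title, bodyHtml) {
  return (
    "<html><head><title>" + title + "</title></head><body>" +
    bodyHtml +
    "</body></html>"
  );
}

// Browser-only wiring; skipped when no DOM is available (for example under Node).
if (typeof document !== "undefined") {
  document.getElementById("btnPrint").addEventListener("click", function () {
    const contents = document.getElementById("dvContainer").innerHTML;
    const printWindow = window.open("", "", "height=400,width=800");
    printWindow.document.write(buildPrintDocument("DIV Contents", contents));
    printWindow.document.close();
    printWindow.print(); // the user can pick "Save as PDF" in the print dialog
  });
}
```

Keeping the string building in its own function makes that part easy to check separately from the browser-specific code.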
🌐
Reddit
reddit.com › r/askprogramming › converting pdf into html: is it possble?
r/AskProgramming on Reddit: Converting PDF into HTML: is it possble?
February 3, 2023 -

Currently for a client I am building a script to convert PDF into HTML fitting their CMS. We also do this conversion on DOCX (to HTML) and that goes well. But PDF is a different kind of format: less a document format and more a layout format. What would be the most efficient way to convert PDF to HTML in a way that strips the layout as much as possible but keeps, for example, all markup (such as bold/italic text), images, etc.?

Does anyone have experience with this kind of conversion? I can use recommendations and advice!

Things I have tried:

  • pdf2htmlEX: Very elegant for normal conversions for users in the browser, but it is so faithful that it keeps the layout, strips tags and replaces them with styling (CSS), and converts tables to background images; not useful for me.

  • pdftohtml: Not the prettiest output; it disregards tables and puts a lot of <br/> tags into the HTML.

Things I still want to try:

  • Parsr: [EDIT] After some experience with Parsr, it might be exactly what I am looking for. And it seems it's capable of Markdown conversion (haven't seen it working yet), that would mean I can easily convert to HTML. [EDIT 2] This tool is performing amazingly well!
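
For converter output that is cluttered with <br/> tags and layout styling, as described for pdftohtml above, a clean-up pass along these lines might help. It is a regex-based sketch under the assumption that the input is simple, well-formed HTML; `cleanConvertedHtml` is a hypothetical helper, and a real HTML parser would be more robust.

```javascript
// Hypothetical clean-up pass for converter output.
function cleanConvertedHtml(html) {
  return html
    // Drop inline style attributes that carry layout (positions, fonts).
    .replace(/\sstyle="[^"]*"/gi, "")
    // Turn runs of two or more <br> tags into a paragraph boundary.
    .replace(/(?:<br\s*\/?>\s*){2,}/gi, "</p><p>")
    // Turn any remaining single <br> into a space.
    .replace(/<br\s*\/?>/gi, " ");
}
```

Inline tags such as <b> and <i> are left untouched, which matches the goal of keeping bold and italic markup while stripping layout.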

🌐
CloudConvert
cloudconvert.com › pdf-to-html
PDF to HTML Converter
PDF to HTML Converter - CloudConvert is a free & fast online file conversion service.
🌐
PDFCrowd
pdfcrowd.com › html-to-pdf
Convert HTML to PDF Online - PDFCrowd
Convert webpages and HTML files to PDF easily with our online converter. Fast, free, and optimized for perfect readability.
🌐
IDRsolutions
idrsolutions.com › online-pdfform-to-html-converter
Online PDF Form to HTML Converter - IDRsolutions
5 days ago - HTMLConversionOptions conversionOptions ... File("C:/MyDocument.pdf"); File output = new File("C:/MyOutputDirectory/"); PDFtoHTML5Converter converter = new PDFtoHTML5Converter(input, output, conversionOptions, viewerOptions); try ...
🌐
DocHub
dochub.com › en › functionalities › convert-pdf-to-js
Convert PDF to JS | DocHub
Save time and transform files with a secure and easy online tool. Upload and convert PDF to JS. Get your file ready to use in no time.
🌐
PDF Generation API
pdfgeneratorapi.com › home › 3 ways to generate pdf from html with javascript
3 Ways to Generate PDF from HTML with JavaScript - PDF Generator API
May 8, 2025 - This article has explored various approaches to converting HTML content into PDFs. We’ve discussed popular JavaScript libraries like html2pdf, Puppeteer, and jsPDF, each offering unique functionalities and use cases. We’ve also compared client-side and server-side generation methods, helping you understand their strengths and limitations.
🌐
Xodo
xodo.com › convert-pdf-to-html
PDF to HTML | Free PDF to HTML Converter Online
Convert PDF to HTML online. Turn your PDFs into HTML files directly in your browser with our free PDF to HTML Converter.
🌐
ConvertApi
convertapi.com › pdf-to-html
PDF to HTML Conversion API – Convert PDF to HTML
Convert PDFs to HTML via API. Preserve layout, text, images, and tables; handle passwords and ranges; use WYSIWYG and OCR for scans.
🌐
iLovePDF
ilovepdf.com › html-to-pdf
HTML to PDF converter. Transform HTML pages into PDF
Converting HTML to PDF... ... The given URL is invalid. Please check to see if it is written correctly. Select your web files here or just drag and drop them here. Files can be html, css or js types.