Look at the newest version of mudraw, a command-line tool from the MuPDF family.

mudraw -o out.html -F html in.pdf

Use the newest version if possible. It has gained quite a few features and can do more than just PDF-to-HTML conversion:

$ mudraw
   Usage: mudraw [options] file [pages]
   -p -  password

   -o -  output file name (%d for page number)
   -F -  output format (default inferred from output file name)
         raster: png, tga, pnm, pam, pbm, pwg, pcl
         vector: svg, pdf, trace
         text: txt, html, stext

   -s -  show extra information:
         m - show memory use
         t - show timings
         f - show page features
         5 - show md5 checksum of rendered image

   -R -  rotate clockwise (default: 0 degrees)
   -r -  resolution in dpi (default: 72)
   -w -  width (in pixels) (maximum width if -r is specified)
   -h -  height (in pixels) (maximum height if -r is specified)
   -f -  fit width and/or height exactly; ignore original aspect ratio
   -B -  maximum bandheight (pgm, ppm, pam, png output only)

   -W -  page width for EPUB layout
   -H -  page height for EPUB layout
   -S -  font size for EPUB layout

   -c -  colorspace (mono, gray, grayalpha, rgb, rgba, cmyk, cmykalpha)
   -G -  apply gamma correction
   -I    invert colors

   -A -  number of bits of antialiasing (0 to 8)
   -D    disable use of display list
   -i    ignore errors

   pages comma separated list of page numbers and ranges
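
To illustrate what the pages argument describes, here is a minimal Python sketch that expands such a list into individual page numbers. It assumes ranges are written as start-end (for example 1-3,5); the function name expand_pages is hypothetical, and the tool's own parser may accept more forms than this.

```python
def expand_pages(spec):
    """Expand a page spec such as "1-3,5" into [1, 2, 3, 5].

    Assumption: ranges are written as "start-end"; this is an
    illustration, not the parser the tool itself uses.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            step = 1 if end >= start else -1
            pages.extend(range(start, end + step, step))
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(expand_pages("1-3,5"))
```

The same string can be passed unchanged as the last argument on the command line; the helper is only useful for checking which pages a spec selects before running a long conversion.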

Update (April 2016)

The calling convention of the tool has changed. It is still part of the MuPDF family, but you now run it like this:

mutool draw
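
If you want to script the conversion, a small wrapper along these lines may help. It is only a sketch: it assumes mutool is installed and on the PATH, and that mutool draw accepts the same -o and -F options shown in the mudraw usage above. The function names build_draw_command and convert are hypothetical.

```python
import shutil
import subprocess


def build_draw_command(pdf_path, out_path, fmt="html", pages=None):
    """Build the argument list for `mutool draw`.

    Assumption: `mutool draw` takes the same -o/-F options as the
    mudraw usage text shown earlier.
    """
    cmd = ["mutool", "draw", "-o", out_path, "-F", fmt, pdf_path]
    if pages:
        cmd.append(pages)  # e.g. "1-3,5"
    return cmd


def convert(pdf_path, out_path, fmt="html", pages=None):
    """Run the conversion; raises if mutool is missing or exits non-zero."""
    if shutil.which("mutool") is None:
        raise FileNotFoundError("mutool not found on PATH")
    subprocess.run(build_draw_command(pdf_path, out_path, fmt, pages), check=True)


if __name__ == "__main__":
    print(" ".join(build_draw_command("in.pdf", "out.html")))
```

Keeping the command construction separate from the call to subprocess.run makes it easy to log or inspect the exact command before anything is executed.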
Answer from Kurt Pfeifle on Stack Exchange
2 of 8
2

pdf2htmlEX accurately converts PDFs to HTML and retains the formatting. However, the generated HTML code is hard to read and parse programmatically. It is free, open source, and works offline on a variety of platforms.

https://github.com/coolwanglu/pdf2htmlEX

https://github.com/coolwanglu/pdf2htmlEX/wiki/Download
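
For batch use, pdf2htmlEX can be driven from a script. The sketch below assumes the pdf2htmlEX binary is on the PATH and that it is invoked as pdf2htmlEX [options] input.pdf [output.html], with --dest-dir selecting the output directory; check pdf2htmlEX --help on your installed version. The function name build_pdf2htmlex_command is hypothetical.

```python
import subprocess


def build_pdf2htmlex_command(pdf_path, html_name=None, dest_dir=None):
    """Build the argument list for pdf2htmlEX.

    Assumption: invocation is `pdf2htmlEX [options] in.pdf [out.html]`
    and `--dest-dir` sets the output directory.
    """
    cmd = ["pdf2htmlEX"]
    if dest_dir:
        cmd += ["--dest-dir", dest_dir]
    cmd.append(pdf_path)
    if html_name:
        cmd.append(html_name)
    return cmd


def convert_with_pdf2htmlex(pdf_path, html_name=None, dest_dir=None):
    """Run pdf2htmlEX; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_pdf2htmlex_command(pdf_path, html_name, dest_dir), check=True)


if __name__ == "__main__":
    print(" ".join(build_pdf2htmlex_command("in.pdf", "out.html")))
```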

🌐
GitHub
github.com › coolwanglu › pdf2htmlEX
GitHub - coolwanglu/pdf2htmlEX: Convert PDF to HTML without losing text or format.
Convert PDF to HTML without losing text or format. - coolwanglu/pdf2htmlEX
Starred by 10.6K users
Forked by 1.9K users
Languages   HTML 82.7% | C++ 12.6% | JavaScript 1.4% | Python 1.1% | Roff 0.6% | C 0.6%
🌐
GitHub
github.com › pdf2htmlEX › pdf2htmlEX
GitHub - pdf2htmlEX/pdf2htmlEX: Convert PDF to HTML without losing text or format.
Convert PDF to HTML without losing text or format. - pdf2htmlEX/pdf2htmlEX
Starred by 5.3K users
Forked by 497 users
Languages   HTML 83.6% | C++ 10.0% | Shell 2.4% | Python 1.2% | JavaScript 1.0% | C 0.6%
🌐
FlowPaper
flowpaper.com › blog › Best-ways-to-convert-PDF-to-HTML5
How to best convert PDF to HTML5
The mutool command is used to convert PDF files to HTML. The basic syntax is: ... After running the conversion, the output.html file will be generated in the current directory. Open the resulting HTML file in a web browser to check the output. ... Open-Source and Free.
🌐
CloudConvert
cloudconvert.com › pdf-to-html
PDF to HTML | CloudConvert
PDF to HTML Converter - CloudConvert is a free & fast online file conversion service.
🌐
Reddit
reddit.com › r/askprogramming › converting pdf into html: is it possible?
r/AskProgramming on Reddit: Converting PDF into HTML: is it possible?
February 3, 2023

For a client, I am currently building a script to convert PDF into HTML that fits their CMS. We also do this conversion for DOCX (to HTML), and that goes well. But PDF is a whole other thing: it is less a document format and more a layout format. What would be the most efficient way to convert PDF to HTML in a way that strips the layout as much as possible but keeps all markup (such as bold/italic text), images, etc.?

Does anyone have experience with this kind of conversion? I can use recommendations and advice!

Things I have tried:

  • pdf2htmlEX: Very elegant for normal conversions for users in the browser, but it is so elegant that it keeps the layout, strips tags and applies them as styling (CSS), and converts tables to background images; not something useful for me.

  • pdftohtml: Not the prettiest output; it disregards tables and puts a lot of <br/> tags into the HTML.

Things I still want to try:

  • Parsr: [EDIT] After some experience with Parsr, it might be exactly what I am looking for. It also seems to be capable of Markdown conversion (I haven't seen it working yet), which would mean I can easily convert to HTML. [EDIT 2] This tool is performing amazingly well!

🌐
BitRecover
bitrecover.com › home › convert your data › how to convert pdf to html free? 2 ways discussed
Convert PDF to HTML Free by Discovering Top 2 Methods
May 28, 2024 - Select the destination path and click “Save” to convert PDF to HTML free. Tips and Tricks: After selecting open with MS Word option, it shows two different options i.e.
🌐
IDRsolutions
idrsolutions.com › online-pdf-to-html-converter
PDF to HTML converter - online, free
4 days ago - Convert PDF to HTML online. Smallest file size, highest quality conversion. Free on individual files using BuildVu commercial PDF to HTML converter. 🏆
🌐
FreeConvert
freeconvert.com › pdf-to-html
PDF to HTML Converter - FreeConvert.com
You can also batch convert PDF to HTML format. We use both open source and custom software to make sure our conversions are of the highest quality.
🌐
IDRsolutions
idrsolutions.com › online-pdf-to-html5-converter
Online PDF converter - PDF to HTML5
4 days ago - Convert PDF to HTML5 online. Smallest file size, highest quality conversion. Free on individual files using BuildVu commercial PDF to HTML5 converter. 🏆
🌐
PDF24 Tools
tools.pdf24.org › en › pdf-to-html
PDF to HTML converter - quick, online, free - PDF24
Free online converter to convert PDF to HTML. Quick and Easy. Without installation. Without registration.
Rating: 4.9 - 138 votes
🌐
Rasteredge
rasteredge.com › how-to › csharp-imaging › pdf-convert-html
C#: How to convert PDF to html files programmatically using C#.net library with online free open source code
C#: Convert PDF to HTML · C#: Convert PDF to Jpeg · C# File: Compress PDF · C# File: Merge PDF · C# File: Split PDF · C# Page: Insert PDF pages · C# Read: PDF Text Extract · C# Read: PDF Image Extract · C# Write: Insert text into PDF · C# Write: Add Image to PDF ·
🌐
Reddit
reddit.com › r/web_design › converting pdf to html5 magazine - any good open source projects?
r/web_design on Reddit: Converting PDF to HTML5 magazine - any good open source projects?
December 29, 2015

In my organization we get a couple of PDF magazines every week. Some of the magazines are only 10-20 pages and 10 MB in size. Others are several hundred pages and up to 250 MB in size.

Most people want to view them on their iPad, and until now we have just posted the PDFs to an Apache host where indexing is allowed.

It works, but the magazine is not shown in the browser until the entire PDF has been downloaded. That can take quite a long time.

The ideal solution would be something like Issuu, but one that could be hosted locally. If I could just upload the PDFs to a folder via FTP and let the software convert the files automatically, it would be perfect.

Does such a fantastic piece of software exist?

🌐
Reddit
reddit.com › r/html › what is the best way to convert a pdf to html
r/HTML on Reddit: What is the best way to convert a pdf to html
November 18, 2024

We are moving systems, and to do that I need the HTML of an invoice/quote. What would be the best way to convert a PDF into HTML code?

PS: I did ask ChatGPT, but it sucks at this.

Edit: I am a Python developer and I don't know anything about HTML.