convert pdf to html open source

Is there a free PDF to HTML conversion software?

softwarerecs.stackexchange.com › questions › 19081 › is-there-a-free-pdf-to-html-conversion-software

Look at the newest version of mudraw. It is a command line tool that is from the MuPDF family of tools.

mudraw -o out.html -F html in.pdf

Use the newest version, if possible. It has gained quite a few new and additional features (it can do more than just PDF->HTML conversion):

$ mudraw
   Usage: mudraw [options] file [pages]
   -p -  password

   -o -  output file name (%d for page number)
   -F -  output format (default inferred from output file name)
         raster: png, tga, pnm, pam, pbm, pwg, pcl
         vector: svg, pdf, trace
         text: txt, html, stext

   -s -  show extra information:
         m - show memory use
         t - show timings
         f - show page features
         5 - show md5 checksum of rendered image

   -R -  rotate clockwise (default: 0 degrees)
   -r -  resolution in dpi (default: 72)
   -w -  width (in pixels) (maximum width if -r is specified)
   -h -  height (in pixels) (maximum height if -r is specified)
   -f -  fit width and/or height exactly; ignore original aspect ratio
   -B -  maximum bandheight (pgm, ppm, pam, png output only)

   -W -  page width for EPUB layout
   -H -  page height for EPUB layout
   -S -  font size for EPUB layout

   -c -  colorspace (mono, gray, grayalpha, rgb, rgba, cmyk, cmykalpha)
   -G -  apply gamma correction
   -I    invert colors

   -A -  number of bits of antialiasing (0 to 8)
   -D    disable use of display list
   -i    ignore errors

   pages comma separated list of page numbers and ranges

Update (April 2016)

The calling convention of the tool has been changed. It is still part of the MuPDF family, but you run it like this now:

mutool draw

Answer from Kurt Pfeifle on Stack Exchange
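
For scripted use, the invocation from the answer can be wrapped in a small helper. What follows is a minimal Python sketch, not part of the original answer. The name `mutool_draw_argv` is hypothetical. It uses only the `-o` and `-F` options and the trailing page list shown in the usage text, with the newer `mutool draw` calling convention, and it assumes `mutool` is on the PATH.

```python
import subprocess
from typing import List, Optional


def mutool_draw_argv(pdf: str, out: str, fmt: str = "html",
                     pages: Optional[str] = None) -> List[str]:
    """Build the argument list for `mutool draw` (hypothetical helper).

    Only options listed in the usage text are used: -o (output file),
    -F (output format) and the optional trailing page list.
    """
    argv = ["mutool", "draw", "-o", out, "-F", fmt, pdf]
    if pages:
        argv.append(pages)  # comma separated pages and ranges, e.g. "1-3,7"
    return argv


def convert(pdf: str, out: str, **kwargs) -> None:
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.run(mutool_draw_argv(pdf, out, **kwargs), check=True)
```

Calling `convert("in.pdf", "out.html")` should run the equivalent of `mutool draw -o out.html -F html in.pdf`. Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.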

2 of 8

pdf2htmlEX accurately converts PDFs to HTML and retains the formatting. However, the generated HTML code is hard to read and parse programmatically. It is free, open source, and works offline on a variety of platforms.

https://github.com/coolwanglu/pdf2htmlEX

https://github.com/coolwanglu/pdf2htmlEX/wiki/Download
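
The answer gives no command line for pdf2htmlEX, so the following Python sketch is an assumption rather than something taken from the source. It relies on the `pdf2htmlEX [options] input.pdf` form and on the `--dest-dir` and `--zoom` options as I understand them; check `pdf2htmlEX --help` on your build before relying on it. The helper name `pdf2htmlex_argv` is hypothetical.

```python
from typing import List, Optional


def pdf2htmlex_argv(pdf: str, dest_dir: Optional[str] = None,
                    zoom: Optional[float] = None) -> List[str]:
    """Build a pdf2htmlEX command line (hypothetical helper).

    Assumption: the `pdf2htmlEX [options] input.pdf` form with the
    --dest-dir and --zoom options; verify with `pdf2htmlEX --help`.
    """
    argv = ["pdf2htmlEX"]
    if dest_dir is not None:
        argv += ["--dest-dir", dest_dir]  # directory for the generated HTML
    if zoom is not None:
        argv += ["--zoom", str(zoom)]     # scale factor for the rendering
    argv.append(pdf)
    return argv
```

The resulting list can be handed to `subprocess.run(..., check=True)` once the tool is installed.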

GitHub

github.com › coolwanglu › pdf2htmlEX

GitHub - coolwanglu/pdf2htmlEX: Convert PDF to HTML without losing text or format.

Convert PDF to HTML without losing text or format. - coolwanglu/pdf2htmlEX

Starred by 10.6K users

Forked by 1.9K users

GitHub

github.com › pdf2htmlEX › pdf2htmlEX

GitHub - pdf2htmlEX/pdf2htmlEX: Convert PDF to HTML without losing text or format.

Convert PDF to HTML without losing text or format. - pdf2htmlEX/pdf2htmlEX

Starred by 5.3K users

Forked by 497 users

Stack Overflow

stackoverflow.com › questions › 8370014 › how-to-convert-pdf-to-html

How to convert PDF to HTML? - Stack Overflow

Top answer

1 of 5

If you're on Linux, try pdftohtml:

sudo apt-get install poppler-utils
pdftohtml -enc UTF-8 -noframes infile.pdf outfile.html

On MacOS (with homebrew) pdftohtml can be installed with:

brew install pdftohtml

The open source ebook converter Calibre can also convert PDF files to HTML and is available on MacOS, Windows and Linux.
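
The pdftohtml command above can also be driven from a script. This is a minimal Python sketch under the assumption that `pdftohtml` from poppler-utils is installed. The flags are exactly the ones quoted in the answer; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def pdftohtml_argv(pdf: str, html: str) -> List[str]:
    """Argument list mirroring the command quoted in the answer."""
    return ["pdftohtml", "-enc", "UTF-8", "-noframes", pdf, html]


def run_pdftohtml(pdf: str, html: str) -> None:
    """Run pdftohtml, failing early with a clear message if it is missing."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found; install poppler-utils")
    subprocess.run(pdftohtml_argv(pdf, html), check=True)
```

Checking with `shutil.which` first gives a readable error on machines where the package has not been installed yet.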

2 of 5

As I mentioned in the comment above, it is definitely possible to convert PDF to HTML using Able2Extract 7.

I have been using this tool for almost 2 years now and I am pretty happy with it. It lets you convert PDF to Word, Excel, PowerPoint, Publisher, HTML, OpenOffice, etc.

Important note: this tool is not freeware.

HTH

FlowPaper

flowpaper.com › blog › Best-ways-to-convert-PDF-to-HTML5

How to best convert PDF to HTML5

The mutool command is used to convert PDF files to HTML. The basic syntax is: ... After running the conversion, the output.html file will be generated in the current directory. Open the resulting HTML file in a web browser to check the output. ... Open-Source and Free.

CloudConvert

cloudconvert.com › pdf-to-html

PDF to HTML | CloudConvert

PDF to HTML Converter - CloudConvert is a free & fast online file conversion service.

wkhtmltopdf

wkhtmltopdf.org

wkhtmltopdf

Download a precompiled binary or build from source · Create your HTML document that you want to turn into a PDF (or image) Run your HTML document through the tool. For example, if I really like the treatment Google has done to their logo today and want to capture it forever as a PDF:

reddit.com › r/askprogramming › converting pdf into html: is it possible?

r/AskProgramming on Reddit: Converting PDF into HTML: is it possible?

February 3, 2023

I am currently building a script for a client that converts PDF into HTML that fits their CMS. We already do this conversion for DOCX (to HTML), and that goes well. But PDF is a whole other format: less of a document format and more of a layout format. What would be the most efficient way to convert PDF to HTML so that the layout is stripped as much as possible while markup (such as bold/italic text), images, etc. are kept?

Does anyone have experience with this kind of conversion? I can use recommendations and advice!

Things I have tried:

pdf2htmlEX: Very elegant for normal conversions for users in the browser, but it is so elegant that it keeps the layout, strips tags and replaces them with styling (CSS), and converts tables to background images; not something useful for me.
pdftohtml: Not the prettiest output; it disregards tables and puts a lot of <br/> tags into the HTML.

Things I still want to try:

Parsr: [EDIT] After some experience with Parsr, it might be exactly what I am looking for. And it seems it's capable of Markdown conversion (haven't seen it working yet), that would mean I can easily convert to HTML. [EDIT 2] This tool is performing amazingly well!
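
One way to tame the `<br/>`-heavy output mentioned for pdftohtml is a small post-processing pass. The sketch below is an illustration only, not something the poster describes. It assumes that two or more consecutive `<br/>` tags mark a paragraph break and that a single `<br/>` is just a line wrap, which will not hold for every PDF. The function name `br_to_paragraphs` is hypothetical.

```python
import re

# Two or more <br> tags in a row (optionally separated by whitespace).
_BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*){2,}", re.IGNORECASE)
# Any single <br> tag with surrounding whitespace.
_BR_ONE = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)


def br_to_paragraphs(fragment: str) -> str:
    """Turn a <br/>-separated HTML fragment into <p> blocks.

    Assumption: runs of <br> separate paragraphs, while a lone <br>
    is a soft line wrap that can become a space.
    """
    paragraphs = []
    for block in _BR_RUN.split(fragment):
        text = _BR_ONE.sub(" ", block).strip()
        if text:
            paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)
```

Inline markup such as `<b>` and `<i>` passes through untouched, so bold and italic text survives the cleanup.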

Top answer

1 of 3

Oh interesting, I could’ve used this tool before. The best I found before was pdftohtml as well. Generally, converting from PDF is awful. PDFs internally are more or less PostScript, which is like a little programming language that describes how to print things, as opposed to a markup language, so reverse-engineering the content can be wacky. If you haven’t used pandoc before, it’s very useful in general (though not for converting PDF to other formats; you’ll need specialty tools like this).

2 of 3

Adobe Acrobat Pro DC: a paid software with powerful PDF editing and conversion capabilities.
Online-Convert.com: a free online conversion tool that supports various file formats including PDF to HTML.
Calibre: an open-source e-book management software that can also convert PDFs to HTML.
SmallPDF: an online PDF tool that offers a simple and easy-to-use PDF to HTML conversion.
Nitro Pro: a paid software that offers fast and accurate PDF to HTML conversion.

BitRecover

bitrecover.com › home › convert your data › how to convert pdf to html free? 2 ways discussed

Convert PDF to HTML Free by Discovering Top 2 Methods

May 28, 2024 - Select the destination path and click “Save” to convert PDF to HTML free. Tips and Tricks: After selecting open with MS Word option, it shows two different options i.e.

IDRsolutions

idrsolutions.com › online-pdf-to-html-converter

PDF to HTML converter - online, free

4 days ago - Convert PDF to HTML online. Smallest file size, highest quality conversion. Free on individual files using BuildVu commercial PDF to HTML converter. 🏆

Stack Overflow

stackoverflow.com › questions › 3199659 › convert-pdf-doc-ppt-to-html5

html - Convert pdf, doc, ppt to html5 - Stack Overflow

Top answer

1 of 6

You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of things, and may even need to chain some converters in order to get to HTML. (Eg PDF -> ps -> HTML)

OpenOffice supports conversion to HTML, and can be called from the command line.

http://pdftohtml.sourceforge.net/ looks reasonably good at converting pdf to html.

For Doc that is Word ML or OpenXML format it's conceivable that you could use XSLT transforms since both input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.

Incidentally, why is there a specific requirement for open source? MS Powerpoint already supports save-as-HTML for example.
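
The "mishmash of things" described in this answer can be organised as a dispatch on file extension. The Python sketch below is only an illustration. The PDF branch reuses the `pdftohtml` flags quoted earlier on this page. The office branch assumes LibreOffice's `soffice` binary and its `--headless --convert-to html --outdir` options, which may not match the OpenOffice build the answer refers to. The function name and the suffix list are hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: these suffixes are handed to LibreOffice's soffice.
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx"}


def converter_argv(src: str, outdir: str) -> List[str]:
    """Pick a command line for converting `src` to HTML (hypothetical)."""
    path = Path(src)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        target = str(Path(outdir) / (path.stem + ".html"))
        return ["pdftohtml", "-enc", "UTF-8", "-noframes", src, target]
    if suffix in OFFICE_SUFFIXES:
        return ["soffice", "--headless", "--convert-to", "html",
                "--outdir", outdir, src]
    raise ValueError(f"no converter configured for {suffix or src!r}")
```

Keeping the choice of tool in one function makes it easier to swap a converter later without touching the code that runs the commands.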

2 of 6

Open Office will convert pdf to html but you'll take a hit to design quality.

I suggest either Crocodoc as a paid service (it provides different flavours for different platforms such as Python, Ruby, Java, and PHP, and developers are allowed to work with its APIs) or waiting for an official Adobe tool (it's in the works).

FreeConvert

freeconvert.com › pdf-to-html

PDF to HTML Converter - FreeConvert.com

You can also batch convert PDF to HTML format. We use both open source and custom software to make sure our conversions are of the highest quality.

IDRsolutions

idrsolutions.com › online-pdf-to-html5-converter

Online PDF converter - PDF to HTML5

4 days ago - Convert PDF to HTML5 online. Smallest file size, highest quality conversion. Free on individual files using BuildVu commercial PDF to HTML5 converter. 🏆

PDF24 Tools

tools.pdf24.org › en › pdf-to-html

PDF to HTML converter - quick, online, free - PDF24

Free online converter to convert PDF to HTML. Quick and Easy. Without installation. Without registration.

Rating: 4.9 - 138 votes

Rasteredge

rasteredge.com › how-to › csharp-imaging › pdf-convert-html

C#: How to convert PDF to html files programmatically using C#.net library with online free open source code

C#: Convert PDF to HTML · C#: Convert PDF to Jpeg · C# File: Compress PDF · C# File: Merge PDF · C# File: Split PDF · C# Page: Insert PDF pages · C# Read: PDF Text Extract · C# Read: PDF Image Extract · C# Write: Insert text into PDF · C# Write: Add Image to PDF

Nutrient

nutrient.io › blog › sdk › top ten ways to convert html to pdf

How to convert HTML to PDF: 10 best tools compared

August 20, 2025 - In practice, teams use open source ... break. Start a free trial or talk to Sales. wkhtmltopdf is a command-line tool that uses WebKit to convert HTML to PDF....

reddit.com › r/web_design › converting pdf to html5 magazine - any good open source projects?

r/web_design on Reddit: Converting PDF to HTML5 magazine - any good open source projects?

December 29, 2015

In my organization we get a couple of PDF magazines every week. Some of the magazines are only 10-20 pages and 10 MB in size. Others are several hundred pages and up to 250 MB in size.

Most people want to view them on their iPad, and until now we have just posted the PDFs to an Apache host where indexing is allowed.

It works, but the magazine is not shown in the browser until the entire PDF has been downloaded. That can take quite a long time.

The ideal solution would be something like Issuu, but one that could be hosted locally. If I could just upload the PDFs to a folder via FTP and let the software convert the files automatically, it would be perfect.

Does such a fantastic piece of software exist?
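
The "upload to a folder and convert automatically" idea can be approximated with a periodic scan. The Python sketch below covers only the bookkeeping, under the assumption that each converted magazine is written next to its PDF with an `.html` suffix. The actual conversion command is left to whichever tool is chosen, and the function name `pending_pdfs` is hypothetical.

```python
from pathlib import Path
from typing import List


def pending_pdfs(folder: Path) -> List[Path]:
    """Return PDFs in `folder` that still need converting.

    A PDF is pending when no sibling .html file exists, or when the
    PDF was modified after the existing .html file was written.
    """
    pending = []
    for pdf in sorted(folder.glob("*.pdf")):
        html = pdf.with_suffix(".html")
        if not html.exists() or html.stat().st_mtime < pdf.stat().st_mtime:
            pending.append(pdf)
    return pending
```

Run from cron or a simple loop, this keeps re-uploads in sync, because a replaced PDF gets a newer modification time than its old HTML output.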

Top answer

1 of 4

You may want to just learn how to encode PDFs for web-optimized display. For starters, just check "Fast Web View" when creating the PDF so that it loads progressively, and downsample raster graphics to a lower PPI. The next thing I'd recommend is to take a look at Mozilla's PDF.js ... if you use Firefox you're actually already using it, since this is what Firefox uses to display PDFs in the browser. It's pure JavaScript, so you can also easily use it to display a PDF from inside a web page. Both of those are alternatives to converting the PDF to HTML, which you probably already know is going to be problematic if you know anything about how PostScript works. There are libraries to do this, but the results aren't fantastic.
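
Whether a given file was saved with "Fast Web View" can be checked roughly from a script. "Fast Web View" is Acrobat's name for a linearized PDF, and a linearized file carries a `/Linearized` entry in a dictionary near the start of the file. The Python sketch below is a heuristic only: it looks for that key in the first 1024 bytes and does not validate the rest of the file. The function names are hypothetical.

```python
from pathlib import Path


def looks_linearized(head: bytes) -> bool:
    """Heuristic: the /Linearized key appears near the start of the file."""
    return b"/Linearized" in head[:1024]


def is_linearized(path: Path) -> bool:
    """Read only the first 1024 bytes of `path` and apply the heuristic."""
    with path.open("rb") as fh:
        return looks_linearized(fh.read(1024))
```

Because only the first kilobyte is read, the check stays cheap even for magazines of several hundred megabytes.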

2 of 4

I'd be interested in this solution too. I run a NSFW photography website. Customers download or view my photobooks as PDF files served via woocommerce. It works well on all devices, but I feel an HTML5 solution would be more streamlined. u/Xaositek your solution is a paid program.

Apache PDFBox

products.documentprocessing.com › conversion › python › xhtml2pdf

xhtml2pdf | Convert HTML to PDF in Python | Open Source

June 25, 2025 - Additionally, it lacks support for dynamic pages that depend on JavaScript to fetch content, and it doesn't follow complex CSS-specified layouts (for example, it applies the colors, font sizes etc. mentioned in the CSS but it ignores the layout CSS such as paddings, margin, display etc). Despite these constraints, xhtml2pdf remains a valuable tool for straightforward static HTML-to-PDF conversion needs. Markdown-It API – Convert Markdown to HTML in Python · ODFPy API – Efficient OpenDocument Format (ODF) Conversion in Python