How can I convert PDF files to HTML?
I am currently building a script for a client that converts PDF files into HTML that fits their CMS. We already do this conversion for DOCX (to HTML), and that goes well. But PDF is a very different format: less a document format and more a layout format. What would be the most efficient way to convert PDF to HTML so that the layout is stripped as much as possible, but the markup (such as bold/italic text), images, and so on are kept?
Does anyone have experience with this kind of conversion? I can use recommendations and advice!
Things I have tried:
- pdf2htmlEX: Very elegant for normal conversions meant for viewing in the browser, but so faithful that it keeps the layout, strips tags and replaces them with CSS styling, and converts tables to background images. Not something useful for me.
- pdftohtml: Not the prettiest output; it disregards tables and puts a lot of <br/> tags into the HTML.
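To illustrate the <br/> problem with the pdftohtml output, here is a minimal post-processing sketch. It assumes (my assumption, not something the tool guarantees) that a run of two or more <br/> tags marks a paragraph break and that a single <br/> is just a line wrap carried over from the PDF layout. The function name `br_to_paragraphs` is made up for this example.

```python
import re

# Two or more consecutive <br/> tags: assumed to mark a paragraph break.
BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*){2,}", re.IGNORECASE)
# A single <br/>: assumed to be a soft line wrap from the PDF layout.
BR_SINGLE = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)


def br_to_paragraphs(fragment: str) -> str:
    """Turn a <br/>-separated HTML fragment into <p> blocks.

    Inline tags such as <b> and <i> are left untouched.
    """
    paragraphs = []
    for block in BR_RUN.split(fragment):
        # Join wrapped lines with a space and trim the edges.
        text = BR_SINGLE.sub(" ", block).strip()
        if text:
            paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)


sample = (
    "First <b>bold</b> line<br/>continues here.<br/><br/>"
    "Second <i>paragraph</i>.<br/>"
)
print(br_to_paragraphs(sample))
```

This only handles flat text fragments. It does nothing for tables or images, so it does not solve the real problem, it just makes the text part easier to feed into a CMS.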
Things I still want to try:
- Parsr: [EDIT] After some experience with Parsr, it might be exactly what I am looking for. It also seems to be capable of Markdown conversion (I haven't seen it working yet), which would mean I can easily convert to HTML. [EDIT 2] This tool is performing amazingly well!
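For the Markdown-to-HTML step, a rough sketch of what I have in mind is below. It is a hand-rolled converter for a tiny subset only (headings, paragraphs, `**bold**`, `*italic*`), written to show the idea. It is not Parsr's API, the function names are hypothetical, and in a real script a full Markdown library would be the better choice.

```python
import html
import re


def inline(text: str) -> str:
    """Convert **bold** and *italic* markers to HTML tags."""
    # Escape &, < and > first so stray characters cannot break the HTML.
    text = html.escape(text, quote=False)
    # Bold must be handled before italic, since both use asterisks.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def markdown_subset_to_html(md: str) -> str:
    """Convert blank-line-separated Markdown blocks to <h1>-<h6> or <p>."""
    out = []
    for block in re.split(r"\n\s*\n", md.strip()):
        block = block.strip()
        if not block:
            continue
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading and "\n" not in block:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        else:
            # Lines inside one block are treated as a single wrapped paragraph.
            out.append(f"<p>{inline(' '.join(block.splitlines()))}</p>")
    return "\n".join(out)


md = "# Title\n\nSome **bold** and *italic* text\nwrapped here."
print(markdown_subset_to_html(md))
```

Lists, tables, links and images are ignored here, so it is only useful as a starting point for checking what the Markdown output looks like once it is turned into HTML.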