The PDFWriter module is part of xtopdf.

PDFWriter - a core class of the xtopdf toolkit - can now be used with a Python context manager, a.k.a. the Python with statement.

( http://code.activestate.com/recipes/578790-use-pdfwriter-with-context-manager-support/ )
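Context-manager support means the writer is closed automatically when the `with` block ends, even if an exception is raised inside it. The sketch below shows only that protocol. It uses a hypothetical `ReportWriter` class, which is NOT xtopdf's PDFWriter (see the ActiveState recipe above for the real class and its methods):

```python
import contextlib


class ReportWriter:
    """Hypothetical stand-in for a writer such as PDFWriter; NOT xtopdf's API."""

    def __init__(self, path):
        self.path = path
        self.lines = []
        self.closed = False

    def write_line(self, text):
        self.lines.append(text)

    def close(self):
        # A real PDF writer would finish and save the file here.
        self.closed = True

    # These two methods are what make `with ReportWriter(...) as w:` work.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()   # runs on normal exit and when an exception escapes the block
        return False   # False: let any exception propagate to the caller


with ReportWriter("report.pdf") as writer:
    writer.write_line("Hello from a with block")

# For a class that only has close(), contextlib.closing() gives the same behavior.
with contextlib.closing(ReportWriter("other.pdf")) as other:
    other.write_line("closed automatically")
```

Returning `False` from `__exit__` is the usual choice for writers: the file gets closed, but errors are not silently swallowed.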

Installation instructions for xtopdf (from https://bitbucket.org/vasudevram/xtopdf):

Installation and usage:

To install the files, first make sure that you have downloaded and installed all the prerequisites mentioned above, including setup steps such as adding needed directories to your PYTHONPATH. Then, copy all the files in xtopdf.zip into a directory which is on your PYTHONPATH.

To use any of the Python programs, run the .py file as:

python filename.py

This prints a usage message describing the correct usage and the expected arguments.

To run the shell script(s), do the same as above.

Developers can look at the source code for further information.
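Installation amounts to putting the xtopdf files on PYTHONPATH, so you can check the setup by asking Python where it would import the module from. Here is a minimal sketch using `importlib.util.find_spec`. It assumes the module is importable under the top-level name `PDFWriter`, as the answer's wording suggests. `find_module` is a hypothetical helper name:

```python
import importlib.util
import os


def find_module(name):
    """Return the file a top-level module would be imported from, or None if it is not on the path."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


pythonpath = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
print("PYTHONPATH entries:", pythonpath)
print("PDFWriter found at:", find_module("PDFWriter"))  # None means the copy step did not work
```

`find_spec` locates the module without executing it. It can therefore report a location even when one of the module's own prerequisites is missing; that problem only shows up on the actual import.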

An alternative is to use pdfdocument to create the PDF; it can be installed with pip (https://pypi.python.org/pypi/pdfdocument).

Parse the JSON data (see "How can I parse GeoJSON with Python" and "Parse JSON in Python") and output it as a PDF using pdfdocument (https://pypi.python.org/pypi/pdfdocument):

import json
data = json.loads(datastring)  # datastring holds the JSON text, e.g. an HTTP request body or file contents

from io import BytesIO
from pdfdocument.document import PDFDocument

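def say_hello():
    f = BytesIO()                       # in-memory buffer that receives the PDF bytes
    pdf = PDFDocument(f)
    pdf.init_report()                   # use pdfdocument's report page layout
    pdf.h1('Hello World')               # heading
    pdf.p('Creating PDFs made easy.')   # paragraph
    pdf.generate()                      # render the PDF into f
    return f.getvalue()                 # the finished PDF as bytes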
Answer from ralf htp on Stack Overflow
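The answer does not show how the parsed `data` becomes `pdf.h1()`/`pdf.p()` calls. One simple approach is to flatten the nested JSON into path/value pairs and emit one paragraph per pair. This is an assumption, not something pdfdocument provides, and `flatten_json` is a hypothetical helper:

```python
import json


def flatten_json(value, prefix=""):
    """Yield (path, scalar) pairs for every leaf of a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_json(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_json(child, f"{prefix}[{index}]")
    else:
        # str, int, float, bool or None: a leaf
        yield prefix, value


datastring = '{"data": [{"state": "Kent", "quantity": 23}]}'  # example input
data = json.loads(datastring)
for path, leaf in flatten_json(data):
    print(f"{path}: {leaf}")
```

Inside a function like `say_hello`, each pair could then be written with `pdf.p(f"{path}: {leaf}")`.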
PyPI
pypi.org › project › json2pdf-Converter
json2pdf-Converter
Flinks
help.flinks.com › support › solutions › articles › 43000729546-how-to-convert-the-a-json-payload-to-a-pdf-file-python-
How to convert the a .json payload to a .pdf file (python) :
June 6, 2024 -

import json
from fpdf import FPDF

def json_to_pdf(json_path, pdf_path):
    # Read JSON file
    with open(json_path, 'r') as f:
        data = json.load(f)
    # Convert JSON data to a pretty printed string
    json_str = json.dumps(data, indent=4)
    # Create a PDF ...
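To show what "pretty-print the JSON, then write it into a PDF" involves without a third-party package, here is a minimal standard-library sketch. It assumes plain Latin-1 text and a single page. `text_lines_to_pdf` is a hypothetical helper, not fpdf's API and not the Flinks article's code. It also illustrates the Reddit answer's point below: each line is drawn at explicit x/y coordinates, and nothing in the file records how the lines relate to each other.

```python
import json


def _pdf_escape(text):
    # Backslashes and parentheses are special inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_lines_to_pdf(lines, font_size=10, leading=12):
    """Return the bytes of a one-page PDF that draws each line at an absolute position."""
    ops = []
    y = 760  # start near the top of a US Letter page (612 x 792 points)
    for line in lines:
        # Each line gets its own absolute coordinates (x=40, y).
        ops.append(f"BT /F1 {font_size} Tf 40 {y} Td ({_pdf_escape(line)}) Tj ET")
        y -= leading  # no pagination: lines past the bottom edge are simply not visible
    stream = "\n".join(ops).encode("latin-1", errors="replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Courier is monospaced, so json.dumps indentation lines up.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_offset)
    return bytes(out)


data = {"data": [{"state": "Kent", "quantity": 23}]}
pdf_bytes = text_lines_to_pdf(json.dumps(data, indent=4).splitlines())
```

Writing `pdf_bytes` to a file opened in `"wb"` mode produces a viewable PDF. A library such as fpdf adds what this sketch lacks, including line wrapping, multiple pages, and Unicode fonts.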
Discussions

parse pdf to json using python
No, because the PDF format does not store the document structure. A PDF records the absolute position of each element on the page, not its position relative to other content.
r/learnpython
May 13, 2023
python - Transform a json file into a pdf - Stack Overflow
I'm having trouble creating a table in which items that are too long wrap automatically instead of overflowing to the right. I've pasted an example of the JSON I need to transform into a PDF, followed by my Python implementation (which unfortunately returns a bad result).
stackoverflow.com
Convert pdf data to JSON format using Python? - Stack Overflow
I am trying to print data in JSON format, but it is being printed in text format:

import PyPDF2
import json
pdf_file = open('data.pdf', 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_page...
stackoverflow.com
Create JSON to PDF in Lambda using Python - Serverless Framework - Serverless Forums
Hi all, I need to create a PDF file from JSON received in an HTTP request, using Python in AWS Lambda, and then store the PDF in an S3 bucket. Any help on how to proceed with saving the PDF generated at runtime to the S3 bucket? I am trying to use http://code.activestate.com/recipes/578979-convert-json-to-pdf-with-python-...
forum.serverless.com
December 27, 2018
Another answer:
import json

import pdfkit                      # needs the wkhtmltopdf binary installed
from json2html import json2html


class PdfConverter(object):

    def to_html(self, json_doc):
        # render the JSON document as HTML tables
        return json2html.convert(json=json_doc)

    def to_pdf(self, html_str):
        # False as output path: return the PDF as bytes instead of writing a file
        return pdfkit.from_string(html_str, False)


def main():
    stowflw = {
        "data": [
            {"state": "Manchester", "quantity": 20},
            {"state": "Surrey", "quantity": 46},
            {"state": "Scotland", "quantity": 36},
            {"state": "Kent", "quantity": 23},
            {"state": "Devon", "quantity": 43},
            {"state": "Glamorgan", "quantity": 43},
        ]
    }

    pdfc = PdfConverter()
    with open("sample.pdf", "wb") as pdf_fl:
        pdf_fl.write(pdfc.to_pdf(pdfc.to_html(json.dumps(stowflw))))


if __name__ == "__main__":
    main()
  1. Install json2html (pip install json2html).
  2. Install pdfkit (pip install pdfkit); it requires the wkhtmltopdf binary from wkhtmltox.
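In the answer above, `json2html.convert()` handles the JSON-to-HTML step. Here is a minimal standard-library sketch of that step for a list of flat records. It assumes every record shares the first record's keys. `records_to_html` is a hypothetical helper; it does not use json2html's algorithm, and its markup will differ:

```python
import html


def records_to_html(records):
    """Render a list of flat dicts as one HTML table; headers come from the first record."""
    if not records:
        return "<table></table>"
    headers = list(records[0])
    parts = ["<table><tr>"]
    parts += [f"<th>{html.escape(str(h))}</th>" for h in headers]
    parts.append("</tr>")
    for record in records:
        # Missing keys become empty cells; values are escaped so "<" or "&" cannot break the markup.
        cells = "".join(f"<td>{html.escape(str(record.get(h, '')))}</td>" for h in headers)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)


records = [{"state": "Manchester", "quantity": 20}, {"state": "Surrey", "quantity": 46}]
print(records_to_html(records))
```

The returned string can be passed to `pdfkit.from_string()` in the same way as the output of `to_html()`.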
Aspose
products.aspose.com › aspose.cells › python via java › conversion › json to pdf
Python JSON to PDF - JSON to PDF Converter | products.aspose.com
November 13, 2025 - Add a library reference (import the library) to your Python project. Load the JSON file with an instance of Workbook. Convert JSON to PDF by calling the Workbook.save method.
SysTools Group
systoolsgroup.com › home › how to convert json to pdf in 2026? 3 expert ways
Top Methods to Convert JSON to PDF (Free, Python & Converter)
January 2, 2026 - Learn how to convert JSON to PDF for free with online tools, Python scripts, and Excel. Step-by-step JSON to PDF converter guide included!
Reddit
reddit.com › r/learnpython › parse pdf to json using python
r/learnpython on Reddit: parse pdf to json using python
May 13, 2023

I've been searching for a while now for a library that can parse a PDF into JSON or XML while keeping the document structure.
Popular libraries like pypdf often do not preserve the document structure. I thought about using Tesseract for OCR and then transforming the result into JSON, but could not get it working. Is there a library that can parse a PDF into JSON while preserving the document structure, rather than just spitting out a block of text?

GitHub
gist.github.com › aspose-com-gists › 259a761108688d6db5481256c8606c44
Convert JSON Files to PDF in Python · GitHub
Convert JSON Files to PDF in Python.
Reportlab
docs.reportlab.com › json2pdf
json2pdf - ReportLab Docs
Authorised json2pdf users can use our pypi server to download the json2pdf package and install it via pip.

$ hg clone url
$ python -mvirtualenv -p /path/to/desired/python .
ActiveState
code.activestate.com › recipes › 578979-convert-json-to-pdf-with-python-and-xtopdf
Convert JSON to PDF with Python and xtopdf « Python recipes « ActiveState Code
December 10, 2014 - This recipe shows the basic steps needed to convert JSON input to PDF output, using Python and xtopdf, a PDF creation toolkit.
Aspose
products.aspose.cloud › aspose.cells › python › conversion › json to pdf conversion
Convert JSON to PDF using Python - Aspose Cloud
February 5, 2023 - This Cloud SDK empowers Python developers with powerful functionality and ensures high-quality PDF output.

# For complete examples and data files, please go to https://github.com/aspose-cells-cloud/aspose-cells-cloud-python/
import os
import shutil
from asposecellscloud.apis.cells_api import CellsApi

cells_api = CellsApi(os.getenv('ProductClientId'), os.getenv('ProductClientSecret'))
file1 = cells_api.cells_workbook_put_convert_workbook("Book1.json", format="pdf")
shutil.move(file1, "destFile.pdf")
PyPI
pypi.org › project › json2pdf
json2pdf
Top answer

Your code seems to have a lot of steps. You could simply loop over the columns of your transposed DataFrame and export each of them to HTML, append all the HTML tables to a root html element, and export the result with pdfkit:

import json
import pandas as pd
import lxml.etree as et
import pdfkit

your_json = """{"url": "https://www.abc123.com", "extensionVersion": "4.51.0", "axeVersion": "4.6.3", "standard": "WCAG 2.1 AA", "testingStartDate": "2023-04-03T09:35:06.177Z", "testingEndDate": "2023-04-03T09:35:06.177Z", "bestPracticesEnabled": false, "issueSummary": {"critical": 2, "moderate": 0, "minor": 0, "serious": 0, "bestPractices": 0, "needsReview": 0}, "remainingTestingSummary": {"run": false}, "igtSummary": [], "failedRules": [{"name": "button-name", "count": 1, "mode": "automated"}, {"name": "select-name", "count": 1, "mode": "automated"}], "needsReview": [], "allIssues": [{"ruleId": "button-name", "description": "Ensures buttons have discernible text", "help": "Buttons must have discernible text", "helpUrl": "https://www.abc123.com", "impact": "critical", "needsReview": false, "isManual": false, "selector": [".livechat-button"], "summary": "Fix any of the following:\\n  Element does not have inner text that is visible to screen readers\\n  aria-label attribute does not exist or is empty\\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\\n  Element has no title attribute\\n  Element's default semantics were not overridden with role=\\"none\\" or role=\\"presentation\\"", "source": "<button class=\\"livechat-button items-center bg-black shadow-liveChat rounded-full text-white p-2 h-12 transition-all opacity-0 pointer-events-none w-sp-48 opacity-0 pointer-events-none\\">", "tags": ["cat.name-role-value", "wcag2a", "wcag412", "section508", "section508.22.a", "ACT"], "igt": "", "shareURL": "", "createdAt": "2023-04-03T09:35:06.177Z", "testUrl": "", "testPageTitle": "ABC123", "foundBy": "[email protected]", "axeVersion": "4.6.3"}, {"ruleId": "select-name", "description": "Ensures select element has an accessible name", "help": "Select element must have an accessible name", "helpUrl": "https://www.abc123.com", "impact": "critical", "needsReview": false, "isManual": false, "selector": 
["#plp__sortSelected"], "summary": "Fix any of the following:\\n  Form element does not have an implicit (wrapped) <label>\\n  Form element does not have an explicit <label>\\n  aria-label attribute does not exist or is empty\\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\\n  Element has no title attribute\\n  Element's default semantics were not overridden with role=\\"none\\" or role=\\"presentation\\"", "source": "<select class=\\"w-full absolute opacity-0 appearance-none text-value-small font-bold text-black uppercase cursor-pointer bg-transparent outline-0\\" id=\\"plp__sortSelected\\">", "tags": ["cat.forms", "wcag2a", "wcag412", "section508", "section508.22.n", "ACT"], "igt": "", "shareURL": "", "createdAt": "2023-04-03T09:35:06.177Z", "testUrl": "https://www.abc123.com", "testPageTitle": "ABC123", "foundBy": "[email protected]", "axeVersion": "4.6.3"}]}"""
data = json.loads(your_json)

## replace the above lines with the following in your case
# with open('your_file.json', 'r') as f:   
#     data = json.load(f)

html = et.Element("html")

# general info
html.append(et.fromstring(f"""<h3>Site link: <a href="{data['url']}">{data['url']}</a></h3>"""))
html.append(et.fromstring(f"""<h4>Date: {data['testingEndDate']}</h4>"""))
html.append(et.fromstring(f"""<h4>Summary:</h4>"""))

# summary table
summary = pd.Series(data['issueSummary'])
summary_table = et.fromstring(summary.to_frame().to_html(header=False))
summary_table.set('class', 'summary')
html.append(summary_table)

# issue tables
cols_of_interest = ['ruleId', 'description', 'help', 'impact', 'selector', 'summary', 'source']
df = pd.DataFrame(data['allIssues'])[cols_of_interest].T
for col in df.columns:
    table = et.fromstring(df[[col]].to_html(header=False))
    table.set('class', 'issue')
    html.append(table)
    html.append(et.fromstring('<br/>'))

pdfkit.from_string(et.tostring(html, encoding="unicode"), "./output.pdf", css='style.css')

With the following css file:

/* style.css */
* {
    font-family: 'Liberation Sans';
}

table {
    margin: 20px;
    margin-left: auto;
    margin-right: auto;
}

table.summary {
    width: 50%;
}

table.issue {
    border: 0;
    width: 100%;
    border-collapse: collapse;
}

table.issue td,
table.issue th {
    border: 0;
    text-align: left;
    padding: 5px;
}

table.issue tr {
    border-bottom: 1px solid #dddddd;
}

You'll get:

Edit: updated json with the data you provided + exporting additional data + improved css

Note: you will need to install wkhtmltopdf and make sure that it is on your PATH.
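pdfkit only drives the external wkhtmltopdf program, so it is worth checking for the program before converting. Here is a minimal sketch using `shutil.which` from the standard library. `wkhtmltopdf_path` is a hypothetical helper name:

```python
import shutil


def wkhtmltopdf_path():
    """Return the full path of the wkhtmltopdf executable, or None if it is not on PATH."""
    return shutil.which("wkhtmltopdf")


path = wkhtmltopdf_path()
if path is None:
    print("wkhtmltopdf not found on PATH; pdfkit cannot produce a PDF")
else:
    print("using", path)
```

If the binary is installed outside PATH, you can point pdfkit at it with `pdfkit.configuration(wkhtmltopdf=...)`, passed as `configuration=` to `pdfkit.from_string()`.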

Edit2: limiting output to desired fields
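If pandas and lxml are not otherwise needed, you can build the same per-issue layout with the standard library's `xml.etree.ElementTree`: one two-column table per issue, restricted to the fields of interest. This sketch assumes `data['allIssues']` has the shape shown in the answer. `issue_table` and `issues_to_html` are hypothetical helpers, and `class="issue"` matches the `table.issue` selectors in style.css:

```python
import xml.etree.ElementTree as ET

# Field names taken from cols_of_interest in the answer above.
COLS_OF_INTEREST = ["ruleId", "description", "help", "impact", "selector", "summary", "source"]


def issue_table(issue, fields=COLS_OF_INTEREST):
    """Build a two-column <table> (field name, value) for one issue dict."""
    table = ET.Element("table", {"class": "issue"})
    for field in fields:
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "th").text = field
        value = issue.get(field, "")
        # Lists such as "selector" are joined into a single cell.
        ET.SubElement(row, "td").text = ", ".join(map(str, value)) if isinstance(value, list) else str(value)
    return table


def issues_to_html(issues):
    """Return an <html> string with one table per issue, separated by <br> elements."""
    root = ET.Element("html")
    for issue in issues:
        root.append(issue_table(issue))
        ET.SubElement(root, "br")
    # ElementTree escapes <, > and & in text, so HTML snippets in "source" stay literal.
    return ET.tostring(root, encoding="unicode")
```

With the answer's `data`, `pdfkit.from_string(issues_to_html(data['allIssues']), './output.pdf', css='style.css')` would render the result with the same stylesheet.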

Another answer:

Disclaimer: I am the author of borb, the library used in this answer.

Assuming your data looks like this:

data = [
      {
         "ruleId":"name",
         "description":"Description123",
         "help":"Description234",
         "impact":"critical",
         "selector":[
            "abc1234"
         ],
         "summary":"long text",
         "source":"long text2",
      },
]

You can run the following code:

from decimal import Decimal

from borb.pdf import Document, Page, PageLayout, SingleColumnLayout, Paragraph, HexColor, Table, TableUtil, PDF

# create empty document
doc: Document = Document()

# create empty page
page: Page = Page()
doc.add_page(page)

# use a PageLayout to be able to add things easily
layout: PageLayout = SingleColumnLayout(page)

# generate a Table for each issue
for i, issue in enumerate(data):

    # add a header (Paragraph)
    layout.add(Paragraph("Issue %d" % i, font_size=Decimal(20), font_color=HexColor("#B5F8FE")))

    # add a Table (using the convenient TableUtil class)
    table: Table = TableUtil.from_2d_array([["Rule ID", issue.get("ruleId", "N.A.")],
                                            ["Description", issue.get("description", "N.A.")],
                                            ["Help", issue.get("help", "N.A.")],
                                            ["Impact", issue.get("impact", "N.A.")],
                                            ["Selector", str(issue.get("selector", []))],
                                            ["Summary", issue.get("summary", "N.A.")],
                                            ["Source", issue.get("source", "N.A.")],
                                            ], header_row=False, header_col=True, flexible_column_width=False)
    layout.add(table)

# store the PDF (PDF, imported above, serializes the Document)
with open("output.pdf", "wb") as fh:
    PDF.dumps(fh, doc)

This generates the following PDF:

ConvertAPI
convertapi.com › template-to-pdf › python
Dynamic PDF Python SDK - Generate PDFs using Word templates and JSON
Dynamic PDF Python library is a tool that allows you to dynamically generate PDF documents based on a MS Word (DOCX) template by injecting custom properties using a JSON object that contains your data.
PyPI
pypi.org › project › pydf2json
pydf2json · PyPI
PyDF2JSON simply creates a json structure out of PDF documents. It breaks a PDF document down into all its individual parts, and retains those parts for analysis. Once this is done, a more detailed analysis should be possible.
$ pip install pydf2json

Published: Sep 15, 2022
Version: 2.4.0
Top answer

My guess is that you're expecting to see more structure in the JSON you are getting, like a pair of curly braces or square brackets. But curly braces represent a dictionary (key/value pairs), and square brackets represent an array or list. What you are encoding as JSON is neither of those things.

page.extractText returns text from the PDF being read as a single Python string value. The JSON encoding of a Python string value is the text of that string within a pair of double quotes. So the JSON you're getting will be of the form:

"<text from pdf document>"

It doesn't matter what's in the PDF. Whatever text you get back from page.extractText will always be a single Python string. What you get when you encode that string as JSON will always be that same text, with double quotes before and after it.

Here's a little code to illustrate this:

import json
s1 = "This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes"
print(s1)
print(json.dumps(s1))

Result:

This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes
"This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes"
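To get the braces or brackets the asker expected, place the extracted text inside a dict or list before encoding it. Here is a short illustration with the standard `json` module. `page_text` stands in for whatever `page.extractText()` returns:

```python
import json

page_text = "Text extracted from page 1"

print(json.dumps(page_text))                        # a bare JSON string
print(json.dumps({"page": 1, "text": page_text}))   # a dict encodes as a JSON object {...}
print(json.dumps([page_text]))                      # a list encodes as a JSON array [...]
```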
Another answer:

Simply converting a string with json.dumps() will not yield your desired result, since the string first needs to be split into key-value pairs.
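If the extracted text already contains labelled lines, splitting it can be as simple as breaking each line at its first colon. This minimal sketch assumes a hypothetical "Key: value" layout in the extracted text, and `text_to_pairs` is a hypothetical helper. Real PDFs usually need layout-specific rules:

```python
import json


def text_to_pairs(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the FIRST colon only
        if sep:
            record[key.strip()] = value.strip()
    return record


page_text = "Name: Jane Doe\nInvoice: 1042\nnot a field"  # hypothetical extracted text
print(json.dumps(text_to_pairs(page_text)))
```

Splitting at the first colon keeps values such as "Time: 10:30" intact.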

If you need to extract a lot of data from an unstructured PDF, consider Adobe's PDF Extract API, which has a Python SDK. The API converts the structural and text information in a PDF directly into JSON, so you don't have to do it manually.

The JSON data will contain an array of elements with information such as the following:

{
    "Page": 1,
    "Path": "//Document/P",
    "Text": "The quick brown fox jumps over the lazy dog "
}
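Once that JSON has been loaded with `json.load()`, you can process the elements like any list of dicts. This sketch regroups the text by page. It assumes each element looks like the one above (`Page`, `Path`, and an optional `Text`). The answer does not show how the element list is nested in the full output file, so the function takes the list directly:

```python
from collections import defaultdict


def text_by_page(elements):
    """Group the 'Text' of each element by its 'Page' value, keeping document order."""
    pages = defaultdict(list)
    for element in elements:
        if "Text" in element:  # elements without a 'Text' key are skipped
            pages[element.get("Page")].append(element["Text"].strip())
    return dict(pages)


elements = [
    {"Page": 1, "Path": "//Document/P", "Text": "The quick brown fox jumps over the lazy dog "},
]
print(text_by_page(elements))
```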
YouTube
youtube.com › watch
Python - How to Read OR Convert PDF Files into JSON files - YouTube
In this tutorial, you will learn "How to Read OR Convert PDF Files into JSON files" in Python. To read a PDF file page-wise into text and then add each page ...
Published: April 30, 2024
i2PDF
i2pdf.com › home › json to pdf
JSON to PDF Converter Online – Convert JSON File to PDF Free | i2PDF
By bridging the gap between structured data and readily consumable documentation, JSON to PDF conversion empowers businesses to unlock the full potential of their data and communicate effectively with a wider audience. As data continues to grow in volume and complexity, the importance of this conversion process will only continue to increase.