Brave Search

How to convert JSON data to PDF using python script

stackoverflow.com › questions › 48262492 › how-to-convert-json-data-to-pdf-using-python-script

Answer from ralf htp on Stack Overflow

PyPI

pypi.org › project › json2pdf-Converter

json2pdf-Converter

Flinks

help.flinks.com › support › solutions › articles › 43000729546-how-to-convert-the-a-json-payload-to-a-pdf-file-python-

How to convert the a .json payload to a .pdf file (python) :

June 6, 2024 -

import json
from fpdf import FPDF

def json_to_pdf(json_path, pdf_path):
    # Read JSON file
    with open(json_path, 'r') as f:
        data = json.load(f)
    # Convert JSON data to a pretty printed string
    json_str = json.dumps(data, indent=4)
    # Create a PDF ...
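
The snippet above is cut off before the PDF is written. A minimal sketch of one way such a function might continue, assuming the third-party fpdf package is installed. The helper names `json_to_wrapped_lines` and `write_lines_to_pdf`, the Courier font, and the 85-character width are illustrative choices, not taken from the Flinks article:

```python
import json
import textwrap


def json_to_wrapped_lines(data, width=85):
    """Pretty-print data as JSON and hard-wrap long lines to `width` columns."""
    lines = []
    for line in json.dumps(data, indent=4).splitlines():
        # textwrap.wrap returns [] for an empty string, so keep at least one entry
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return lines


def write_lines_to_pdf(lines, pdf_path):
    """Write the lines to a PDF with fpdf (third-party; assumed installed)."""
    from fpdf import FPDF  # imported lazily so the helper above runs without fpdf

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Courier", size=10)  # monospaced core font; Latin-1 text only
    for line in lines:
        pdf.cell(0, 5, line, ln=1)  # ln=1 moves to the start of the next line
    pdf.output(pdf_path)


if __name__ == "__main__":
    sample = {"account": "demo", "transactions": [{"amount": 12.5}]}
    for line in json_to_wrapped_lines(sample):
        print(line)
```

Wrapping the text before writing it keeps long JSON values from running off the right edge of the page; non-Latin-1 characters would need a Unicode font.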

Videos

20:48

YouTube

PDF to JSON: LLM-Powered Data Extraction In Python - YouTube

Easiest Way to Convert a PDF to JSON using LangChain Output Parsers ...

python code to convert pdf to json - YouTube

January 20, 2024

youtube.com

How to Convert PDF to JSON from a File in Python ... - YouTube

youtube.com

How to convert PDFs to JSON

Stack Overflow

stackoverflow.com › questions › 48262492 › how-to-convert-json-data-to-pdf-using-python-script

How to convert JSON data to PDF using python script - Stack Overflow

Top answer

1 of 3

The PDFWriter module is part of xtopdf.

PDFWriter - a core class of the xtopdf toolkit - can now be used with a Python context manager, a.k.a. the Python with statement.

( http://code.activestate.com/recipes/578790-use-pdfwriter-with-context-manager-support/ )

Installation instructions for xtopdf are at https://bitbucket.org/vasudevram/xtopdf :

Installation and usage:

To install the files, first make sure that you have downloaded and installed all the prerequisites mentioned above, including setup steps such as adding needed directories to your PYTHONPATH. Then, copy all the files in xtopdf.zip into a directory which is on your PYTHONPATH.

To use any of the Python programs, run the .py file as:

python filename.py

This will give a usage message about the correct usage and arguments expected.

To run the shell script(s), do the same as above.

Developers can look at the source code for further information.

An alternative is to use pdfdocument to create the PDF; it can be installed using pip ( https://pypi.python.org/pypi/pdfdocument ).

Parse the JSON data (see "How can I parse GeoJSON with Python" and "Parse JSON in Python") and print it as a PDF using pdfdocument ( https://pypi.python.org/pypi/pdfdocument ):

import json
data = json.loads(datastring)

from io import BytesIO
from pdfdocument.document import PDFDocument

def say_hello():
    f = BytesIO()
    pdf = PDFDocument(f)
    pdf.init_report()
    pdf.h1('Hello World')
    pdf.p('Creating PDFs made easy.')
    pdf.generate()
    return f.getvalue()
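
The example above prints fixed text rather than the parsed JSON. A minimal sketch of the missing step, using only the standard library: it flattens parsed JSON into indented text lines, each of which could then be passed to `pdf.p()` as in `say_hello()`. The helper name `json_to_lines` and the sample string are hypothetical.

```python
import json


def json_to_lines(value, indent=0):
    """Flatten parsed JSON into a list of indented 'key: value' text lines."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                # Nested container: emit the key, then recurse one level deeper
                lines.append(f"{pad}{key}:")
                lines.extend(json_to_lines(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.extend(json_to_lines(item, indent + 1))
            else:
                lines.append(f"{pad}- {item}")
    else:
        lines.append(f"{pad}{value}")
    return lines


datastring = '{"title": "Report", "items": [{"name": "a", "qty": 2}]}'
for line in json_to_lines(json.loads(datastring)):
    print(line)
```

Whether leading spaces survive in the rendered paragraph depends on the PDF library, so the indentation may need to be expressed differently there.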

2 of 3

from json2html import *
import json
import pdfkit  # third-party; needs the wkhtmltopdf binary on the PATH


class PdfConverter(object):

    def to_html(self, json_doc):
        # Convert a JSON string into an HTML table
        return json2html.convert(json=json_doc)

    def to_pdf(self, html_str):
        # With no output path, pdfkit is expected to return the PDF as bytes
        return pdfkit.from_string(html_str, None)


def main():
    stowflw = {
        "data": [
            {"state": "Manchester", "quantity": 20},
            {"state": "Surrey", "quantity": 46},
            {"state": "Scotland", "quantity": 36},
            {"state": "Kent", "quantity": 23},
            {"state": "Devon", "quantity": 43},
            {"state": "Glamorgan", "quantity": 43},
        ]
    }

    pdfc = PdfConverter()
    with open("sample.pdf", "wb") as pdf_fl:
        pdf_fl.write(pdfc.to_pdf(pdfc.to_html(json.dumps(stowflw))))


if __name__ == "__main__":
    main()

Install json2html and pdfkit (pdfkit requires wkhtmltox).
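
The `to_html` step turns the JSON into an HTML table that can then be rendered to PDF. As a rough, standard-library-only sketch of that idea (this is not json2html's actual implementation, and `records_to_html_table` is a hypothetical helper), a list of flat records can be turned into a table like this:

```python
import html
import json


def records_to_html_table(records):
    """Render a list of flat dicts as an HTML table, escaping all cell text."""
    # Use the keys of the first record as the column order
    columns = list(records[0].keys()) if records else []
    head = "".join("<th>" + html.escape(str(c)) + "</th>" for c in columns)
    rows = []
    for record in records:
        cells = "".join(
            "<td>" + html.escape(str(record.get(c, ""))) + "</td>" for c in columns
        )
        rows.append("<tr>" + cells + "</tr>")
    return '<table border="1"><tr>' + head + "</tr>" + "".join(rows) + "</table>"


doc = json.loads('{"data": [{"state": "Kent", "quantity": 23}]}')
print(records_to_html_table(doc["data"]))
```

Escaping the cell text matters because JSON values may contain `<` or `&`, which would otherwise be interpreted as markup by the HTML renderer.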

Aspose

products.aspose.com › aspose.cells › python via java › conversion › json to pdf

Python JSON to PDF - JSON to PDF Converter | products.aspose.com

November 13, 2025 - Add a library reference (import the library) to your Python project. Load JSON file with an instance of Workbook. Convert JSON to PDF by calling Workbook.save method.

SysTools Group

systoolsgroup.com › home › how to convert json to pdf in 2026? 3 expert ways

Top Methods to Convert JSON to PDF (Free, Python & Converter)

January 2, 2026 - Learn how to convert JSON to PDF for free with online tools, Python scripts, and Excel. Step-by-step JSON to PDF converter guide included!

reddit.com › r/learnpython › parse pdf to json using python

r/learnpython on Reddit: parse pdf to json using python

May 13, 2023

I have been searching for a while now for a library that can parse a PDF to JSON or XML format while keeping the document structure. Popular libraries like pypdf often do not preserve the document structure. I thought about using Tesseract for OCR and then transforming the result into JSON, but could not get it working. Is there a library that can parse a PDF to JSON while preserving the document structure, rather than just spitting out a block of text?

Top answer

1 of 4

No, because the pdf format does not save the document structure. The way pdf works is by saving the absolute position of things, not the relative position.
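
To illustrate why structure has to be inferred: a PDF text extractor typically yields text fragments with page coordinates, and lines or paragraphs must be rebuilt by grouping on position. A minimal sketch, assuming hypothetical `(x, y, text)` tuples (with y increasing upward, as in default PDF user space) rather than the output of any particular library:

```python
def group_into_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into text lines by similar y coordinate."""
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    # Visit fragments top-to-bottom (largest y first)
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order fragments left-to-right by x
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


fragments = [(100, 700, "World"), (50, 700.5, "Hello"), (50, 680, "Second")]
print(group_into_lines(fragments))
```

Anything beyond lines (headings, tables, reading order across columns) needs further heuristics of the same kind, which is why extracted structure is often unreliable.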

2 of 4

With this tool you can also make the conversions to JSON: https://monkt.com/ API or UI should be fine for your use case. You can define a schema and include in the final JSON whatever you want from the PDF.

GitHub

gist.github.com › aspose-com-gists › 259a761108688d6db5481256c8606c44

Convert JSON Files to PDF in Python · GitHub

Convert JSON Files to PDF in Python. GitHub Gist: instantly share code, notes, and snippets.

Reportlab

docs.reportlab.com › json2pdf

json2pdf - ReportLab Docs

Authorised json2pdf users can use our pypi server to download the json2pdf package and install it via pip.

$ hg clone url
$ python -mvirtualenv -p /path/to/desired/python .

ActiveState

code.activestate.com › recipes › 578979-convert-json-to-pdf-with-python-and-xtopdf

Convert JSON to PDF with Python and xtopdf « Python recipes « ActiveState Code

December 10, 2014 - This recipe shows the basic steps needed to convert JSON input to PDF output, using Python and xtopdf, a PDF creation toolkit.

Aspose

products.aspose.cloud › aspose.cells › python › conversion › json to pdf conversion

Convert JSON to PDF using Python - Aspose Cloud

February 5, 2023 - This Cloud SDK empowers Python developers with powerful functionality and ensures high-quality PDF output.

# For complete examples and data files, please go to https://github.com/aspose-cells-cloud/aspose-cells-cloud-python/
import os
import shutil
from asposecellscloud.apis.cells_api import CellsApi

cells_api = CellsApi(os.getenv('ProductClientId'),os.getenv('ProductClientSecret'))
file1 = cells_api.cells_workbook_put_convert_workbook("Book1.json",format="pdf")
shutil.move(file1, "destFile.pdf")

PyPI

pypi.org › project › json2pdf

json2pdf

Stack Overflow

stackoverflow.com › questions › 75660127 › transform-a-json-file-into-a-pdf

python - Transform a json file into a pdf - Stack Overflow

Top answer

1 of 2

There seems to be a lot of steps in your code. You could simply loop over the columns of your transposed df and export each of them to html. Append all html tables to a root html element and export with pdfkit:

import json
import pandas as pd
import lxml.etree as et
import pdfkit

your_json = """{"url": "https://www.abc123.com", "extensionVersion": "4.51.0", "axeVersion": "4.6.3", "standard": "WCAG 2.1 AA", "testingStartDate": "2023-04-03T09:35:06.177Z", "testingEndDate": "2023-04-03T09:35:06.177Z", "bestPracticesEnabled": false, "issueSummary": {"critical": 2, "moderate": 0, "minor": 0, "serious": 0, "bestPractices": 0, "needsReview": 0}, "remainingTestingSummary": {"run": false}, "igtSummary": [], "failedRules": [{"name": "button-name", "count": 1, "mode": "automated"}, {"name": "select-name", "count": 1, "mode": "automated"}], "needsReview": [], "allIssues": [{"ruleId": "button-name", "description": "Ensures buttons have discernible text", "help": "Buttons must have discernible text", "helpUrl": "https://www.abc123.com", "impact": "critical", "needsReview": false, "isManual": false, "selector": [".livechat-button"], "summary": "Fix any of the following:\\n  Element does not have inner text that is visible to screen readers\\n  aria-label attribute does not exist or is empty\\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\\n  Element has no title attribute\\n  Element's default semantics were not overridden with role=\\"none\\" or role=\\"presentation\\"", "source": "<button class=\\"livechat-button items-center bg-black shadow-liveChat rounded-full text-white p-2 h-12 transition-all opacity-0 pointer-events-none w-sp-48 opacity-0 pointer-events-none\\">", "tags": ["cat.name-role-value", "wcag2a", "wcag412", "section508", "section508.22.a", "ACT"], "igt": "", "shareURL": "", "createdAt": "2023-04-03T09:35:06.177Z", "testUrl": "", "testPageTitle": "ABC123", "foundBy": "[email protected]", "axeVersion": "4.6.3"}, {"ruleId": "select-name", "description": "Ensures select element has an accessible name", "help": "Select element must have an accessible name", "helpUrl": "https://www.abc123.com", "impact": "critical", "needsReview": false, "isManual": false, "selector": 
["#plp__sortSelected"], "summary": "Fix any of the following:\\n  Form element does not have an implicit (wrapped) <label>\\n  Form element does not have an explicit <label>\\n  aria-label attribute does not exist or is empty\\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\\n  Element has no title attribute\\n  Element's default semantics were not overridden with role=\\"none\\" or role=\\"presentation\\"", "source": "<select class=\\"w-full absolute opacity-0 appearance-none text-value-small font-bold text-black uppercase cursor-pointer bg-transparent outline-0\\" id=\\"plp__sortSelected\\">", "tags": ["cat.forms", "wcag2a", "wcag412", "section508", "section508.22.n", "ACT"], "igt": "", "shareURL": "", "createdAt": "2023-04-03T09:35:06.177Z", "testUrl": "https://www.abc123.com", "testPageTitle": "ABC123", "foundBy": "[email protected]", "axeVersion": "4.6.3"}]}"""
data = json.loads(your_json)

## replace the above lines with the following in your case
# with open('your_file.json', 'r') as f:   
#     data = json.load(f)

html = et.Element("html")

# general info
html.append(et.fromstring(f"""<h3>Site link: <a href="{data['url']}">{data['url']}</a></h3>"""))
html.append(et.fromstring(f"""<h4>Date: {data['testingEndDate']}</h4>"""))
html.append(et.fromstring(f"""<h4>Summary:</h4>"""))

# summary table
summary = pd.Series(data['issueSummary'])
summary_table = et.fromstring(summary.to_frame().to_html(header=False))
summary_table.set('class', 'summary')
html.append(summary_table)

# issue tables
cols_of_interest = ['ruleId', 'description', 'help', 'impact', 'selector', 'summary', 'source']
df = pd.DataFrame(data['allIssues'])[cols_of_interest].T
for col in df.columns:
    table = et.fromstring(df[[col]].to_html(header=False))
    table.set('class', 'issue')
    html.append(table)
    html.append(et.fromstring('<br/>'))

pdfkit.from_string(et.tostring(html, encoding="unicode"), "./output.pdf", css='style.css')

With the following css file:

/* style.css */
* {
    font-family: 'Liberation Sans';
}

table {
    margin: 20px;
    margin-left: auto;
    margin-right: auto;
}

table.summary {
    width: 50%;
}

table.issue {
    border: 0;
    width: 100%;
    border-collapse: collapse;
}

table.issue td,
table.issue th {
    border: 0;
    text-align: left;
    padding: 5px;
}

table.issue tr {
    border-bottom: 1px solid #dddddd;
}

You'll get:

Edit: updated json with the data you provided + exporting additional data + improved css

Note: you will need to install wkhtmltopdf and make sure that it is in your path.

Edit2: limiting output to desired fields

2 of 2

disclaimer: I am the author of borb, the library used in this answer.

Assuming your data looks like this:

data = [
      {
         "ruleId":"name",
         "description":"Description123",
         "help":"Description234",
         "impact":"critical",
         "selector":[
            "abc1234"
         ],
         "summary":"long text",
         "source":"long text2",
      },
]

You can run the following code:

from borb.pdf import Document, Page, PageLayout, SingleColumnLayout, Paragraph, HexColor, Table, TableUtil, PDF
from decimal import Decimal

# create empty document
doc: Document = Document()

# create empty page
page: Page = Page()
doc.add_page(page)

# use a PageLayout to be able to add things easily
layout: PageLayout = SingleColumnLayout(page)

# generate a Table for each issue
for i, issue in enumerate(data):

  # add a header (Paragraph)
  layout.add(Paragraph("Issue %d" % i, font_size=Decimal(20), font_color=HexColor("#B5F8FE")))

  # add a Table (using the convenient TableUtil class)
  table: Table = TableUtil.from_2d_array([["Rule ID", issue.get("ruleId", "N.A.")],
                                          ["Description", issue.get("description", "N.A.")],
                                          ["Help", issue.get("help", "N.A.")],
                                          ["Impact", issue.get("impact", "N.A.")],
                                          ["Selector", str(issue.get("selector", []))],
                                          ["Summary", issue.get("summary", "N.A.")],
                                          ["Source", issue.get("source", "N.A.")],
                                          ], header_row=False, header_col=True, flexible_column_width=False)
  layout.add(table)

# store the PDF
with open("output.pdf", "wb") as fh:
  PDF.dumps(fh, doc)

This generates the following PDF:

ConvertAPI

convertapi.com › template-to-pdf › python

Dynamic PDF Python SDK - Generate PDFs using Word templates and JSON

Dynamic PDF Python library is a tool that allows you to dynamically generate PDF documents based on a MS Word (DOCX) template by injecting custom properties using a JSON object that contains your data.
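
The underlying idea, injecting JSON properties into a template before rendering, can be sketched with the standard library. This only illustrates the concept using `string.Template` on plain text; it is not the ConvertAPI SDK, and the template text and property names are made up:

```python
import json
from string import Template

# A plain-text stand-in for a document template with named placeholders
template = Template("Invoice for $customer\nTotal due: $total $currency")
properties = json.loads('{"customer": "ACME Ltd", "total": 120.5, "currency": "EUR"}')

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
filled = template.safe_substitute(properties)
print(filled)
```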

PyPI

pypi.org › project › pydf2json

pydf2json · PyPI

PyDF2JSON simply creates a json structure out of PDF documents. It breaks a PDF document down into all its individual parts, and retains those parts for analysis. Once this is done, a more detailed analysis should be possible.

pip install pydf2json

Published Sep 15, 2022

Version 2.4.0

Homepage https://github.com/kingaling/pydf2json

Stack Overflow

stackoverflow.com › questions › 65546921 › convert-pdf-data-to-json-format-using-python

Convert pdf data to JSON format using Python? - Stack Overflow

Top answer

1 of 2

My guess is that you're expecting to see more structure in the JSON you are getting, like a pair of curly braces or square brackets. But curly braces represent a dictionary (key/value pairs), and square brackets represent an array or list. What you are encoding as JSON is neither of those things.

page.extractText returns text from the PDF being read as a single Python string value. The JSON encoding of a Python string value is the text of that string within a pair of double quotes. So the JSON you're getting will be of the form:

"<text from pdf document>"

It doesn't matter what's in the PDF. Whatever text you get back from page.extractText will always be a single Python string. What you get when you encode that string as JSON will always be that same text, with double quotes before and after it.

Here's a little code to illustrate this:

import json
s1 = "This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes"
print(s1)
print(json.dumps(s1))

Result:

This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes
"This is a Python string.  A Python string encoded as JSON is the text of that string surrounded by double quotes"

2 of 2

Simply converting a string with json.dumps() will not yield your desired result, since the string first needs to be split into key-value pairs.
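
As a minimal sketch of that splitting step, assuming (hypothetically) that the extracted text consists of `Key: Value` lines; real PDF text is rarely this regular, so the parsing rule would have to be adapted to the document:

```python
import json


def text_to_json(text):
    """Turn 'Key: Value' lines into a JSON object string; other lines are skipped."""
    record = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return json.dumps(record, indent=2)


extracted = "Name: Jane Doe\nInvoice: 1042\nthis line has no separator"
print(text_to_json(extracted))
```

Because the result is built from a dictionary rather than a single string, the JSON output now has curly braces and key-value pairs.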

If you need to extract a lot of data from an unstructured PDF, you may want to consider using Adobe's extract PDF Python SDK. The API converts all the structural and text information from a PDF directly into JSON, so you don't have to do it manually.

The JSON data will contain an array of elements with information such as the following:

{
"Page": 1,
"Path": "//Document/P",
"Text": "The quick brown fox jumps over the lazy dog "
}

GitHub

gist.github.com › aspose-com-kb › 437f13752f9679dbf7094d3a5dad496e

How to Convert JSON to PDF in Python. For more details: https://kb.aspose.com/cells/python/how-to-convert-json-to-pdf-in-python/ · GitHub

How to Convert JSON to PDF in Python.

YouTube

youtube.com › watch

Python - How to Read OR Convert PDF Files into JSON files - YouTube

07:41

In this tutorial, you will learn "How to Read OR Convert PDF Files into JSON files" in Python. To read a PDF file page-wise into text and then add each page ...

Published April 30, 2024

i2PDF

i2pdf.com › home › json to pdf

JSON to PDF Converter Online – Convert JSON File to PDF Free | i2PDF

By bridging the gap between structured data and readily consumable documentation, JSON to PDF conversion empowers businesses to unlock the full potential of their data and communicate effectively with a wider audience. As data continues to grow in volume and complexity, the importance of this conversion process will only continue to increase.

Serverless Forums

forum.serverless.com › serverless framework

Create JSON to PDF in Lambda using Python - Serverless Framework - Serverless Forums

December 27, 2018 - Hi all, I need to create a PDF file from JSON on an HTTP request using Python in AWS Lambda and then store the PDF in an S3 bucket. Any help on how to proceed with saving the runtime PDF in the S3 bucket would be appreciated. I am trying to use http://code.activestate.com/recipes/578979-convert-json-to-pdf-with-python-...
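
One way to handle the storage step is `put_object` on a boto3 S3 client. A minimal sketch, assuming the PDF has already been rendered to bytes in memory. The function name, bucket, and key are placeholders, and the client is passed in as an argument so the function can be exercised without AWS credentials:

```python
def store_pdf(s3_client, bucket, key, pdf_bytes):
    """Upload in-memory PDF bytes to S3 and return the object's s3:// URI."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=pdf_bytes,
        ContentType="application/pdf",
    )
    return f"s3://{bucket}/{key}"


# Inside a Lambda handler this might be called as follows (placeholder names):
#   import boto3
#   store_pdf(boto3.client("s3"), "my-bucket", "reports/out.pdf", pdf_bytes)
```

Keeping the PDF in memory (for example in a `BytesIO` buffer) avoids writing to the Lambda filesystem, where only `/tmp` is writable.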

Nanonets

tools.nanonets.com › pdf-to-json

Convert PDF documents into JSON instantly
