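After some trial and error, I found a way.

For each page of the file, I had to pass tabula's read_pdf function both the area of the table and the column boundaries. In tabula-py, area is given as (top, left, bottom, right) and columns as the x coordinates of the column boundaries, both in points by default.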
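A small sanity check can catch typos in those coordinates before tabula is called. The sketch below assumes tabula-py's documented conventions (area as top, left, bottom, right; columns as x coordinates). `check_layout` is a hypothetical helper, not part of tabula, and its rules are heuristics rather than requirements of the library.

```python
def check_layout(area, columns):
    """Return a list of problems found in a tabula-style area/columns pair."""
    top, left, bottom, right = area  # tabula-py order: top, left, bottom, right
    problems = []
    if not (top < bottom and left < right):
        problems.append("area must satisfy top < bottom and left < right")
    if list(columns) != sorted(columns):
        problems.append("column boundaries must be in ascending order")
    # Heuristic: a boundary outside the area cannot split anything inside it
    outside = [x for x in columns if not left <= x <= right]
    if outside:
        problems.append(f"column boundaries outside the area: {outside}")
    return problems


# Values taken from the answer's read_pdf call
problems = check_layout(
    area=(96, 24, 558, 750),
    columns=(24, 127, 220, 274, 298, 325, 343, 364, 459, 545, 591, 748),
)
print(problems or "layout looks consistent")
```

An empty list means the values are at least internally consistent; it does not prove they match the table on the page.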

Here is the working code:

import pypdf
from tabula import read_pdf

# Path to the PDF (placeholder name; the original answer does not show it)
pdf_file = "file.pdf"

# Get the number of pages in the file
pdf_reader = pypdf.PdfReader(pdf_file)
n_pages = len(pdf_reader.pages)

# Read the table on one page; set `pages` to any value from 1 to n_pages.
# Note: tabula-py 2.x returns a list of DataFrames, not a single DataFrame.
table_pdf = read_pdf(
    pdf_file,
    guess=False,
    pages=1,
    stream=True,
    encoding="utf-8",
    area=(96, 24, 558, 750),
    columns=(24, 127, 220, 274, 298, 325, 343, 364, 459, 545, 591, 748),
)
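The call above reads page 1 only. A minimal sketch of repeating it for every page follows, assuming the same area and columns apply to all pages (the answer does not say whether they do). `read_all_pages` is a hypothetical helper, and its `read_pdf` parameter exists only so the loop can be tried without tabula and Java installed.

```python
def read_all_pages(pdf_file, n_pages, area, columns, read_pdf=None):
    """Read the table on each page with one shared area/columns layout."""
    if read_pdf is None:
        # Imported lazily; needs tabula-py and a Java runtime
        from tabula import read_pdf

    tables = []
    for page in range(1, n_pages + 1):  # tabula page numbers start at 1
        result = read_pdf(
            pdf_file,
            guess=False,
            pages=page,
            stream=True,
            encoding="utf-8",
            area=area,
            columns=columns,
        )
        # tabula-py 2.x returns a list of DataFrames; accept a bare one too
        tables.extend(result if isinstance(result, list) else [result])
    return tables
```

With tabula installed, `read_all_pages(pdf_file, n_pages, area, columns)` returns one entry per table found, which can then be combined with `pandas.concat` if the pages share the same columns.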
Answer from fmarques on Stack Overflow