If you have pip, PyPDF2 is on the Python Package Index, so you can install it with the following in your terminal/command prompt:
Python 2:
pip install PyPDF2
Python 3:
pip3 install PyPDF2
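To confirm that the install landed in the interpreter you actually run, a quick check like the following may help. This is only a sketch: `is_installed` is a hypothetical helper name, not part of pip or PyPDF2, and it only looks up top-level module names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can find a top-level module."""
    # find_spec returns None when no module with that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "PyPDF2" is the import name used in the answers in this thread.
    print("PyPDF2 installed:", is_installed("PyPDF2"))
```

If this prints False right after a successful pip run, pip probably installed into a different Python than the one running the script.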
To install from a setup.py file under Windows, you can use the command line:
- hit the Windows key
- type cmd
- run the command prompt (black window)
- type
cd C:\Users\User\Downloads\pyPDF2
to go into the directory where setup.py is (this is mine if I downloaded it). The path can be copied from the Explorer window.
- type
dir
Now you should see the name setup.py in the listing of all contents.
- type
C:\python27\python.exe setup.py install
I use Python 2.7 here. Use C:\python33\python.exe setup.py install for Python 3.3, and so on.
You can follow these instructions now if you wish: http://docs.python.org/2/install/index.html
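The steps above hard-code the interpreter path. As a rough sketch of the same idea, assuming Python 3.5 or later, the running interpreter can be picked up from `sys.executable` instead. `build_install_command` and `run_install` are hypothetical helper names used only for illustration.

```python
import subprocess
import sys


def build_install_command():
    """Build the 'python setup.py install' command for the running interpreter."""
    # sys.executable is the full path of the Python running this script,
    # so the package is installed for that same interpreter.
    return [sys.executable, "setup.py", "install"]


def run_install(setup_dir):
    """Run the install command inside setup_dir (hypothetical helper)."""
    # check=True raises CalledProcessError if the install fails,
    # so problems are not silently hidden.
    subprocess.run(build_install_command(), cwd=setup_dir, check=True)
```

For the directory used above, this would be called as `run_install(r"C:\Users\User\Downloads\pyPDF2")`.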
Another way, which does not show you when there are problems, is:
- create a shortcut to setup.py
- open the properties of the shortcut. There should be a path like this:
C:\Users\User\Downloads\pyPDF2\setup.py
(this is where my setup.py is). Modify that path in the following way:
"C:\Users\User\Downloads\pyPDF2\setup.py" install
The " are important if you have white spaces in the path name.
- click OK to save the modifications to the setup.py shortcut
- double-click the setup.py shortcut.
In all cases you may need to restart Python to be able to import the module.
When you do this, feel free to post your solution, also with pictures, for other newbies looking for it.
You need to extract text from the PDF pages using extract_text:
import PyPDF2

# Open in binary mode and keep the file open while pages are read
with open('dummy.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    for page in reader.pages:
        print(page.extract_text())
Check the PyPDF2 documentation for more details.
The following code will give you the text instead of an object:
import PyPDF2

with open('dummy.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    # print(reader) would only show the reader object, not the text
    for page in reader.pages:
        text = page.extract_text()
        print(text)
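The loops above print each page as it is read. If you would rather have the whole document as one string, a sketch along these lines might work. `collect_text` and `read_pdf_text` are hypothetical helper names, and the code assumes a PyPDF2 version that provides `PdfReader` and `extract_text`, as used above.

```python
def collect_text(pages):
    """Join the text of every page, skipping pages that yield no text."""
    parts = []
    for page in pages:
        text = page.extract_text()
        if text:  # skip pages with nothing extractable
            parts.append(text)
    return "\n".join(parts)


def read_pdf_text(path):
    """Open a PDF and return its text as a single string."""
    import PyPDF2  # imported here so collect_text works without PyPDF2

    # Extract inside the with block, while the file is still open.
    with open(path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        return collect_text(reader.pages)
```

With the file from the examples above, `read_pdf_text('dummy.pdf')` would return the text of all pages separated by newlines.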