Answer from yoavram on Stack Overflow: As far as I can tell, the proper way to do this in Python 2 is:
import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()
Of course, you'd want to check that the GET was successful with r.ok.
For Python 3+, substitute the io module for the StringIO module and use io.BytesIO in place of StringIO.StringIO; the Python 3 release notes mention this change.
import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
Most people recommend using requests if it is available, and the requests documentation recommends this for downloading and saving raw data from a URL:
import requests

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
Since the question asks about downloading and saving the zip file, I haven't gone into details regarding reading the zip file. See one of the many answers below for possibilities.
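A quick usage sketch (the URL and file names are placeholders), extracting afterwards with the standard zipfile module:
import zipfile

# Placeholder URL and local file name.
download_url("https://example.com/archive.zip", "archive.zip")

# Once saved, the archive can be read like any local zip file.
with zipfile.ZipFile("archive.zip") as z:
    z.extractall("destination_directory")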
If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.
import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
Finally, if you are still using Python 2, you can use urllib2.urlopen.
import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
I made RemoteZipFile to download individual files from INSIDE a .zip
Link to unzip-http on Github
Hey everyone, this was originally part of my new readysetdata library, but it turned out to be so useful that I cleaned it up and turned it into its own library.
Sometimes data is published in giant GB or even TB .zip archives, and you may only need a couple of files; sometimes you just want to know what files are inside the archive! But the .zip central directory is at the end of the file, so you have to download the whole thing for any zip utility to work.
RemoteZipFile is a ZipFile-like object that can extract individual files using HTTP Range Requests. Given a URL it will generate ZipInfo objects for the files inside (now including the date/time), and allow you to open() a file and do whatever you want with it. Streaming (and read-only) of course.
I've also incorporated it into VisiData so if you use that, you can look forward in the next version (should be released in the next week or two) to just browsing online .zip files like it's nobody's business.
Both the library and command-line application can be installed from PyPI via pip install unzip-http. Share and enjoy!
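Based on the description above (a ZipFile-like object built from a URL), usage presumably looks something like the sketch below. The URL and member name are placeholders, and the method names are assumptions drawn from the standard zipfile interface, so check the unzip-http README for the real API:
import unzip_http  # pip install unzip-http

# Placeholder URL; only small byte ranges of the archive are fetched, not the whole file.
rzf = unzip_http.RemoteZipFile("https://example.com/huge-archive.zip")

# List members without downloading the archive (assumed ZipFile-like interface).
for info in rzf.infolist():
    print(info.filename, info.file_size)

# Stream a single member out of the remote archive (member name is a placeholder).
with rzf.open("some/file/inside.csv") as fp:
    data = fp.read()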
I'll be the first to admit I'm not a programmer and am more of a hack-it-together kind of guy. But I thought this was a bit of an accomplishment on my part. I created this Python script to scrape through a website, download all the .zip files on it, and save them to a new directory.
Small challenges that I needed to overcome included: the paths to the zip files were relative, so I needed to concatenate them with the base URL for the links to work; I needed to add error handling, as one of the links was known to produce a 404; and I needed to give each downloaded zip file a unique name, so I had to parse the file name out of the URL.
#Author: Bellerophon_
#Date Created: 04/12/15
#Usage: Used to scrape a website for links that end in .zip and list them
#Requirements: BeautifulSoup lib
#Notes:

import os
import urllib2
from urllib2 import URLError
from bs4 import BeautifulSoup

#Create a new directory to put the files into
#Get the current working directory and create a new directory in it named test
cwd = os.getcwd()
newdir = cwd + "\\test"
print "The current working directory is " + cwd
os.mkdir(newdir, 0777)
print "Created new directory " + newdir

newfile = open('zipfiles.txt', 'w')
print newfile
print "Running script.. "

#Set variable for page to be opened and url to be concatenated
url = "http://www.elections.bc.ca"
page = urllib2.urlopen('http://www.elections.bc.ca/index.php/maps/electoral-maps/geographic-information-system-spatial-data-files-2012/').read()

#File extension to be looked for.
extension = ".zip"

#Use BeautifulSoup to parse the page
soup = BeautifulSoup(page)
soup.prettify()

#Find all the links on the page that end in .zip and save them to a file
for anchor in soup.findAll('a', href=True):
    links = url + anchor['href']
    if links.endswith(extension):
        newfile.write(links + '\n')
newfile.close()

#Read what is saved in zipfiles.txt and output it to the user
#This is done to create persistent data
newfile = open('zipfiles.txt', 'r')
for line in newfile:
    print line
newfile.close()

#Read through the lines in the text file and download the zip files.
#Handle exceptions and print exceptions to the console
with open('zipfiles.txt', 'r') as linksfile:
    for line in linksfile:
        if line.strip():
            try:
                ziplink = line.strip()
                #Removes the first 48 characters of the url to get the name of the file
                zipfile = ziplink[48:]
                #Removes the last 4 characters to remove the .zip
                zipfile2 = zipfile[:-4]
                print "Trying to reach " + ziplink
                response = urllib2.urlopen(ziplink)
            except URLError as e:
                if hasattr(e, 'reason'):
                    print 'We failed to reach a server.'
                    print 'Reason: ', e.reason
                    continue
                elif hasattr(e, 'code'):
                    print 'The server couldn\'t fulfill the request.'
                    print 'Error code: ', e.code
                    continue
            else:
                zipcontent = response.read()
                completeName = os.path.join(newdir, zipfile2 + ".zip")
                #Write the zip content in binary mode
                with open(completeName, 'wb') as f:
                    print "downloading.. " + zipfile
                    f.write(zipcontent)

print "Script completed"
Below is a code snippet I used to fetch a zipped CSV file; please have a look:
Python 2:
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(StringIO(resp.read()))
for line in myzip.open(file).readlines():
    print line
Python 3:
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content
resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(BytesIO(resp.read()))
for line in myzip.open(file).readlines():
    print(line.decode('utf-8'))
Here file is a string: the name of a member inside the archive. To get the actual string that you want to pass, you can use myzip.namelist(). For instance,
resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
myzip = ZipFile(BytesIO(resp.read()))
myzip.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
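Since the snippet is about zipped CSV data, one way to read a member as CSV rather than raw lines is to wrap the binary stream in a text wrapper; the member name below is a placeholder you would pick from namelist():
import csv
import io

# Member name is a placeholder; choose one from myzip.namelist().
with myzip.open("data.csv") as raw:
    reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
    for row in reader:
        print(row)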
My suggestion would be to use a StringIO object. It emulates a file, but resides in memory. So you could do something like this:
# get_zip_data() returns a zip archive containing 'foo.txt', whose contents read 'hey, foo'
import zipfile
from StringIO import StringIO
zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()
# output: "hey, foo"
Or more simply (apologies to Vishal):
myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
for name in myzipfile.namelist():
    [ ... ]
In Python 3 use BytesIO instead of StringIO:
import zipfile
from io import BytesIO
filebytes = BytesIO(get_zip_data())
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    [ ... ]
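get_zip_data() is this answer's placeholder; if you want to run the snippet end to end, one possible stand-in simply downloads the raw bytes (the URL below is hypothetical):
import requests

def get_zip_data():
    # Hypothetical URL; returns the raw bytes of the remote zip archive.
    resp = requests.get("https://example.com/archive.zip")
    resp.raise_for_status()
    return resp.content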




