You've stated that you need to support "tar, bz2, zip or tar.gz". Python's tarfile module automatically handles gzip- and bzip2-compressed tar files, so there are really only two archive types you need to support: tar and zip. (bz2 by itself is not an archive format; it is just compression.)
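To illustrate that point, here is a small sketch (Python 3, standard library only). It builds a gzip-compressed and a bzip2-compressed tar in memory and reopens both with plain `tarfile.open()`. The default mode `'r'` is equivalent to `'r:*'`, which detects the compression. A bare bz2 stream, by contrast, has no tar structure inside, so `tarfile` rejects it. The helper names `make_tar` and `member_names` are only for illustration.

```python
import bz2
import io
import tarfile


def make_tar(mode):
    """Build a one-member tar archive in memory using the given write mode."""
    buf = io.BytesIO()
    data = b"hello\n"
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo("x/hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(raw):
    """Open an archive without naming its compression and list its members."""
    # No mode argument: 'r' means 'r:*', i.e. detect the compression automatically.
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        return tar.getnames()


print(member_names(make_tar("w:gz")))   # gzip-compressed tar
print(member_names(make_tar("w:bz2")))  # bzip2-compressed tar

# bz2 on its own is only compression, so tarfile refuses it.
try:
    member_names(bz2.compress(b"just some text"))
    bare_bz2_rejected = False
except tarfile.ReadError as exc:
    bare_bz2_rejected = True
    print("not a tar archive:", exc)
```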
You can determine whether a given file is a tar file with tarfile.is_tarfile(). This also works on tar files compressed with gzip or bzip2. Within a tar file, you can determine whether a member is a directory using TarInfo.isdir() or a regular file using TarInfo.isfile().
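You can see is_tarfile() at work by writing a few sample files to a temporary directory and testing each one. This also shows that it inspects the file's contents rather than its extension. The sketch assumes Python 3; the file names and the `write_samples` helper are hypothetical.

```python
import bz2
import io
import os
import tarfile
import tempfile
import zipfile


def write_samples(folder):
    """Create a few sample files (hypothetical names) and return their paths."""
    paths = {}
    data = b"hello\n"
    # Plain, gzip-compressed and bzip2-compressed tar archives.
    for suffix, mode in [(".tar", "w"), (".tgz", "w:gz"), (".tar.bz2", "w:bz2")]:
        path = os.path.join(folder, "sample" + suffix)
        with tarfile.open(path, mode) as tar:
            info = tarfile.TarInfo("hello.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        paths[suffix] = path
    # A bare bzip2-compressed text file: compressed, but not an archive.
    paths[".bz2"] = os.path.join(folder, "notes.bz2")
    with open(paths[".bz2"], "wb") as fh:
        fh.write(bz2.compress(b"just some text"))
    # A zip archive, which tarfile should not recognise.
    paths[".zip"] = os.path.join(folder, "sample.zip")
    with zipfile.ZipFile(paths[".zip"], "w") as zf:
        zf.writestr("hello.txt", data)
    return paths


with tempfile.TemporaryDirectory() as folder:
    results = {suffix: tarfile.is_tarfile(path)
               for suffix, path in write_samples(folder).items()}

print(results)
```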
Similarly, you can determine whether a file is a zip file using zipfile.is_zipfile(). The zipfile module historically had no method to distinguish directories from regular files, but entries whose names end with / are directories. (Since Python 3.6, ZipInfo.is_dir() performs exactly this check.)
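The two ways of spotting a zip directory entry can be compared side by side. This sketch assumes Python 3.6 or later for ZipInfo.is_dir(). The archive and its entry names are made up for illustration.

```python
import io
import zipfile

# Build a small zip in memory; "docs/" (note the trailing slash) is a directory entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/", b"")
    zf.writestr("docs/readme.txt", b"hello\n")

with zipfile.ZipFile(buf) as zf:
    rows = [
        # (name, trailing-slash convention, ZipInfo.is_dir())
        (info.filename, info.filename.endswith("/"), info.is_dir())
        for info in zf.infolist()
    ]

for row in rows:
    print(row)
```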
So, given a file name, you can do this:
import tarfile
import zipfile

filename = 'test.tgz'

if tarfile.is_tarfile(filename):
    # tarfile detects gzip/bzip2 compression on its own
    with tarfile.open(filename) as f:
        for info in f:
            if info.isdir():
                file_type = 'directory'
            elif info.isfile():
                file_type = 'file'
            else:
                file_type = 'unknown'  # e.g. symlinks, hard links, devices
            print('{} is a {}'.format(info.name, file_type))
elif zipfile.is_zipfile(filename):
    with zipfile.ZipFile(filename) as f:
        for name in f.namelist():
            # zip directory entries are stored with a trailing '/'
            print('{} is a {}'.format(name, 'directory' if name.endswith('/') else 'file'))
else:
    print('{} is not an accepted archive file'.format(filename))
When run on a tar file with this structure:
(py2)[mhawke@localhost tmp]$ tar tvfz /tmp/test.tgz
drwxrwxr-x mhawke/mhawke 0 2016-02-29 12:38 x/
lrwxrwxrwx mhawke/mhawke 0 2016-02-29 12:38 x/4 -> 3
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:14 x/3/
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:14 x/3/4/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:14 x/3/4/zzz
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:13 x/2/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/2/aa
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:13 x/1/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/abc
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/ab
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/a
The output is:
x is a directory
x/4 is a unknown
x/3 is a directory
x/3/4 is a directory
x/3/4/zzz is a file
x/2 is a directory
x/2/aa is a file
x/1 is a directory
x/1/abc is a file
x/1/ab is a file
x/1/a is a file
Notice that x/4 is "unknown" because it is a symbolic link.
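If you want the loop to name link types instead of reporting them as "unknown", TarInfo also has issym() and islnk(). The sketch below (Python 3) rebuilds a few entries of the x/ tree from the listing above in memory and classifies them. The helper name `describe` and the in-memory archive are illustrative and do not come from the original answer.

```python
import io
import tarfile


def describe(info):
    """Classify a tar member, naming the link types the loop above calls 'unknown'."""
    if info.isdir():
        return "directory"
    if info.isfile():
        return "file"
    if info.issym():
        return "symlink -> " + info.linkname
    if info.islnk():
        return "hard link -> " + info.linkname
    return "other"  # devices, FIFOs, etc.


# Rebuild a tiny version of the x/ tree in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name in ("x", "x/3"):
        directory = tarfile.TarInfo(name)
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        tar.addfile(directory)
    link = tarfile.TarInfo("x/4")
    link.type = tarfile.SYMTYPE
    link.linkname = "3"
    tar.addfile(link)
    empty = tarfile.TarInfo("x/3/zzz")  # regular file, size 0
    tar.addfile(empty, io.BytesIO(b""))

buf.seek(0)  # rewind before reading the archive back
with tarfile.open(fileobj=buf) as tar:
    members = [(info.name, describe(info)) for info in tar]

for name, kind in members:
    print("{} is a {}".format(name, kind))
```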
There is no easy way, with zipfile, to distinguish a symlink (or another special file type) from a directory or regular file. The information is in the ZipInfo.external_attr attribute (for archives created on Unix, the high 16 bits hold the file's mode bits), but it's messy to get it back out:
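import stat
import zipfile
with zipfile.ZipFile(filename) as f:
    linked_file = f.filelist[1]
    # Unix mode bits are stored in the high 16 bits of external_attr
    is_symlink = stat.S_ISLNK(linked_file.external_attr >> 16)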
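Wrapped into a helper, that check might look like the sketch below. It assumes the archive was written by a Unix tool that stores mode bits in external_attr (for example, Info-ZIP zip with -y, which stores a symlink's target as the entry's data). Archives created on Windows usually carry no such bits, so every entry would fall through to "file" or "directory". The name `zip_member_kind` is hypothetical.

```python
import io
import stat
import zipfile


def zip_member_kind(info):
    """Classify a zip member using the Unix mode bits in external_attr."""
    mode = info.external_attr >> 16  # high 16 bits: Unix st_mode, if present
    if stat.S_ISLNK(mode):
        return "symlink"
    if info.filename.endswith("/") or stat.S_ISDIR(mode):
        return "directory"
    return "file"


# Build a zip that mimics what a Unix tool stores for a symlink:
# the link target as the entry's data, and S_IFLNK in the mode bits.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("x/", b"")
    zf.writestr("x/a", b"hello\n")
    link = zipfile.ZipInfo("x/4")
    link.create_system = 3  # 3 = Unix, so external_attr carries a Unix mode
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "3")

with zipfile.ZipFile(buf) as zf:
    kinds = [(info.filename, zip_member_kind(info)) for info in zf.infolist()]

print(kinds)
```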
Answer from mhawke on Stack Overflow
Alternatively, you can use the str.endswith() method to check whether a file name has one of the expected extensions:
filenames = ['code.tar.gz', 'code2.bz2', 'code3.zip']
fileexts = ['.tar.gz', '.bz2', '.zip']

def check_extension():
    for name in filenames:
        for ext in fileexts:
            if name.endswith(ext):
                print('The file: {} has the extension: {}'.format(name, ext))

check_extension()
which outputs:
The file: code.tar.gz has the extension: .tar.gz
The file: code2.bz2 has the extension: .bz2
The file: code3.zip has the extension: .zip
You would need to list the extension of every archive type you want to accept and put the file names to check into a list, but this should be a fairly effective way to solve your problem. Keep in mind that it only looks at the name, not at the file's contents.
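If you go the extension route, str.endswith() also accepts a tuple of suffixes, which removes the inner loop. The sketch below also lowercases the name so that "BACKUP.ZIP" matches. The extension list is a hypothetical choice. Following the first answer's point, it leaves out a bare '.bz2'; add it back if you really do want to accept such files. The function names are illustrative.

```python
# Hypothetical list of accepted extensions.
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tgz', '.tar.bz2', '.tbz2', '.tar', '.zip')


def has_archive_extension(filename):
    """True if the name ends with any accepted extension (case-insensitive)."""
    return filename.lower().endswith(ARCHIVE_EXTENSIONS)


def archive_extension(filename):
    """Return the matching extension, or None if nothing matches."""
    lowered = filename.lower()
    # Longest first, so a shorter suffix such as '.gz' added later
    # would not shadow '.tar.gz'.
    for ext in sorted(ARCHIVE_EXTENSIONS, key=len, reverse=True):
        if lowered.endswith(ext):
            return ext
    return None


for name in ['code.tar.gz', 'code2.bz2', 'BACKUP.ZIP']:
    print(name, has_archive_extension(name), archive_extension(name))
```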
The easiest way is to use shutil.make_archive. It supports both zip and tar formats, including gzip- and bzip2-compressed tar ('gztar', 'bztar').
import shutil
# output_filename is the archive path without an extension; make_archive appends '.zip'
shutil.make_archive(output_filename, 'zip', dir_name)
If you need to do something more complicated than zipping the whole directory (such as skipping certain files), then you'll need to dig into the zipfile module as others have suggested.
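Here is a slightly fuller sketch of the make_archive approach that runs in a throwaway temporary directory. The source tree and the "backup" name are hypothetical. It shows that base_name is given without an extension and that make_archive returns the full path of the archive it wrote.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical source tree: src/a.txt and src/sub/b.txt
    src = os.path.join(workdir, "src")
    os.makedirs(os.path.join(src, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write("hello\n")

    # base_name has no extension; make_archive appends '.zip' and returns the path.
    archive = shutil.make_archive(os.path.join(workdir, "backup"), "zip", root_dir=src)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())  # member paths are relative to root_dir

print(os.path.basename(archive), names)
```

Swapping 'zip' for 'gztar' would produce backup.tar.gz instead.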
As others have pointed out, you should use zipfile. The documentation tells you what functions are available, but doesn't really explain how you can use them to zip an entire directory. I think it's easiest to explain with some example code:
import os
import zipfile

def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file),
                       os.path.relpath(os.path.join(root, file),
                                       os.path.join(path, '..')))

with zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipdir('tmp/', zipf)
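One limitation of the os.walk version is that it only writes files, so empty directories are lost. Below is a sketch of a pathlib-based variant that keeps them. It assumes Python 3.6 or later, and the name `zip_directory` is hypothetical. Like the original, it keeps the source folder's own name as the top-level entry.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(src, zip_path):
    """Zip src recursively under its own name, including empty directories."""
    src = Path(src).resolve()
    with zipfile.ZipFile(str(zip_path), "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src.parent).as_posix()
            if path.is_dir():
                # Non-empty directories are implied by their files; only empty
                # ones need an explicit entry (zipfile adds the trailing '/').
                if not any(path.iterdir()):
                    zf.write(str(path), arcname)
            else:
                zf.write(str(path), arcname)


# Usage on a throwaway tree: tmp/sub/b.txt plus an empty tmp/empty/
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir, "tmp")
    (root / "sub").mkdir(parents=True)
    (root / "empty").mkdir()
    (root / "sub" / "b.txt").write_text("hello\n")
    zip_directory(root, Path(workdir, "Python.zip"))
    with zipfile.ZipFile(str(Path(workdir, "Python.zip"))) as zf:
        names = sorted(zf.namelist())

print(names)
```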