You've stated that you need to support "tar, bz2, zip or tar.gz". Python's tarfile module automatically handles gzip- and bzip2-compressed tar files, so there are really only two types of archive that you need to support: tar and zip. (bz2 by itself is not an archive format; it is just compression.)

You can determine whether a given file is a tar file with tarfile.is_tarfile(). This will also work on tar files compressed with gzip or bzip2 compression. Within a tar file you can determine whether a file is a directory using TarInfo.isdir() or a file with TarInfo.isfile().

Similarly, you can determine whether a file is a zip file using zipfile.is_zipfile(). Older versions of zipfile have no method to distinguish directories from normal files, but entries whose names end with / are directories. (Since Python 3.6, ZipInfo.is_dir() performs this check.)

So, given a file name, you can do this:

import zipfile
import tarfile

filename = 'test.tgz'

if tarfile.is_tarfile(filename):
    f = tarfile.open(filename)
    for info in f:
        if info.isdir():
            file_type = 'directory'
        elif info.isfile():
            file_type = 'file'
        else:
            file_type = 'unknown'
        print('{} is a {}'.format(info.name, file_type))

elif zipfile.is_zipfile(filename):
    f = zipfile.ZipFile(filename)
    for name in f.namelist():
        print('{} is a {}'.format(name, 'directory' if name.endswith('/') else 'file'))

else:
    print('{} is not an accepted archive file'.format(filename))

When run on a tar file with this structure:

(py2)[mhawke@localhost tmp]$ tar tvfz /tmp/test.tgz
drwxrwxr-x mhawke/mhawke     0 2016-02-29 12:38 x/
lrwxrwxrwx mhawke/mhawke     0 2016-02-29 12:38 x/4 -> 3
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:14 x/3/
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:14 x/3/4/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:14 x/3/4/zzz
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:13 x/2/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/2/aa
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:13 x/1/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/abc
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/ab
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/a

The output is:

x is a directory
x/4 is a unknown
x/3 is a directory
x/3/4 is a directory
x/3/4/zzz is a file
x/2 is a directory
x/2/aa is a file
x/1 is a directory
x/1/abc is a file
x/1/ab is a file
x/1/a is a file

Notice that x/4 is "unknown" because it is a symbolic link.
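If you would rather report links explicitly than as "unknown", TarInfo also provides issym() and islnk(). The following is a minimal sketch, assuming Python 3; classify_member, build_demo_tar and list_tar are hypothetical helper names, and the archive is built in memory purely for illustration:

```python
import io
import tarfile


def classify_member(info):
    """Return a short label for a tarfile.TarInfo entry."""
    if info.isdir():
        return 'directory'
    if info.isfile():
        return 'file'
    if info.issym():
        return 'symlink -> ' + info.linkname
    if info.islnk():
        return 'hard link -> ' + info.linkname
    return 'unknown'


def build_demo_tar():
    """Build a small in-memory tar archive (illustrative contents only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        directory = tarfile.TarInfo('x')
        directory.type = tarfile.DIRTYPE
        tar.addfile(directory)

        link = tarfile.TarInfo('x/4')
        link.type = tarfile.SYMTYPE
        link.linkname = '3'
        tar.addfile(link)

        data = b'hello'
        regular = tarfile.TarInfo('x/a')
        regular.size = len(data)
        tar.addfile(regular, io.BytesIO(data))
    buf.seek(0)
    return buf


def list_tar(fileobj):
    """Return (name, label) pairs for every member of the archive."""
    with tarfile.open(fileobj=fileobj) as tar:
        return [(member.name, classify_member(member)) for member in tar]


for name, label in list_tar(build_demo_tar()):
    print('{} is a {}'.format(name, label))
```

For a file on disk, the same classify_member function can be applied to the members of tarfile.open(filename).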

There is no easy way, with zipfile, to distinguish a symlink (or other file types) from a directory or normal file. The information is there in the ZipInfo.external_attr attribute, but it's messy to get it back out:

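import stat

# f is the zipfile.ZipFile opened above; the entry at index 1 is assumed to be the symlink
linked_file = f.filelist[1]
# The upper 16 bits of external_attr hold the Unix file mode
is_symlink = stat.S_ISLNK(linked_file.external_attr >> 16)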
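The external_attr check can be wrapped in a small helper so that zip entries are labelled the same way as tar members. This is only a sketch, assuming Python 3 and an archive written by a tool that stores Unix modes in the upper 16 bits of external_attr; classify_zip_entry and build_demo_zip are hypothetical names, and the demo archive is constructed by hand:

```python
import io
import stat
import zipfile


def classify_zip_entry(info):
    """Return 'symlink', 'directory' or 'file' for a zipfile.ZipInfo."""
    mode = info.external_attr >> 16  # Unix mode bits, if the writer stored them
    if stat.S_ISLNK(mode):
        return 'symlink'
    if stat.S_ISDIR(mode) or info.filename.endswith('/'):
        return 'directory'
    return 'file'


def build_demo_zip():
    """Build a small in-memory zip with a directory, a symlink and a file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('x/', '')

        link = zipfile.ZipInfo('x/4')
        link.create_system = 3  # mark the entry as created on Unix
        link.external_attr = (stat.S_IFLNK | 0o777) << 16
        zf.writestr(link, '3')  # the link target is stored as the entry data

        zf.writestr('x/a', 'hello')
    buf.seek(0)
    return buf


with zipfile.ZipFile(build_demo_zip()) as zf:
    for info in zf.infolist():
        print('{} is a {}'.format(info.filename, classify_zip_entry(info)))
```

Archives created on systems that do not record Unix modes will leave those bits at zero, in which case the helper falls back to the trailing-slash test and reports everything else as a file.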
Answer from mhawke on Stack Overflow
2 of 4
1

You can use the str.endswith() method to check whether a file name has the expected extension:

filenames = ['code.tar.gz', 'code2.bz2', 'code3.zip']
fileexts = ['.tar.gz', '.bz2', '.zip']

def check_extension():
    for name in filenames:
        for ext in fileexts:
            if name.endswith(ext):
                print ('The file: ', name, ' has the extension: ', ext)


check_extension()

which outputs:

The file:  code.tar.gz  has the extension:  .tar.gz
The file:  code2.bz2  has the extension:  .bz2
The file:  code3.zip  has the extension:  .zip

You would need to maintain a list of file extensions for every archive type you want to check against, and load the file names into a list so the check can run over them, but I think this is a fairly effective way to solve your issue.
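The same check can be written as a function that returns the matching extension instead of printing it. This is a minimal sketch: ARCHIVE_EXTENSIONS and matching_extension are hypothetical names, and the extension list is only an example. Longer extensions are listed first so that a name like code.tar.bz2 is reported as .tar.bz2 rather than .bz2. Keep in mind that this inspects only the name, not the file contents.

```python
# Example list; longer extensions come first so they win over shorter ones.
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.tgz', '.tar', '.zip', '.bz2')


def matching_extension(name):
    """Return the first listed extension that the name ends with, or None."""
    lowered = name.lower()  # make the comparison case-insensitive
    for ext in ARCHIVE_EXTENSIONS:
        if lowered.endswith(ext):
            return ext
    return None


for name in ['code.tar.gz', 'code2.bz2', 'code3.zip', 'notes.txt']:
    print(name, matching_extension(name))
```

If only a yes/no answer is needed, str.endswith() also accepts a tuple, so name.lower().endswith(ARCHIVE_EXTENSIONS) does the whole check in one call.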

Top answer
1 of 2
3

Archives don't really make sense if all you are going to do is put one file into each of them.

Or did you mean to compress them, e.g. using the gzip or bz2 module?

If you really do want archives with only a single file, create a tar or ZIP object and add the file straight away, e.g. with TarFile.add.

Note that while it is very common on Unix-like operating systems to compress single files using bz2 or gzip, doing so is very uncommon on other platforms, e.g. Windows. There the recommendation would be to use ZIP files, even for single files, since they are handled well by applications (Windows Explorer and others).

To put a single file into a ZIP file do something similar to this:

import zipfile
# ...

with zipfile.ZipFile(nameOfOriginalFile + ".zip", "w", zipfile.ZIP_DEFLATED) as zip_file:
    zip_file.write(nameOfOriginalFile)

Not passing ZIP_DEFLATED to ZipFile will result in an uncompressed zip file.
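The effect of the compression argument can be checked by comparing the stored size of the same data under both settings. This is a small sketch, assuming Python 3 with zlib available; zipped_size and the sample data are made up for illustration, and the archive is written to memory rather than to disk:

```python
import io
import zipfile


def zipped_size(data, compression):
    """Write data as a single zip entry and return its size inside the archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression) as zf:
        zf.writestr('sample.txt', data)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo('sample.txt').compress_size


data = b'abc' * 10000  # highly repetitive, so it compresses well
stored = zipped_size(data, zipfile.ZIP_STORED)      # the default: no compression
deflated = zipped_size(data, zipfile.ZIP_DEFLATED)
print('stored:', stored, 'deflated:', deflated)
```

With ZIP_STORED the entry occupies exactly as many bytes as the input, while ZIP_DEFLATED shrinks repetitive data like this considerably.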

To compress a single file using e.g. gzip:

import gzip
import shutil

# Open the source in binary mode so the bytes are copied unchanged
with gzip.GzipFile(nameOfOriginalFile + ".gz", "w") as gz:
    with open(nameOfOriginalFile, "rb") as inp_file:
        shutil.copyfileobj(inp_file, gz)

The bz2 and lzma (not available for Python 2) APIs are the same: just import bz2 or lzma and use bz2.BZ2File or lzma.LZMAFile instead of gzip.GzipFile.

After both with blocks you can delete the original file (os.remove(file)) and move the archive file to the correct location. Alternatively, create the archive file directly in the correct location (os.path.join the target location and the archive name).

2 of 2
0

The standard library contains the zipfile module for working with .zip archives, the gzip module for .gz compressed files, and the bz2 module for .bz2 compressed files (the latter is slower, but yields better compression).

Python 3.3 also introduced lzma (for .xz and .lzma files), which has an even better compression ratio, but it does not seem to have been backported to 2.7.

Note that a single file does not need .tar.gz; a .gz will do. That is because .tar.gz has two levels: .tar puts several files together and .gz compresses the result, and you don't need the first part if you have just one file. Zip does both things, so for a single file it is slightly less efficient than gz (they use the same compression method), but you may have some tool that understands zip files and not gz files, so there may be a reason to use it.

To create a single compressed file with gzip, bz2 or lzma, just use the open function from the respective module and then use shutil.copyfileobj to copy the content of the source file to the archive.
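As a rough sketch of that approach, assuming Python 3.3 or later (where gzip.open, bz2.open and lzma.open all exist): OPENERS and compress_file are hypothetical names, and the sample file is created in a temporary directory only so the example can run on its own.

```python
import bz2
import gzip
import lzma
import os
import shutil
import tempfile

# Map each output suffix to the open function of the matching module.
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open}


def compress_file(src_path, suffix='.gz'):
    """Compress src_path into src_path + suffix and return the new path."""
    dst_path = src_path + suffix
    with open(src_path, 'rb') as src, OPENERS[suffix](dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


# Usage: compress a throwaway sample file with each of the three formats.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'sample.txt')
with open(sample, 'wb') as fh:
    fh.write(b'hello world\n' * 100)

for suffix in OPENERS:
    print(compress_file(sample, suffix))
```

The original file is left in place; removing it afterwards with os.remove is a separate step.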
