Brave Search

How are zlib, gzip and zip related? What do they have in common and how are they different? [closed]

stackoverflow.com › questions › 20762094 › how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they

Short form:

.zip is an archive format using, usually, the Deflate compression method. The .gz gzip format is for single files, also using the Deflate compression method. Often gzip is used in combination with tar to make a compressed archive format, .tar.gz. The zlib library provides Deflate compression and decompression code for use by zip, gzip, png (which uses the zlib wrapper on deflate data), and many other applications.

Long form:

The ZIP format was developed by Phil Katz as an open format with an open specification, where his implementation, PKZIP, was shareware. It is an archive format that stores files and their directory structure, where each file is individually compressed. The file type is .zip. The files, as well as the directory structure, can optionally be encrypted.

The ZIP format supports several compression methods:

    0 - The file is stored (no compression)
    1 - The file is Shrunk
    2 - The file is Reduced with compression factor 1
    3 - The file is Reduced with compression factor 2
    4 - The file is Reduced with compression factor 3
    5 - The file is Reduced with compression factor 4
    6 - The file is Imploded
    7 - Reserved for Tokenizing compression algorithm
    8 - The file is Deflated
    9 - Enhanced Deflating using Deflate64(tm)
   10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
   11 - Reserved by PKWARE
   12 - File is compressed using BZIP2 algorithm
   13 - Reserved by PKWARE
   14 - LZMA
   15 - Reserved by PKWARE
   16 - IBM z/OS CMPSC Compression
   17 - Reserved by PKWARE
   18 - File is compressed using IBM TERSE (new)
   19 - IBM LZ77 z Architecture 
   20 - deprecated (use method 93 for zstd)
   93 - Zstandard (zstd) Compression 
   94 - MP3 Compression 
   95 - XZ Compression 
   96 - JPEG variant
   97 - WavPack compressed data
   98 - PPMd version I, Rev 1
   99 - AE-x encryption marker (see APPENDIX E)

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, likely just method 8. (Method 8 also has a means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed whereas Method 8 can be.)

The ISO/IEC 21320-1:2015 standard for file containers is a restricted zip format, such as used in Java archive files (.jar), Office Open XML files (Microsoft Office .docx, .xlsx, .pptx), Office Document Format files (.odt, .ods, .odp), and EPUB files (.epub). That standard limits the compression methods to 0 and 8, as well as other constraints such as no encryption or signatures.

Around 1990, the Info-ZIP group wrote portable, free, open-source implementations of zip and unzip utilities, supporting compression with the Deflate format, and decompression of that and the earlier formats. This greatly expanded the use of the .zip format.

In the early '90s, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders. Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years. The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data. The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and then compress it with compress to make a .tar.Z file. In fact, the tar utility had and still has the option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format. The tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.

Unlike .tar, .zip has a central directory at the end, which provides a list of the contents. That and the separate compression provides random access to the individual entries in a .zip file. A .tar file would have to be decompressed and scanned from start to end in order to build a directory, which is how a .tar file is listed.

Shortly after the introduction of gzip, around the mid-1990s, the same patent dispute called into question the free use of the .gif image format, very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG losslessly compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy. In order to promote widespread usage of the PNG format, two free code libraries were created. libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.

All of the mentioned patents have since expired.

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the deflate streams. Those are no wrapping at all ("raw" deflate), zlib wrapping, which is used in the PNG format data blocks, and gzip wrapping, to provide gzip routines for the programmer. The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .zip format, which is another format that wraps around deflate compressed data.

zlib is now in wide use for data transmission and storage. For example, most HTTP transactions by servers and browsers compress and decompress the data using zlib, specifically HTTP header Content-Encoding: deflate means deflate compression method wrapped inside the zlib data format.

Different implementations of deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time. zlib and PKZIP are not the only implementations of deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level. The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.

Answer from Mark Adler on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 20762094 › how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they

compression - How are zlib, gzip and zip related? What do they have in common and how are they different? - Stack Overflow

Top answer

1 of 3

3283

Short form:

.zip is an archive format using, usually, the Deflate compression method. The .gz gzip format is for single files, also using the Deflate compression method. Often gzip is used in combination with tar to make a compressed archive format, .tar.gz. The zlib library provides Deflate compression and decompression code for use by zip, gzip, png (which uses the zlib wrapper on deflate data), and many other applications.

Long form:

The ZIP format was developed by Phil Katz as an open format with an open specification, where his implementation, PKZIP, was shareware. It is an archive format that stores files and their directory structure, where each file is individually compressed. The file type is .zip. The files, as well as the directory structure, can optionally be encrypted.

The ZIP format supports several compression methods:

    0 - The file is stored (no compression)
    1 - The file is Shrunk
    2 - The file is Reduced with compression factor 1
    3 - The file is Reduced with compression factor 2
    4 - The file is Reduced with compression factor 3
    5 - The file is Reduced with compression factor 4
    6 - The file is Imploded
    7 - Reserved for Tokenizing compression algorithm
    8 - The file is Deflated
    9 - Enhanced Deflating using Deflate64(tm)
   10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
   11 - Reserved by PKWARE
   12 - File is compressed using BZIP2 algorithm
   13 - Reserved by PKWARE
   14 - LZMA
   15 - Reserved by PKWARE
   16 - IBM z/OS CMPSC Compression
   17 - Reserved by PKWARE
   18 - File is compressed using IBM TERSE (new)
   19 - IBM LZ77 z Architecture 
   20 - deprecated (use method 93 for zstd)
   93 - Zstandard (zstd) Compression 
   94 - MP3 Compression 
   95 - XZ Compression 
   96 - JPEG variant
   97 - WavPack compressed data
   98 - PPMd version I, Rev 1
   99 - AE-x encryption marker (see APPENDIX E)

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, likely just method 8. (Method 8 also has a means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed whereas Method 8 can be.)

The ISO/IEC 21320-1:2015 standard for file containers is a restricted zip format, such as used in Java archive files (.jar), Office Open XML files (Microsoft Office .docx, .xlsx, .pptx), Office Document Format files (.odt, .ods, .odp), and EPUB files (.epub). That standard limits the compression methods to 0 and 8, as well as other constraints such as no encryption or signatures.

Around 1990, the Info-ZIP group wrote portable, free, open-source implementations of zip and unzip utilities, supporting compression with the Deflate format, and decompression of that and the earlier formats. This greatly expanded the use of the .zip format.

In the early '90s, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders. Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years. The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data. The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and then compress it with compress to make a .tar.Z file. In fact, the tar utility had and still has the option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format. The tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.

Unlike .tar, .zip has a central directory at the end, which provides a list of the contents. That and the separate compression provides random access to the individual entries in a .zip file. A .tar file would have to be decompressed and scanned from start to end in order to build a directory, which is how a .tar file is listed.

Shortly after the introduction of gzip, around the mid-1990s, the same patent dispute called into question the free use of the .gif image format, very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG losslessly compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy. In order to promote widespread usage of the PNG format, two free code libraries were created. libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.

All of the mentioned patents have since expired.

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the deflate streams. Those are no wrapping at all ("raw" deflate), zlib wrapping, which is used in the PNG format data blocks, and gzip wrapping, to provide gzip routines for the programmer. The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .zip format, which is another format that wraps around deflate compressed data.

zlib is now in wide use for data transmission and storage. For example, most HTTP transactions by servers and browsers compress and decompress the data using zlib, specifically HTTP header Content-Encoding: deflate means deflate compression method wrapped inside the zlib data format.

Different implementations of deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time. zlib and PKZIP are not the only implementations of deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level. The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.

2 of 3

65

ZIP is a file format used for storing an arbitrary number of files and folders together with lossless compression. It makes no strict assumptions about the compression methods used, but is most frequently used with DEFLATE.

Gzip is both a compression algorithm based on DEFLATE but less encumbered with potential patents et al, and a file format for storing a single compressed file. It supports compressing an arbitrary number of files and folders when combined with tar. The resulting file has an extension of .tgz or .tar.gz and is commonly called a tarball.

zlib is a library of functions encapsulating DEFLATE in its most common LZ77 incarnation.

Baeldung

baeldung.com › home › algorithms › data compression: zlib vs. gzip vs. zip

Data Compression: ZLib vs. GZip vs. Zip | Baeldung on Computer Science

March 18, 2024 - The main drawback of zlib is that it doesn’t have any checksum mechanism to maintain the integrity of data. gzip is a popular data compression and decompression method. It’s mainly used to compress a single file and is found in Unix/Linux systems. Additionally, we can also utilize gzip to compress the HTTP content.

Videos

04:18

YouTube

Node Js Gzip Compression with ZLib Compression Module - YouTube

SDC 2018 - FPGA-Based ZLIB/GZIP Compression Engine as an NVMe ...

Node Js Gzip Decompression with ZLib Compression Module - YouTube

dev.to › biellls › compression-clearing-the-confusion-on-zip-gzip-zlib-and-deflate-15g1

Compression: Clearing the Confusion on ZIP, GZIP, Zlib and DEFLATE - DEV Community

January 21, 2022 - I was surprised to find out that GZIP, zlib or even ZIP are not compression algorithms, they are actually file formats that can permit different compression algorithms. Even more surprising, virtually every implementation of those three actually use the same lossless data compression algorithm.

reddit.com › r/programming › how are zlib, gzip and zip related? what do they have in common and how are they different?

r/programming on Reddit: How are zlib, gzip and zip related? What do they have in common and how are they different?

March 14, 2021 - Zlib is the most common deflate encode/decode library. Zip and gzip are container formats that use the deflate compression algorithm.

Encode

encode.su › threads › 3176-gzip-vs-zlib-benchmarking-considerations

gzip vs zlib; benchmarking considerations

Actually did a test and compression with zlib, from library, is 20%(level 1) - 50%(level 6), 60%(level 9) slower than gzip. Decompression is 10% faster, which is surprising, although, maybe not that much as, afaiu gzip uses slower crc32 (slice by 1) when zlib uses bigger slices but it's very complicated code.

GitHub

github.com › zlib-ng › zlib-ng › discussions › 871

2.0.0 Benchmark comparisons · zlib-ng/zlib-ng · Discussion #871

Gzip compression is about twice as fast as zlib, but zlib compresses slightly better than gzip (due to minigzip using a bare minimum of headers?), and decompression with gzip takes about 30% less time.

Author zlib-ng

Find elsewhere

Google Bing Mojeek

Codemia

codemia.io › knowledge-hub › path › how_are_zlib_gzip_and_zip_related_what_do_they_have_in_common_and_how_are_they_different

How are zlib, gzip and zip related? What do they have in ...

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises

TutorialsPoint

tutorialspoint.com › compression-compatible-with-gzip-in-python-zlib

Compression compatible with gzip in Python (zlib)

June 25, 2020 - Python's standard library has a rich collection of modules for data compression and archiving. One can select whichever is suitable for his job. There are following modules related to data compression −

Google Groups

groups.google.com › g › boost-developers-archive › c › GE4LclG4mMs

[boost] [iostreams][gzip][zlib] zlib vs gzip, and linking against external libraries

January 17, 2015 - > I think gzip and zlib are two different variants of the DEFLATE algorithm and they differ in their header information.

Medium

aminshamim.xyz › gzip-deflate-brotli-and-zstd-which-compression-algorithm-should-you-use-for-your-website-033ca5cfa7ca

Gzip, Deflate, Brotli, and Zstd: Which Compression Algorithm Should You Use for Your Website? | by Md Aminul Islam Sarker | Medium

November 28, 2025 - When it comes to web performance optimization, reducing the size of data sent from your server to the client is crucial. One of the most effective ways to achieve this is through compression. But with several algorithms to choose from — including Gzip, Deflate, Brotli, and Zstandard (Zstd) — how do you know which one is best for your website?

Dlecocq

dlecocq.github.io › blog › 2011 › 12 › 16 › python's-zlib-and-gzip,-performance,-and-you

Python's zlib and gzip, Performance, and You - My Octopress Blog

December 16, 2011 - Gzip is actually just a file format, apparently most commonly used with zlib’s compression. It provides a file header and a footer and a little bit of metadata, but it really is merely a wrapper around zlib. However, while python’s zlib module is a compiled C extension, the gzip module is a pure python implementation that makes calls to zlib.

Medium

medium.com › @linz07m › gzip-or-zstandard-which-compression-should-you-use-23dd28823001

Gzip or Zstandard: Which Compression Should You Use? | by Lince Mathew | Medium

November 12, 2025 - Two common compression algorithms used today are Gzip and Zstandard (Zstd).

Stack Exchange

unix.stackexchange.com › questions › 731609 › difference-between-ar-tar-gzip-zip-and-when-should-i-decide-to-choose-which-o

Difference between ar, tar, gzip, zip and when should I decide to choose which one? - Unix & Linux Stack Exchange