Archiving and compression

From ArchWiki

The traditional Unix archiving and compression tools are separated according to the Unix philosophy:

  • A file archiver combines several files into one archive file, e.g. tar.
  • A compression tool compresses and decompresses data.

These tools are often used in sequence: first an archive file is created, then it is compressed.
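
As a minimal sketch of that sequence, assuming tar and gzip are installed (the directory and file names are made up for illustration):

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
echo "hello" > project/a.txt
echo "world" > project/b.txt

# Step 1: archive (combine the files into one uncompressed tar file).
tar -cf project.tar project

# Step 2: compress the archive; gzip replaces project.tar with project.tar.gz.
gzip project.tar

# The same two steps as a single pipeline, without an intermediate file.
tar -cf - project | gzip > project-piped.tar.gz
```

The pipeline form is common in practice because the uncompressed archive never has to be written to disk.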

There are also tools that do both; these tend to offer encryption, error detection and recovery as well.

Archiving only

Name | Packages | Manuals | Description
ar | binutils | ar(1) | Legacy Unix archiver that predates tar. Today only used for creating static library files.
cpio | cpio | cpio(1) | File archiver that works via stdin/stdout, supports cpio and tar formats.
DAR | dar (AUR) | dar(1) | Archiver for backing up large live filesystems; takes care of hard links, extended attributes, sparse files and inode types.
GNU tar | tar | tar(1) | GNU utility for manipulating the ubiquitous tar archives (tarballs), see tar for usage examples.
libarchive | libarchive | bsdtar(1), bsdcpio(1) | Implementation of tar and cpio that also offers a library. Used by pacman and mkinitcpio.
Tip: Both GNU and BSD tar automatically delegate decompression for bzip2, compress, gzip, lzip, lzma and xz compressed archives. When creating archives, both support the -a switch, which filters the created archive through the right compression program based on the file extension. BSD tar recognizes compression formats by inspecting the data itself, whereas GNU tar only guesses based on the file extension.
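
The -a switch from the tip can be tried out as follows. This is only a sketch, assuming GNU tar or bsdtar with gzip available; the file names are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some data" > notes.txt

# -a picks the compressor from the archive suffix, so this writes a
# gzip-compressed tarball without an explicit -z.
tar -caf notes.tar.gz notes.txt

# Listing (or extracting) needs no compression flag either:
# tar delegates the decompression by itself.
listing=$(tar -tf notes.tar.gz)
echo "$listing"
```

Changing the suffix to .tar.xz or .tar.bz2 should select xz or bzip2 instead, provided the corresponding program is installed.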

Compression tools

Compression only

These compression programs implement their own file format.

Name | Packages | Manual | Ext | Tar ext | Description | Parallel implementations
bzip2 | bzip2 | bzip2(1) | .bz2, .bz | .tbz2, .tbz | Uses the Burrows–Wheeler algorithm. | lbzip2, pbzip2
gzip | gzip | gzip(1) | .gz, .z | .tgz, .taz | GNU zip, based on the DEFLATE algorithm. | pigz
lrzip | lrzip | lrzip(1) | .lrz | - | Improved version of rzip, uses multiple algorithms. | is multithreaded
LZ4 | lz4 | lz4(1) | .lz4 | - | Written in C, focused on compression and decompression speed. | is multithreaded
lzip | lzip | lzip(1) | .lz | - | Uses LZMA. | plzip (AUR)
lzop | lzop | lzop(1) | .lzo | .tzo | Uses the LZO library (lzo). | -
xz | xz | xz(1) | .xz, .lzma | .txz, .tlz | Uses LZMA, default for GNU coreutils and kernel archive files. | pixz, pxz (AUR)
  • Parallel implementations offer improved speeds by using multiple CPU cores.
  • Tar extensions refer to compressed archives created with both tar and the compression tool, e.g. .tzo is short for .tar.lzo.
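
A parallel implementation is normally a drop-in replacement for its serial counterpart. The sketch below assumes tar and gzip are installed and uses pigz only when it is found; the directory name is a placeholder.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
echo "payload" > data/file.txt

# Use pigz (parallel gzip) when it is installed, otherwise plain gzip.
# Both write the gzip format, so the result is interchangeable.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# .tgz is simply the short tar extension for .tar.gz.
tar -cf - data | "$compressor" > data.tgz

# Any gzip-aware tool can read it back, regardless of which one wrote it.
contents=$(gzip -dc data.tgz | tar -xOf - data/file.txt)
echo "$contents"
```

On a file this small the two compressors will not differ noticeably; the speed gain only shows on large inputs and multi-core machines.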

Archiving and compression

Name | Packages | Manual | Ext | Description
7z | p7zip | 7z(1) | .7z | POSIX port of 7-zip's command-line. See p7zip.
RAR | rar (AUR), unrar | rar(1) | .rar | Both the format and the rar utility are proprietary.
ZIP | zip, unzip | zip(1), unzip(1) | .zip | Widely used outside of the Linux world.
Unarchiver | unarchiver | unar(1), lsar(1) | many | Command-line tool of a Mac application, supports over 40 archive formats.
ZPAQ | zpaq (AUR) | zpaq(1) | .zpaq | A high compression ratio archiver written in C++, uses several algorithms.

Feature charts

Decompress

Name | gzip | bzip2 | ZIP | compress | pack | CAB | ARJ
gzip | Yes | No | Yes | Yes | Yes | No | No
p7zip | Yes | Yes | Yes | No | Yes | Yes | Yes
unarchiver | Yes | Yes | Yes | Yes | No | Yes | Partial

Usage comparison

Archiving only

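Name | Create archive | Extract archive | List content
tar(1) | tar cfv archive.tar file1 file2 | tar xfv archive.tar | tar -tvf archive.tar
cpio(1) | ls file1 file2 | cpio -o > archive.cpio | cpio -i -vd < archive.cpio | cpio -t < archive.cpio

Note: the | inside the cpio "Create archive" cell (between "ls file1 file2" and "cpio -o") is a shell pipe, not a column separator.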
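
The tar commands can be tried end to end in a scratch directory. This is a sketch with placeholder file names; the restored directory is an addition for illustration, so that extraction does not overwrite the originals.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "one" > file1
echo "two" > file2

# Create: c = create, f = archive file name.
tar -cf archive.tar file1 file2

# List: t = table of contents.
members=$(tar -tf archive.tar)
echo "$members"

# Extract into a separate directory (-C) so the originals stay untouched.
mkdir restored
tar -xf archive.tar -C restored
```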

Compression only

Name | Compress | Decompress | Decompress to stdout
bzip2(1) | bzip2 file | bzip2 -d file.bz2 | bzcat file.bz2
gzip(1) | gzip file | gzip -d file.gz | zcat file.gz
lrzip(1) | lrzip file; lrztar folder | lrzip -d file.lrz; lrztar -d folder.tar.lrz | lrzcat file.lrz
xz(1) | xz file | xz -d file.xz | xzcat file.xz
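
By default gzip, bzip2 and xz replace the input file with the compressed one. The sketch below uses gzip (assumed to be installed) with -c to keep the original and to check the round trip; bzip2 and xz accept the same -c and -d options.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 > file

# -c writes the compressed data to stdout instead of replacing "file".
gzip -c file > file.gz

# Decompress to stdout (what zcat does) and compare with the original.
if gzip -dc file.gz | cmp -s - file; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "$roundtrip"
```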

Archiving and compression

Name | Compress | Decompress | Decompress to stdout | List content
7z(1) | 7z a archive.7z file1 file2 | 7z x archive.7z | 7z e -so archive.7z file1 | 7z l archive.7z
rar(1) & unrar | rar a archive.rar file1 file2 | rar x archive.rar | rar p -inul archive.rar file1 | rar l archive.rar
zip(1), unzip(1) | zip archive.zip file1 file2 | unzip archive.zip | unzip -p archive.zip file1 | unzip -l archive.zip

Convenience tools

  • atool — Script for managing file archives of various types.
https://www.nongnu.org/atool/ || atool
  • dtrx — An intelligent archive extraction tool.
https://brettcsmith.org/2007/dtrx/ || dtrx (AUR)
  • unp — Command line tool that can unpack archives easily.
https://github.com/mitsuhiko/unp || python-unp (AUR)
  • unpack — Wrapper script for handling multiple archive formats.
https://github.com/githaff/unpack || unpack-git (AUR)

Determining archive format

To extract an archive, its file format needs to be determined. If the file is properly named you can deduce its format from the file extension.

Otherwise you can use the file tool, see file(1).
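
The file tool identifies formats mainly by looking at the leading "magic" bytes of a file. The following sketch does a much cruder version of the same check by hand; detect_format is a hypothetical helper written for this example, and it only knows a handful of signatures (tar, for instance, stores its marker at a later offset and is not covered).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "mystery content" | gzip -c > mystery.bin   # deliberately misnamed

# Hypothetical helper: map the first bytes of a file to a format name.
detect_format() {
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)         echo gzip ;;
        425a68*)       echo bzip2 ;;
        fd377a585a00)  echo xz ;;
        504b0304*)     echo zip ;;
        377abcaf271c)  echo 7z ;;
        *)             echo unknown ;;
    esac
}

format=$(detect_format mystery.bin)
echo "$format"
```

In practice running file mystery.bin is the better choice, since it knows far more formats than this short list.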

Esoteric, rare or deprecated tools

Name | Packages | Ext | Description
ARC | arc (AUR) | .arc, .ark | Was very popular during the early days of the dial-up BBS. Superseded by ZIP.
ARJ | arj | .arj | An archiver used on DOS/Windows in the mid-1990s. This is an open source clone.
compress | ncompress (AUR) | .Z | The classic Unix compression utility, which can handle the ancient .Z archive.
LHA | lha (AUR) | .lzh, .lha | Format popular in Japan, archiver to create LH-7 format archives. 32-bit only (requires multilib).
PAR2 | par2cmdline | .par2 | Parity archiver for increased data integrity. See also Parchive.
shar | sharutils | .shar | Creates self-extracting archives that are valid shell scripts.
Zoo | zoo (AUR) | .zoo | Was mostly popular on the OpenVMS operating system before PKZIP became popular.

Compression libraries

  • Brotli — Compression algorithm for data streams using the LZ77 algorithm, Huffman coding and 2nd order context modeling.
https://github.com/google/brotli || brotli
  • zlib — Compression library implementing the deflate compression method found in gzip and PKZIP.
https://www.zlib.net/ || zlib
  • Zopfli — High compression ratio file compressor from Google, using a deflate-compatible algorithm called zopfli.
https://github.com/google/zopfli || zopfli-git (AUR)

See also