Difference between revisions of "Lrzip"

Latest revision as of 01:19, 7 September 2016

Long Range ZIP (or Lzma RZIP) is a compression program optimised for large files, consisting mainly of an extended rzip step for long-distance redundancy reduction and a normal compressor (LZMA, LZO, gzip, bzip2, or ZPAQ) step. The larger the file and the more memory you have, the better the compression advantage this will provide, especially once the files are larger than 100 MB. The advantage can be chosen to be either size (much smaller than bzip2) or speed (much faster than bzip2).

Installation

Install lrzip, available in the official repositories.

Usage

Compression

Compression of directories (recursive) requires lrztar, which first tars the directory and then compresses the resulting single file, just as tar does when users compress with gzip or xz (tar zcf ... and tar Jcf ..., respectively). Note that the chosen compression algorithm is applied after the rzip-like precompression pass, so the result is not the same as a plain "LZMA compressed archive".

This will produce an LZMA compressed archive foo.tar.lrz from a directory named foo:

$ lrztar foo
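
The two steps that lrztar combines can also be done by hand. The following is only a sketch of the idea, not the wrapper's actual implementation; the sample directory and file names are made up for illustration, and the lrzip step is skipped when the program is not installed:

```shell
#!/bin/sh
# Sketch: tar a directory, then compress the single tar file with lrzip.
set -eu

# Work in a scratch directory with a small, hypothetical sample tree.
workdir=$(mktemp -d)
cd "$workdir"
mkdir foo
printf 'hello\n' > foo/a.txt

# Step 1: bundle the directory into one uncompressed tar file.
tar cf foo.tar foo

# Step 2: compress that single file; by default lrzip keeps foo.tar
# and writes foo.tar.lrz next to it.
if command -v lrzip >/dev/null 2>&1; then
    lrzip foo.tar
fi
```

In normal use lrztar is more convenient, since it performs both steps with one command.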

This will produce an LZMA compressed archive bar.lrz from a file named bar:

$ lrzip bar

For extreme compression, add the -z switch, which enables ZPAQ but takes notably longer than LZMA:

$ lrztar -z foo

For extremely fast compression and decompression, use the -l switch for LZO:

$ lrzip -l bar

Decompression

To completely extract an archived directory:

$ lrzuntar foo.tar.lrz

To decompress bar.lrz to bar:

$ lrunzip bar.lrz
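
A quick way to gain confidence in an archive is a round trip: compress, remove the original, decompress, and compare checksums. The sketch below assumes a throwaway sample file; the -q (quiet) switch is not mentioned in this article and is taken from the lrzip manual page, so check lrzip --help if in doubt:

```shell
#!/bin/sh
# Sketch: round-trip check for lrzip / lrunzip on a sample file.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: 100 kB of zero bytes.
dd if=/dev/zero of=bar bs=1000 count=100 2>/dev/null

roundtrip_ok=skipped
if command -v lrzip >/dev/null 2>&1; then
    before=$(cksum < bar)
    lrzip -q bar          # writes bar.lrz, keeps bar
    rm bar
    lrunzip -q bar.lrz    # recreates bar
    after=$(cksum < bar)
    [ "$before" = "$after" ] && roundtrip_ok=yes || roundtrip_ok=no
fi
echo "round trip: $roundtrip_ok"
```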

Details

Lrzip uses an extended version of rzip, which does a first-pass long-distance redundancy reduction. The lrzip modifications make it scale according to memory size. The data is then either:

  1. Compressed by LZMA (default), which gives excellent compression at approximately twice the speed of bzip2 compression
  2. Compressed by a number of other compressors chosen for different reasons, in order of likelihood of usefulness:
    1. ZPAQ: Extreme compression up to 20% smaller than LZMA, but ultra slow at compression AND decompression.
    2. LZO: Extremely fast compression and decompression, which on most machines compresses faster than the disk can write, making it as fast as (or even faster than) simply copying a large file.
    3. GZIP: Almost as fast as LZO, but with better compression.
    4. BZIP2: A de facto Linux standard of sorts, but it sits in the middle ground between LZMA and gzip and is neither here nor there.
  3. Leaving it uncompressed and rzip prepared. This form substantially improves any compression performed on the resulting file, in both size and speed (because rzip preparation merges similar compressible blocks of data and creates a smaller file). "Improving" here means it will either speed up the very slow compressors with a minor detriment to compression, or greatly increase the compression of simple compression algorithms.
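
The choices in the list above map to command-line switches. The helper below is a hypothetical convenience function, not part of lrzip; -z and -l appear earlier in this article, while -g, -b and -n are taken from the lrzip manual page and should be verified against lrzip --help on your system:

```shell
# Sketch: map a back-end name to the lrzip switch that selects it.
# lrzip_backend_flag is a made-up helper name used for illustration.
lrzip_backend_flag() {
    case "$1" in
        lzma)  echo "" ;;     # default back end, no switch needed
        zpaq)  echo "-z" ;;   # smallest output, slowest
        lzo)   echo "-l" ;;   # fastest
        gzip)  echo "-g" ;;
        bzip2) echo "-b" ;;
        none)  echo "-n" ;;   # rzip preparation only, no back end
        *)     echo "unknown back end: $1" >&2; return 1 ;;
    esac
}
```

For example, lrzip $(lrzip_backend_flag lzo) bar would then run the same command as lrzip -l bar.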

The major disadvantages are:

  1. The main lrzip application only works on single files, so it requires the lrztar wrapper to fake a complete archiver.
  2. It requires a lot of memory to perform at its best (as much memory as the size of the data to compress; but see the sliding mmap below), and is not really usable (for compression) with less than 256 MB. Decompression requires less RAM and works on machines with less memory. Swap may sometimes need to be enabled on these low-RAM machines to keep the operating system happy.
  3. STDIN/STDOUT works fine on both compression and decompression, but larger files compressed in this manner will end up being less efficiently compressed.

The unique feature of lrzip is that it tries to make the most of the available RAM in your system at all times for maximum benefit. It does this by default, choosing the largest window possible without running out of memory. It also has a unique "sliding mmap" feature, which makes it possible to use a compression window even larger than your RAM size, if the file is that large. It does this (with the -U option) by implementing one large mmap buffer as per normal, and a smaller moving buffer to track which part of the file is currently being examined, emulating a much larger single mmapped buffer. Unfortunately, this mode can be many times slower.
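
Whether -U is worth considering depends on how the file size compares to installed memory. The following sketch makes that comparison on Linux by reading MemTotal from /proc/meminfo; the function names are hypothetical, and comparing against total RAM is a simplifying assumption rather than lrzip's own heuristic:

```shell
# Sketch: decide whether a file is larger than the installed RAM.

# Print total RAM in KiB, as reported by the Linux kernel.
total_ram_kib() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Succeed (exit status 0) when the given file is larger than total RAM.
file_larger_than_ram() {
    size_kib=$(( $(wc -c < "$1") / 1024 ))
    [ "$size_kib" -gt "$(total_ram_kib)" ]
}
```

A script could then run lrzip -U only when file_larger_than_ram reports success for the input file, and plain lrzip otherwise, to avoid the slowdown when it brings no benefit.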

Benchmarks

See the README.benchmarks included in the source/docs.

FAQ

See the README included with the source package.

Repository and issue tracker

Both are hosted at https://github.com/ckolivas/lrzip