1 | Command Line Interface for Zstandard library |
2 | ============================================ |
3 | |
4 | The Command Line Interface (CLI) can be built with the `make` command, without any additional parameters. |
5 | Other Makefile targets build different variants of the CLI: |
6 | - `zstd` : default CLI supporting gzip-like arguments; includes dictionary builder, benchmark, and supports decompression of legacy zstd formats |
7 | - `zstd_nolegacy` : Same as `zstd` but without support for legacy zstd formats |
8 | - `zstd-small` : CLI optimized for minimal size; no dictionary builder, no benchmark, and no support for legacy zstd formats |
9 | - `zstd-compress` : version of CLI which can only compress into zstd format |
10 | - `zstd-decompress` : version of CLI which can only decompress zstd format |
11 | |
12 | |
13 | ### Compilation variables |
14 | The scope of `zstd` can be altered by modifying the following `make` variables: |
15 | |
16 | - __HAVE_THREAD__ : multithreading is automatically enabled when `pthread` is detected. |
17 | It's possible to disable multithread support, by setting `HAVE_THREAD=0`. |
18 | Example : `make zstd HAVE_THREAD=0` |
19 | It's also possible to force multithread support, using `HAVE_THREAD=1`. |
20 | In which case, linking stage will fail if neither `pthread` nor `windows.h` library can be found. |
21 | This is useful to ensure this feature is not silently disabled. |
22 | |
23 | - __ZSTD_LEGACY_SUPPORT__ : `zstd` can decompress files compressed by older versions of `zstd`. |
24 | Starting with v0.8.0, all versions of `zstd` produce frames compliant with the [specification](../doc/zstd_compression_format.md), and are therefore compatible. |
25 | But older versions (< v0.8.0) produced different, incompatible, frames. |
26 | By default, `zstd` supports decoding legacy formats >= v0.4.0 (`ZSTD_LEGACY_SUPPORT=4`). |
27 | This can be altered by modifying this compilation variable. |
28 | `ZSTD_LEGACY_SUPPORT=1` means "support all formats >= v0.1.0". |
29 | `ZSTD_LEGACY_SUPPORT=2` means "support all formats >= v0.2.0", and so on. |
30 | `ZSTD_LEGACY_SUPPORT=0` means _DO NOT_ support any legacy format. |
31 | If `ZSTD_LEGACY_SUPPORT >= 8`, it's the same as `0`, since there is no legacy format after `7`. |
32 | Note : `zstd` only supports decoding older formats, and cannot generate any legacy format. |
33 | |
34 | - __HAVE_ZLIB__ : `zstd` can compress and decompress files in `.gz` format. |
35 | This is requested with the `--format=gzip` option. |
36 | Alternatively, symlinks named `gzip` or `gunzip` will mimic intended behavior. |
37 | `.gz` support is automatically enabled when `zlib` library is detected at build time. |
38 | It's possible to disable `.gz` support, by setting `HAVE_ZLIB=0`. |
39 | Example : `make zstd HAVE_ZLIB=0` |
40 | It's also possible to force compilation with zlib support, using `HAVE_ZLIB=1`. |
41 | In which case, linking stage will fail if `zlib` library cannot be found. |
42 | This is useful to prevent silent feature disabling. |
43 | |
44 | - __HAVE_LZMA__ : `zstd` can compress and decompress files in `.xz` and `.lzma` formats. |
45 | This is requested with the `--format=xz` and `--format=lzma` options, respectively. |
46 | Alternatively, symlinks named `xz`, `unxz`, `lzma`, or `unlzma` will mimic intended behavior. |
47 | `.xz` and `.lzma` support is automatically enabled when `lzma` library is detected at build time. |
48 | It's possible to disable `.xz` and `.lzma` support, by setting `HAVE_LZMA=0`. |
49 | Example : `make zstd HAVE_LZMA=0` |
50 | It's also possible to force compilation with lzma support, using `HAVE_LZMA=1`. |
51 | In which case, linking stage will fail if `lzma` library cannot be found. |
52 | This is useful to prevent silent feature disabling. |
53 | |
54 | - __HAVE_LZ4__ : `zstd` can compress and decompress files in `.lz4` format. |
55 | This is requested with the `--format=lz4` option. |
56 | Alternatively, symlinks named `lz4` or `unlz4` will mimic intended behavior. |
57 | `.lz4` support is automatically enabled when `lz4` library is detected at build time. |
58 | It's possible to disable `.lz4` support, by setting `HAVE_LZ4=0` . |
59 | Example : `make zstd HAVE_LZ4=0` |
60 | It's also possible to force compilation with lz4 support, using `HAVE_LZ4=1`. |
61 | In which case, linking stage will fail if `lz4` library cannot be found. |
62 | This is useful to prevent silent feature disabling. |
63 | |
64 | - __ZSTD_NOBENCH__ : `zstd` cli will be compiled without its integrated benchmark module. |
65 | This can be useful to produce smaller binaries. |
66 | In this case, the corresponding unit can also be excluded from compilation target. |
67 | |
68 | - __ZSTD_NODICT__ : `zstd` cli will be compiled without support for the integrated dictionary builder. |
69 | This can be useful to produce smaller binaries. |
70 | In this case, the corresponding unit can also be excluded from compilation target. |
71 | |
72 | - __ZSTD_NOCOMPRESS__ : `zstd` cli will be compiled without support for compression. |
73 | The resulting binary will only be able to decompress files. |
74 | This can be useful to produce smaller binaries. |
75 | A corresponding `Makefile` target using this ability is `zstd-decompress`. |
76 | |
77 | - __ZSTD_NODECOMPRESS__ : `zstd` cli will be compiled without support for decompression. |
78 | The resulting binary will only be able to compress files. |
79 | This can be useful to produce smaller binaries. |
80 | A corresponding `Makefile` target using this ability is `zstd-compress`. |
81 | |
82 | - __BACKTRACE__ : `zstd` can display a stack backtrace when execution |
83 | generates a runtime exception. By default, this feature may be |
84 | degraded/disabled on some platforms unless additional compiler directives are |
85 | applied. When triaging a runtime issue, enabling this feature can provide |
86 | more context to determine the location of the fault. |
87 | Example : `make zstd BACKTRACE=1` |
88 | |
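The `ZSTD_LEGACY_SUPPORT` rules listed above can be summarized in a few lines of shell. This is only an illustrative sketch: `describe_legacy_support` is a hypothetical helper, not part of the `zstd` build system, and it assumes its argument is a non-negative integer.

```shell
# Hypothetical helper (not part of zstd): describe what a given
# ZSTD_LEGACY_SUPPORT value means, following the rules listed above.
# Assumes the argument is a non-negative integer.
describe_legacy_support() {
    level=$1
    if [ "$level" -eq 0 ] || [ "$level" -ge 8 ]; then
        # 0 disables legacy decoding; values >= 8 behave the same,
        # since there is no legacy format after 7.
        echo "no legacy format supported"
    else
        echo "legacy formats >= v0.${level}.0 supported"
    fi
}

describe_legacy_support 4   # the default value
describe_legacy_support 0
describe_legacy_support 9
```

The chosen value would then be passed like the other variables in this section, for instance `make zstd ZSTD_LEGACY_SUPPORT=0`.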
89 | |
90 | ### Aggregation of parameters |
91 | The CLI supports aggregation of parameters: for example, `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`. |
92 | |
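As a quick way to see aggregation at work outside the benchmark options, the sketch below compresses the same file once with `-q -c` and once with `-qc`, then compares the results. It assumes a `zstd` binary is available on `PATH`; the function name and file names are placeholders chosen for illustration.

```shell
# Sketch: -q and -c given separately, or aggregated as -qc, should behave the same.
# Assumes a `zstd` binary on PATH; names below are placeholders.
aggregation_demo() {
    workdir=$(mktemp -d) || return 1
    printf 'some sample text\n' > "$workdir/sample.txt"
    # Separate short options.
    zstd -q -c "$workdir/sample.txt" > "$workdir/separate.zst"
    # The same options, aggregated.
    zstd -qc "$workdir/sample.txt" > "$workdir/joined.zst"
    if cmp -s "$workdir/separate.zst" "$workdir/joined.zst"; then
        echo "identical output"
    else
        echo "outputs differ"
    fi
    rm -rf "$workdir"
}

# Only run the demo when zstd is actually installed.
if command -v zstd >/dev/null 2>&1; then
    aggregation_demo
fi
```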
93 | |
94 | ### Symlink shortcuts |
95 | It's possible to invoke `zstd` through a symlink. |
96 | When the name of the symlink has a specific value, it triggers an associated behavior. |
97 | - `zstdmt` : compress using all cores available on local system. |
98 | - `zcat` : will decompress and output target file using any of the supported formats. `gzcat` and `zstdcat` are also equivalent. |
99 | - `gzip` : if zlib support is enabled, will mimic `gzip` by compressing file using `.gz` format, removing source file by default (use `--keep` to preserve). If zlib is not supported, triggers an error. |
100 | - `xz` : if lzma support is enabled, will mimic `xz` by compressing file using `.xz` format, removing source file by default (use `--keep` to preserve). If lzma is not supported, triggers an error. |
101 | - `lzma` : if lzma support is enabled, will mimic `lzma` by compressing file using `.lzma` format, removing source file by default (use `--keep` to preserve). If lzma is not supported, triggers an error. |
102 | - `lz4` : if lz4 support is enabled, will mimic `lz4` by compressing file using `.lz4` format. If lz4 is not supported, triggers an error. |
103 | - `unzstd` and `unlz4` will decompress any of the supported formats. |
104 | - `ungz`, `unxz` and `unlzma` will do the same, and will also remove source file by default (use `--keep` to preserve). |
105 | |
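The symlink behavior described above can be tried in a scratch directory. The following is a minimal sketch, assuming `zstd` is installed and on `PATH`; it creates a symlink named `zstdcat` and uses it to print a compressed file. The function name and file names are illustrative only.

```shell
# Sketch: invoke zstd through a symlink named zstdcat.
# Assumes a `zstd` binary on PATH; names below are placeholders.
symlink_demo() {
    workdir=$(mktemp -d) || return 1
    printf 'hello zstd\n' > "$workdir/sample.txt"
    zstd -q "$workdir/sample.txt" -o "$workdir/sample.txt.zst" || return 1
    # The name of the symlink selects the behavior.
    ln -s "$(command -v zstd)" "$workdir/zstdcat" || return 1
    # Decompresses and writes the content to standard output.
    "$workdir/zstdcat" "$workdir/sample.txt.zst"
    rm -rf "$workdir"
}

# Only run the demo when zstd is actually installed.
if command -v zstd >/dev/null 2>&1; then
    symlink_demo
fi
```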
106 | |
107 | ### Dictionary builder in Command Line Interface |
108 | Zstd offers a training mode, which can be used to tune the algorithm for a selected |
109 | type of data, by providing it with a few samples. The result of the training is stored |
110 | in a file selected with the `-o` option (default name is `dictionary`), |
111 | which can be loaded before compression and decompression. |
112 | |
113 | Using a dictionary, the compression ratio achievable on small data improves dramatically. |
114 | These compression gains are achieved while simultaneously providing faster compression and decompression speeds. |
115 | Dictionaries work only if there is some correlation in a family of small data (there is no universal dictionary). |
116 | Hence, deploying one dictionary per type of data will provide the greatest benefits. |
117 | Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm |
118 | will rely more and more on previously decoded content to compress the rest of the file. |
119 | |
120 | Usage of the dictionary builder and created dictionaries with CLI: |
121 | |
122 | 1. Create the dictionary : `zstd --train PathToTrainingSet/* -o dictionaryName` |
123 | 2. Compress with the dictionary: `zstd FILE -D dictionaryName` |
124 | 3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName` |
125 | |
126 | |
127 | ### Benchmark in Command Line Interface |
128 | The CLI includes an in-memory compression benchmark module for zstd. |
129 | The benchmark is conducted using the given filenames. The files are read into memory and joined together. |
130 | This makes the benchmark more precise, as it eliminates I/O overhead. |
131 | Multiple filenames can be supplied, as multiple parameters, with wildcards, |
132 | or names of directories can be used as parameters with `-r` option. |
133 | |
134 | The benchmark measures ratio, compressed size, compression and decompression speed. |
135 | The range of compression levels to test starts at the level given with `-b` and ends at the level given with `-e`. |
136 | The `-i` parameter selects the minimum time spent on each tested level. |
137 | |
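A benchmark run over a small range of levels might look like the sketch below. It assumes a `zstd` binary on `PATH` that was built with the benchmark module (that is, without `ZSTD_NOBENCH`); the generated sample file and the function name are placeholders.

```shell
# Sketch: benchmark levels 1 and 2, with at least 1 second per level.
# Assumes a `zstd` binary on PATH, built with the benchmark module.
benchmark_demo() {
    workdir=$(mktemp -d) || return 1
    # Generate some compressible sample input.
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "line $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > "$workdir/sample.txt"
    # -b1: start at level 1, -e2: end at level 2, -i1: minimum 1 second each.
    zstd -b1 -e2 -i1 "$workdir/sample.txt"
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Only run the demo when zstd is actually installed.
if command -v zstd >/dev/null 2>&1; then
    benchmark_demo
    bench_status=$?
fi
```

To benchmark every file under a directory instead, the same options can be combined with `-r` and a directory name.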
138 | |
139 | ### Usage of Command Line Interface |
140 | The full list of options can be obtained with `-h` or `-H` parameter: |
141 | ``` |
142 | Usage : |
143 | zstd [args] [FILE(s)] [-o file] |
144 | |
145 | FILE : a filename |
146 | with no FILE, or when FILE is - , read standard input |
147 | Arguments : |
148 | -# : # compression level (1-19, default: 3) |
149 | -d : decompression |
150 | -D DICT: use DICT as Dictionary for compression or decompression |
151 | -o file: result stored into `file` (only 1 output file) |
152 | -f : overwrite output without prompting, also (de)compress links |
153 | --rm : remove source file(s) after successful de/compression |
154 | -k : preserve source file(s) (default) |
155 | -h/-H : display help/long help and exit |
156 | |
157 | Advanced arguments : |
158 | -V : display Version number and exit |
159 | -c : write to standard output (even if it is the console) |
160 | -v : verbose mode; specify multiple times to increase verbosity |
161 | -q : suppress warnings; specify twice to suppress errors too |
162 | --no-progress : do not display the progress counter |
163 | -r : operate recursively on directories |
164 | --filelist FILE : read list of files to operate upon from FILE |
165 | --output-dir-flat DIR : processed files are stored into DIR |
166 | --output-dir-mirror DIR : processed files are stored into DIR respecting original directory structure |
167 | --[no-]asyncio : use asynchronous IO (default: enabled) |
168 | --[no-]check : during compression, add XXH64 integrity checksum to frame (default: enabled). If specified with -d, decompressor will ignore/validate checksums in compressed frame (default: validate). |
169 | -- : All arguments after "--" are treated as files |
170 | |
171 | Advanced compression arguments : |
172 | --ultra : enable levels beyond 19, up to 22 (requires more memory) |
173 | --long[=#]: enable long distance matching with given window log (default: 27) |
174 | --fast[=#]: switch to very fast compression levels (default: 1) |
175 | --adapt : dynamically adapt compression level to I/O conditions |
176 | --patch-from=FILE : specify the file to be used as a reference point for zstd's diff engine |
177 | -T# : spawns # compression threads (default: 1, 0==# cores) |
178 | -B# : select size of each job (default: 0==automatic) |
179 | --single-thread : use a single thread for both I/O and compression (result slightly different than -T1) |
180 | --rsyncable : compress using a rsync-friendly method (-B sets block size) |
181 | --exclude-compressed: only compress files that are not already compressed |
182 | --stream-size=# : specify size of streaming input from `stdin` |
183 | --size-hint=# optimize compression parameters for streaming input of approximately this size |
184 | --target-compressed-block-size=# : generate compressed block of approximately targeted size |
185 | --no-dictID : don't write dictID into header (dictionary compression only) |
186 | --[no-]compress-literals : force (un)compressed literals |
187 | --format=zstd : compress files to the .zst format (default) |
188 | --format=gzip : compress files to the .gz format |
189 | --format=xz : compress files to the .xz format |
190 | --format=lzma : compress files to the .lzma format |
191 | --format=lz4 : compress files to the .lz4 format |
192 | |
193 | Advanced decompression arguments : |
194 | -l : print information about zstd compressed files |
195 | --test : test compressed file integrity |
196 | -M# : Set a memory usage limit for decompression |
197 | --[no-]sparse : sparse mode (default: disabled) |
198 | |
199 | Dictionary builder : |
200 | --train ## : create a dictionary from a training set of files |
201 | --train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args |
202 | --train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#,shrink[=#]] : use the fast cover algorithm with optional args |
203 | --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9) |
204 | -o DICT : DICT is dictionary name (default: dictionary) |
205 | --maxdict=# : limit dictionary to specified size (default: 112640) |
206 | --dictID=# : force dictionary ID to specified value (default: random) |
207 | |
208 | Benchmark arguments : |
209 | -b# : benchmark file(s), using # compression level (default: 3) |
210 | -e# : test all compression levels successively from -b# to -e# (default: 1) |
211 | -i# : minimum evaluation time in seconds (default: 3s) |
212 | -B# : cut file into independent chunks of size # (default: no chunking) |
213 | -S : output one benchmark result per input file (default: consolidated result) |
214 | --priority=rt : set process priority to real-time |
215 | ``` |
216 | |
217 | ### Passing parameters through Environment Variables |
218 | There is no "generic" way to pass "any kind of parameter" to `zstd` in a pass-through manner. |
219 | Using environment variables for this purpose has security implications. |
220 | Therefore, this avenue is intentionally restricted and only supports `ZSTD_CLEVEL` and `ZSTD_NBTHREADS`. |
221 | |
222 | `ZSTD_CLEVEL` can be used to modify the default compression level of `zstd` |
223 | (usually set to `3`) to another value between 1 and 19 (the "normal" range). |
224 | |
225 | `ZSTD_NBTHREADS` can be used to specify a number of threads |
226 | that `zstd` will use for compression, which by default is `1`. |
227 | This functionality only exists when `zstd` is compiled with multithread support. |
228 | `0` means "use as many threads as detected cpu cores on local system". |
229 | The max # of threads is capped at `ZSTDMT_NBWORKERS_MAX`, |
230 | which is either 64 in 32-bit mode, or 256 for 64-bit environments. |
231 | |
232 | This functionality can be useful when `zstd` CLI is invoked in a way that doesn't allow passing arguments. |
233 | One such scenario is `tar --zstd`. |
234 | As `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` only replace the default compression level |
235 | and number of threads respectively, they can both be overridden by corresponding command line arguments: |
236 | `-#` for compression level and `-T#` for number of threads. |
237 | |
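The override rule can be checked with a short experiment. This sketch assumes a `zstd` binary on `PATH` that honors `ZSTD_CLEVEL`; the function name and file names are placeholders. It compares the output produced with the environment variable against the output produced with an explicit `-#` argument.

```shell
# Sketch: ZSTD_CLEVEL only replaces the default level; an explicit -# wins.
# Assumes a `zstd` binary on PATH that honors ZSTD_CLEVEL.
clevel_demo() {
    workdir=$(mktemp -d) || return 1
    # Generate some compressible sample input.
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "record $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > "$workdir/sample.txt"

    # Default level taken from the environment.
    ZSTD_CLEVEL=19 zstd -qc "$workdir/sample.txt" > "$workdir/env19.zst"
    # Same level given on the command line.
    zstd -19 -qc "$workdir/sample.txt" > "$workdir/arg19.zst"
    # Environment says 19, command line says 1: the command line wins.
    ZSTD_CLEVEL=19 zstd -1 -qc "$workdir/sample.txt" > "$workdir/env19_arg1.zst"
    zstd -1 -qc "$workdir/sample.txt" > "$workdir/arg1.zst"

    cmp -s "$workdir/env19.zst" "$workdir/arg19.zst" && echo "env level applied"
    cmp -s "$workdir/env19_arg1.zst" "$workdir/arg1.zst" && echo "command line overrides env"
    rm -rf "$workdir"
}

# Only run the demo when zstd is actually installed.
if command -v zstd >/dev/null 2>&1; then
    clevel_demo
fi
```

In the `tar --zstd` scenario mentioned above, the variables would be set on the `tar` command itself, for example `ZSTD_CLEVEL=19 tar --zstd -cf archive.tar.zst directory/` (archive and directory names are placeholders).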
238 | |
239 | ### Long distance matching mode |
240 | The long distance matching mode, enabled with `--long`, is designed to improve |
241 | the compression ratio for files with long matches at a large distance (up to the |
242 | maximum window size, `128 MiB`) while still maintaining compression speed. |
243 | |
244 | Enabling this mode sets the window size to `128 MiB` and thus increases the memory |
245 | usage for both the compressor and decompressor. Performance in terms of speed is |
246 | dependent on long matches being found. Compression speed may degrade if few long |
247 | matches are found. Decompression speed usually improves when there are many long |
248 | distance matches. |
249 | |
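A basic round trip with long distance matching can be sketched as follows. It assumes a `zstd` binary on `PATH`; the function name and file names are placeholders, and the tiny generated input only exercises the option, it does not reproduce the gains discussed in this section.

```shell
# Sketch: compress with --long, decompress, and verify the round trip.
# Assumes a `zstd` binary on PATH; names below are placeholders.
long_demo() {
    workdir=$(mktemp -d) || return 1
    # Generate some compressible sample input.
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "entry $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > "$workdir/sample.txt"

    # Compress with long distance matching at the default window log.
    zstd -q --long "$workdir/sample.txt" -o "$workdir/sample.txt.zst" || return 1
    # Decompress to a separate file.
    zstd -q -d "$workdir/sample.txt.zst" -o "$workdir/roundtrip.txt" || return 1
    cmp -s "$workdir/sample.txt" "$workdir/roundtrip.txt" && echo "round trip ok"
    rm -rf "$workdir"
}

# Only run the demo when zstd is actually installed.
if command -v zstd >/dev/null 2>&1; then
    long_demo
fi
```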
250 | Below are graphs comparing the compression speed, compression ratio, and |
251 | decompression speed with and without long distance matching on an ideal use |
252 | case: a tar of four versions of clang (versions `3.4.1`, `3.4.2`, `3.5.0`, |
253 | `3.5.1`) with a total size of `244889600 B`. This is an ideal use case as there |
254 | are many long distance matches within the maximum window size of `128 MiB` (each |
255 | version is less than `128 MiB`). |
256 | |
257 | Compression Speed vs Ratio | Decompression Speed |
258 | ---------------------------|--------------------- |
259 | ![Compression Speed vs Ratio](https://raw.githubusercontent.com/facebook/zstd/v1.3.3/doc/images/ldmCspeed.png "Compression Speed vs Ratio") | ![Decompression Speed](https://raw.githubusercontent.com/facebook/zstd/v1.3.3/doc/images/ldmDspeed.png "Decompression Speed") |
260 | |
261 | | Method | Compression ratio | Compression speed | Decompression speed | |
262 | |:-------|------------------:|-------------------------:|---------------------------:| |
263 | | `zstd -1` | `5.065` | `284.8 MB/s` | `759.3 MB/s` | |
264 | | `zstd -5` | `5.826` | `124.9 MB/s` | `674.0 MB/s` | |
265 | | `zstd -10` | `6.504` | `29.5 MB/s` | `771.3 MB/s` | |
266 | | `zstd -1 --long` | `17.426` | `220.6 MB/s` | `1638.4 MB/s` | |
267 | | `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s` | |
268 | | `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s` | |
269 | |
270 | On this file, the compression ratio improves significantly with minimal impact |
271 | on compression speed, and the decompression speed doubles. |
272 | |
273 | On the other extreme, compressing a file with few long distance matches (such as |
274 | the [Silesia compression corpus]) will likely lead to a deterioration in |
275 | compression speed (for lower levels) with minimal change in compression ratio. |
276 | |
277 | The table below illustrates this on the [Silesia compression corpus]. |
278 | |
279 | [Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia |
280 | |
281 | | Method | Compression ratio | Compression speed | Decompression speed | |
282 | |:-------|------------------:|------------------:|---------------------:| |
283 | | `zstd -1` | `2.878` | `231.7 MB/s` | `594.4 MB/s` | |
284 | | `zstd -1 --long` | `2.929` | `106.5 MB/s` | `517.9 MB/s` | |
285 | | `zstd -5` | `3.274` | `77.1 MB/s` | `464.2 MB/s` | |
286 | | `zstd -5 --long` | `3.319` | `51.7 MB/s` | `371.9 MB/s` | |
287 | | `zstd -10` | `3.523` | `16.4 MB/s` | `489.2 MB/s` | |
288 | | `zstd -10 --long`| `3.566` | `16.2 MB/s` | `415.7 MB/s` | |
289 | |
290 | |
291 | ### zstdgrep |
292 | |
293 | `zstdgrep` is a utility which makes it possible to `grep` a `.zst` compressed file directly. |
294 | It's used the same way as normal `grep`, for example : |
295 | `zstdgrep pattern file.zst` |
296 | |
297 | `zstdgrep` is _not_ compatible with dictionary compression. |
298 | |
299 | To search in a file compressed with a dictionary, |
300 | it's necessary to decompress it using `zstd` or `zstdcat`, |
301 | and then pipe the result to `grep`. For example : |
302 | `zstdcat -D dictionary -qc -- file.zst | grep pattern` |