zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
============================================================================

SYNOPSIS
--------

`zstd` [<OPTIONS>] [-|<INPUT-FILE>] [-o <OUTPUT-FILE>]

`zstdmt` is equivalent to `zstd -T0`

`unzstd` is equivalent to `zstd -d`

`zstdcat` is equivalent to `zstd -dcf`

DESCRIPTION
-----------
`zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip`(1) and `xz`(1).
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
from fast modes at > 200 MB/s per core,
to strong modes with excellent compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core.

`zstd` command line syntax is generally similar to gzip,
but features the following differences:

- Source files are preserved by default.
  It's possible to remove them automatically by using the `--rm` option.
- When compressing a single file, `zstd` displays progress notifications
  and a result summary by default.
  Use `-q` to turn them off.
- `zstd` displays a short help page when the command line is invalid.
  Use `-q` to turn it off.
- `zstd` does not accept input from the console,
  though it does accept `stdin` when it's not the console.
- `zstd` does not store the input's filename or attributes, only its contents.
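
These behaviors can be seen in a short session (`notes.txt` is a hypothetical file name):

```shell
# 'notes.txt' is a hypothetical input file.
printf 'example data\n' > notes.txt

zstd -q notes.txt          # creates notes.txt.zst; notes.txt is preserved
ls notes.txt notes.txt.zst

zstd -q --rm -f notes.txt  # --rm removes notes.txt on success (-f overwrites the existing .zst)
test ! -e notes.txt && echo 'source removed'
```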

`zstd` processes each _file_ according to the selected operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal: it will display an error message and skip the file.
Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal.

Unless `--stdout` or `-o` is specified, _files_ are written to a new file
whose name is derived from the source _file_ name:

* When compressing, the suffix `.zst` is appended to the source filename to
  get the target filename.
* When decompressing, the `.zst` suffix is removed from the source filename to
  get the target filename.

### Concatenation with .zst Files
It is possible to concatenate multiple `.zst` files. `zstd` will decompress
such a concatenated file as if it were a single `.zst` file.
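
A quick demonstration (file names are illustrative):

```shell
printf 'hello ' > a.txt
printf 'world\n' > b.txt
zstd -q a.txt b.txt                  # produces a.txt.zst and b.txt.zst
cat a.txt.zst b.txt.zst > joined.zst # concatenate the two frames
zstd -dcq joined.zst                 # prints: hello world
```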

OPTIONS
-------

### Integer Suffixes and Special Values

In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.

* `KiB`:
    Multiply the integer by 1,024 (2\^10).
    `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
* `MiB`:
    Multiply the integer by 1,048,576 (2\^20).
    `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.
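
For example, assuming a hypothetical `archive.zst`, the following two invocations are equivalent:

```shell
# 64MiB = 64 * 1,048,576 = 67,108,864 bytes; 64Mi, 64M, and 64MB are synonyms.
zstd -dc --memory=64MiB archive.zst > /dev/null
zstd -dc --memory=67108864 archive.zst > /dev/null
```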

### Operation Mode

If multiple operation mode options are given,
the last one takes effect.

* `-z`, `--compress`:
    Compress.
    This is the default operation mode when no operation mode option is specified
    and no other operation mode is implied from the command name
    (for example, `unzstd` implies `--decompress`).
* `-d`, `--decompress`, `--uncompress`:
    Decompress.
* `-t`, `--test`:
    Test the integrity of compressed _files_.
    This option is equivalent to `--decompress --stdout > /dev/null`:
    decompressed data is discarded and checksummed for errors.
    No files are created or removed.
* `-b#`:
    Benchmark file(s) using compression level _#_.
    See _BENCHMARK_ below for a description of this operation.
* `--train FILES`:
    Use _FILES_ as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100).
    See _DICTIONARY BUILDER_ below for a description of this operation.
* `-l`, `--list`:
    Display information related to a zstd compressed file, such as size, ratio, and checksum.
    Some of these fields may not be available.
    This command's output can be augmented with the `-v` modifier.
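
A single file can be run through each mode (`data.txt` is a hypothetical name):

```shell
printf 'some text\n' > data.txt
zstd -q data.txt                    # compress (default mode) -> data.txt.zst
zstd -t data.txt.zst                # test integrity; creates and removes nothing
zstd -l data.txt.zst                # list size, ratio, checksum
zstd -dq data.txt.zst -o copy.txt   # decompress to an explicit name
```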

### Operation Modifiers

* `-#`:
    selects `#` compression level \[1-19\] (default: 3).
* `--ultra`:
    unlocks high compression levels 20+ (maximum 22), using a lot more memory.
    Note that decompression will also require more memory when using these levels.
* `--fast[=#]`:
    switch to ultra-fast compression levels.
    If `=#` is not present, it defaults to `1`.
    The higher the value, the faster the compression speed,
    at the cost of some compression ratio.
    This setting overwrites the compression level if one was set previously.
    Similarly, if a compression level is set after `--fast`, it overrides it.
* `-T#`, `--threads=#`:
    Compress using `#` working threads (default: 1).
    If `#` is 0, attempt to detect and use the number of physical CPU cores.
    In all cases, the number of threads is capped to `ZSTDMT_NBWORKERS_MAX`,
    which is either 64 in 32-bit mode, or 256 for 64-bit environments.
    This modifier does nothing if `zstd` is compiled without multithread support.
* `--single-thread`:
    Use a single thread for both I/O and compression.
    As compression is serialized with I/O, this can be slightly slower.
    Single-thread mode features significantly lower memory usage,
    which can be useful for systems with a limited amount of memory, such as 32-bit systems.

    Note 1: this mode is the only one available when multithread support is disabled.

    Note 2: this mode is different from `-T1`, which spawns 1 compression thread in parallel with I/O.
    The final compressed result is also slightly different from `-T1`.
* `--auto-threads={physical,logical}` (default: `physical`):
    When using a default amount of threads via `-T0`, choose the default based on the number
    of detected physical or logical cores.
* `--adapt[=min=#,max=#]`:
    `zstd` will dynamically adapt compression level to perceived I/O conditions.
    Compression level adaptation can be observed live by using the `-v` modifier.
    Adaptation can be constrained between supplied `min` and `max` levels.
    The feature works when combined with multi-threading and `--long` mode.
    It does not work with `--single-thread`.
    It sets the window size to 8 MiB by default (this can be changed manually, see `wlog`).
    Due to the chaotic nature of dynamic adaptation, the compressed result is not reproducible.

    _Note_: at the time of this writing, `--adapt` can remain stuck at low speed
    when combined with multiple worker threads (>=2).
* `--long[=#]`:
    enables long distance matching with `#` `windowLog`; if `#` is not
    present, it defaults to `27`.
    This increases the window size (`windowLog`) and memory usage for both the
    compressor and decompressor.
    This setting is designed to improve the compression ratio for files with
    long matches at a large distance.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.
* `-D DICT`:
    use `DICT` as the dictionary to compress or decompress FILE(s).
* `--patch-from FILE`:
    Specify the file to be used as a reference point for zstd's diff engine.
    This is effectively dictionary compression with some convenient parameter
    selection, namely that _windowSize_ > _srcSize_.

    Note: cannot use both this and `-D` together.

    Note: `--long` mode will be automatically activated if _chainLog_ < _fileLog_
    (_fileLog_ being the _windowLog_ required to cover the whole file). You
    can also manually force it.

    Note: for all levels, you can use `--patch-from` in `--single-thread` mode
    to improve compression ratio at the cost of speed.

    Note: for level 19, you can get increased compression ratio at the cost
    of speed by specifying `--zstd=targetLength=` to be something large
    (e.g. 4096), and by setting a large `--zstd=chainLog=`.
* `--rsyncable`:
    `zstd` will periodically synchronize the compression state to make the
    compressed file more rsync-friendly.
    There is a negligible impact to compression ratio,
    and a potential impact to compression speed, perceptible at higher speeds,
    for example when combining `--rsyncable` with many parallel worker threads.
    This feature does not work with `--single-thread`. You probably don't want
    to use it with long range mode, since it will decrease the effectiveness of
    the synchronization points, but your mileage may vary.
* `-C`, `--[no-]check`:
    add integrity check computed from uncompressed data (default: enabled).
* `--[no-]content-size`:
    enable / disable storing the original size of the file in the header of
    the compressed file. The default is `--content-size`,
    meaning that the original size will be placed in the header.
* `--no-dictID`:
    do not store the dictionary ID within the frame header (dictionary compression).
    The decoder will have to rely on implicit knowledge about which dictionary to use;
    it won't be able to check if it's correct.
* `-M#`, `--memory=#`:
    Set a memory usage limit. By default, `zstd` uses 128 MiB for decompression
    as the maximum amount of memory the decompressor is allowed to use, but you can
    override this manually if need be in either direction (i.e. you can increase or
    decrease it).

    This is also used during compression with `--patch-from=`. In this case,
    this parameter overrides the default maximum size allowed for a dictionary (128 MiB).

    Additionally, this can be used to limit memory for dictionary training. This parameter
    overrides the default limit of 2 GiB. zstd will load training samples up to the memory limit
    and ignore the rest.
* `--stream-size=#`:
    Sets the pledged source size of input coming from a stream. This value must be exact, as it
    will be included in the produced frame header. Incorrect stream sizes will cause an error.
    This information will be used to better optimize compression parameters, resulting in
    better and potentially faster compression, especially for smaller source sizes.
* `--size-hint=#`:
    When handling input from a stream, `zstd` must guess how large the source size
    will be when optimizing compression parameters. If the stream size is relatively
    small, this guess may be a poor one, resulting in a higher compression ratio than
    expected. This feature allows for controlling the guess when needed.
    Exact guesses result in better compression ratios. Overestimates result in slightly
    degraded compression ratios, while underestimates may result in significant degradation.
* `-o FILE`:
    save the result into `FILE`.
* `-f`, `--force`:
    disable input and output checks. Allows overwriting existing files, input
    from the console, output to stdout, operating on links, block devices, etc.
    During decompression, when the output destination is stdout, pass
    unrecognized formats through as-is.
* `-c`, `--stdout`:
    write to standard output (even if it is the console); keep original files unchanged.
* `--[no-]sparse`:
    enable / disable sparse FS support,
    to make files with many zeroes smaller on disk.
    Creating sparse files may save disk space and speed up decompression by
    reducing the amount of disk I/O.
    Default: enabled when output is written to a file,
    and disabled when output is stdout.
    This setting overrides the default and can force sparse mode over stdout.
* `--[no-]pass-through`:
    enable / disable passing through uncompressed files as-is. During
    decompression when pass-through is enabled, unrecognized formats will be
    copied as-is from the input to the output. By default, pass-through will
    occur when the output destination is stdout and the force (`-f`) option is
    set.
* `--rm`:
    remove source file(s) after successful compression or decompression.
    This option is silently ignored if output is `stdout`.
    If used in combination with `-o`,
    it triggers a confirmation prompt (which can be silenced with `-f`), as this is a destructive operation.
* `-k`, `--keep`:
    keep source file(s) after successful compression or decompression.
    This is the default behavior.
* `-r`:
    operate recursively on directories.
    It selects all files in the named directory and all its subdirectories.
    This can be useful both to reduce command line typing,
    and to circumvent shell expansion limitations,
    when there are a lot of files and the list would exceed the maximum size of a command line.
* `--filelist FILE`:
    read a list of files to process as content from `FILE`.
    Format is compatible with `ls` output, with one file per line.
* `--output-dir-flat DIR`:
    resulting files are stored in the target `DIR` directory,
    instead of the same directory as the origin file.
    Be aware that this command can introduce name collision issues,
    if multiple files, from different directories, end up having the same name.
    Collision resolution ensures that the first file with a given name will be present in `DIR`;
    in combination with `-f`, the last file will be present instead.
* `--output-dir-mirror DIR`:
    similar to `--output-dir-flat`,
    the output files are stored underneath the target `DIR` directory,
    but this option will replicate the input directory hierarchy into the output `DIR`.

    If the input directory contains "..", the files in this directory will be ignored.
    If the input directory is an absolute directory (e.g. "/var/tmp/abc"),
    it will be stored into "output-dir/var/tmp/abc".
    If there are multiple input files or directories,
    name collision resolution will follow the same rules as `--output-dir-flat`.
* `--format=FORMAT`:
    compress and decompress in other formats. If compiled with
    support, zstd can compress to or decompress from other compression algorithm
    formats. Possible values are `zstd`, `gzip`, `xz`, `lzma`, and `lz4`.
    If no such format is provided, `zstd` is the default.
* `-h`/`-H`, `--help`:
    display help/long help and exit.
* `-V`, `--version`:
    display version number and exit.
    Advanced: `-vV` also displays supported formats.
    `-vvV` also displays POSIX support.
    `-q` will only display the version number, suitable for machine reading.
* `-v`, `--verbose`:
    verbose mode, display more information.
* `-q`, `--quiet`:
    suppress warnings, interactivity, and notifications.
    Specify twice to suppress errors too.
* `--no-progress`:
    do not display the progress bar, but keep all other messages.
* `--show-default-cparams`:
    show the default compression parameters that will be used for a particular
    input file, based on the provided compression level and the input size.
    If the provided file is not a regular file (e.g. a pipe), this flag will
    output the parameters used for inputs of unknown size.
* `--`:
    All arguments after `--` are treated as files.
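
A few illustrative combinations of these modifiers (all paths are hypothetical):

```shell
zstd -19 -T0 big.log                      # strong compression using all physical cores
zstd -q -r --output-dir-mirror out/ src/  # recurse into src/, mirror its tree under out/
zstd -dcf unknown.bin > recovered.bin     # with -f, unrecognized input is passed through as-is
```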


### gzip Operation Modifiers
When invoked via a `gzip` symlink, `zstd` supports additional
options intended to mimic the `gzip` behavior:

* `-n`, `--no-name`:
    do not store the original filename and timestamps when compressing
    a file. This is the default behavior and hence a no-op.
* `--best`:
    alias to the option `-9`.

### Environment Variables

Employing environment variables to set parameters has security implications.
Therefore, this avenue is intentionally limited.
Only `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` are currently supported.
They set the compression level and number of threads to use during compression, respectively.

`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
`ZSTD_CLEVEL` just replaces the default compression level (`3`).

`ZSTD_NBTHREADS` can be used to set the number of threads `zstd` will attempt to use during compression.
If the value of `ZSTD_NBTHREADS` is not a valid unsigned integer, it will be ignored with a warning message.
`ZSTD_NBTHREADS` has a default value of (`1`), and is capped at `ZSTDMT_NBWORKERS_MAX`
(64 in 32-bit mode, 256 in 64-bit environments).
`zstd` must be compiled with multithread support for this to have any effect.

They can both be overridden by corresponding command line arguments:
`-#` for compression level and `-T#` for number of compression threads.
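
For example (with a hypothetical `msg.txt`):

```shell
printf 'hello\n' > msg.txt
ZSTD_CLEVEL=15 ZSTD_NBTHREADS=2 zstd -q msg.txt   # compresses at level 15 with 2 threads
ZSTD_CLEVEL=15 zstd -q -f -6 msg.txt              # the explicit -6 wins over ZSTD_CLEVEL
```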


DICTIONARY BUILDER
------------------
`zstd` offers _dictionary_ compression,
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then, during compression and decompression, reference the same dictionary,
using the `-D dictionaryFileName` option.
Compression of small files similar to the sample set will be greatly improved.
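
The workflow can be sketched as follows (sample paths and names are hypothetical):

```shell
zstd --train samples/*.json -o json.dict        # build a dictionary from many small samples
zstd -q -D json.dict new.json                   # compress with the dictionary
zstd -dq -D json.dict new.json.zst -o out.json  # the same dictionary is needed to decompress
```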

* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should ideally contain a lot of samples (> 100),
    and weigh typically 100x the target dictionary size
    (for example, ~10 MB for a 100 KB dictionary).
    `--train` can be combined with `-r` to indicate a directory rather than listing all the files,
    which can be useful to circumvent shell expansion limits.

    Since dictionary compression is mostly effective for small files,
    the expectation is that the training set will only contain small files.
    In the case where some samples happen to be large,
    only the first 128 KiB of these samples will be used for training.

    `--train` supports multithreading if `zstd` is compiled with threading support (default).
    Additional advanced parameters can be specified with `--train-fastcover`.
    The legacy dictionary builder can be accessed with `--train-legacy`.
    The slower cover dictionary builder can be accessed with `--train-cover`.
    The default `--train` is equivalent to `--train-fastcover=d=8,steps=4`.

* `-o FILE`:
    Dictionary saved into `FILE` (default name: dictionary).
* `--maxdict=#`:
    Limit dictionary to specified size (default: 112640 bytes).
    As usual, quantities are expressed in bytes by default,
    and it's possible to employ suffixes (like `KB` or `MB`)
    to specify larger values.
* `-#`:
    Use `#` compression level during training (optional).
    Will generate statistics more tuned for the selected compression level,
    resulting in a _small_ compression ratio improvement for this level.
* `-B#`:
    Split input files into blocks of size # (default: no split).
* `-M#`, `--memory=#`:
    Limit the amount of sample data loaded for training (default: 2 GB).
    Note that the default (2 GB) is also the maximum.
    This parameter can be useful in situations where the training set size
    is not well controlled and could be potentially very large.
    Since the speed of the training process is directly correlated to
    the size of the training sample set,
    a smaller sample set leads to faster training.

    In situations where the training set is larger than maximum memory,
    the CLI will randomly select samples among the available ones,
    up to the maximum allowed memory budget.
    This is meant to improve dictionary relevance
    by mitigating the potential impact of clustering,
    such as selecting only files from the beginning of a list
    sorted by modification date, or sorted alphabetically.
    The randomization process is deterministic, so
    training of the same list of files with the same parameters
    will lead to the creation of the same dictionary.

* `--dictID=#`:
    A dictionary ID is a locally unique ID.
    The decoder will use this value to verify it is using the right dictionary.
    By default, zstd will create a 4-byte random ID.
    It's possible to provide an explicit ID instead.
    It's up to the dictionary manager not to assign the same ID to
    two different dictionaries.
    Note that short IDs have an advantage:
    an ID < 256 will only need 1 byte in the compressed frame header,
    and an ID < 65536 will only need 2 bytes.
    This compares favorably to the default 4 bytes.

    Note that RFC8878 reserves IDs less than 32768 and greater than or equal to 2\^31, so they should not be used in public.

* `--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]]`:
    Select parameters for the cover dictionary builder algorithm.
    If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
    If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
    If _steps_ is not specified, then the default value of 40 is used.
    If _split_ is not specified or split <= 0, then the default value of 100 is used.
    Requires that _d_ <= _k_.
    If the _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.

    Selects segments of size _k_ with the highest score to put in the dictionary.
    The score of a segment is computed by the sum of the frequencies of all the
    subsegments of size _d_.
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
    Good values for _k_ vary widely based on the input data, but a safe range is
    [2 * _d_, 2000].
    If _split_ is 100, all input samples are used for both training and testing
    to find the optimal _d_ and _k_ to build the dictionary.
    Supports multithreading if `zstd` is compiled with threading support.
    With _shrink_ enabled, the algorithm takes a truncated dictionary of minimum size and doubles
    it in size until the compression ratio of the truncated dictionary is at most
    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.

    Examples:

    `zstd --train-cover FILEs`

    `zstd --train-cover=k=50,d=8 FILEs`

    `zstd --train-cover=d=8,steps=500 FILEs`

    `zstd --train-cover=k=50 FILEs`

    `zstd --train-cover=k=50,split=60 FILEs`

    `zstd --train-cover=shrink FILEs`

    `zstd --train-cover=shrink=2 FILEs`

* `--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#]`:
    Same as cover but with extra parameters _f_ and _accel_, and a different default value of _split_.
    If _split_ is not specified, then it tries _split_ = 75.
    If _f_ is not specified, then it tries _f_ = 20.
    Requires that 0 < _f_ < 32.
    If _accel_ is not specified, then it tries _accel_ = 1.
    Requires that 0 < _accel_ <= 10.
    Requires that _d_ = 6 or _d_ = 8.

    _f_ is the log of the size of the array that keeps track of the frequency of subsegments of size _d_.
    The subsegment is hashed to an index in the range [0, 2^_f_ - 1].
    It is possible that 2 different subsegments are hashed to the same index, in which case they are considered the same subsegment when computing frequency.
    Using a higher _f_ reduces collisions but takes longer.

    Examples:

    `zstd --train-fastcover FILEs`

    `zstd --train-fastcover=d=8,f=15,accel=2 FILEs`

* `--train-legacy[=selectivity=#]`:
    Use the legacy dictionary builder algorithm with the given dictionary
    _selectivity_ (default: 9).
    The smaller the _selectivity_ value, the denser the dictionary,
    improving its efficiency but reducing its achievable maximum size.
    `--train-legacy=s=#` is also accepted.

    Examples:

    `zstd --train-legacy FILEs`

    `zstd --train-legacy=selectivity=8 FILEs`


BENCHMARK
---------

* `-b#`:
    benchmark file(s) using compression level #.
* `-e#`:
    benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive).
* `-i#`:
    minimum evaluation time, in seconds (default: 3s), benchmark mode only.
* `-B#`, `--block-size=#`:
    cut file(s) into independent chunks of size # (default: no chunking).
* `--priority=rt`:
    set process priority to real-time.

**Output Format:** CompressionLevel#Filename: InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed

**Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.
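
For example (with a hypothetical `data.bin`):

```shell
zstd -b1 -e19 -i6 data.bin   # benchmark levels 1..19, at least 6 seconds per measurement
zstd -b3 -B4MiB data.bin     # benchmark level 3 on independent 4 MiB blocks
```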

ADVANCED COMPRESSION OPTIONS
----------------------------
### -B#:
Specify the size of each compression job.
This parameter is only available when multi-threading is enabled.
Each compression job is run in parallel, so this value indirectly impacts the number of active threads.
The default job size varies depending on compression level (generally `4 * windowSize`).
`-B#` makes it possible to manually select a custom size.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 512 KB, or `overlapSize`, whichever is larger.
Different job sizes will lead to non-identical compressed frames.
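
For example, the following forces 1 MiB jobs across 4 worker threads (`big.log` is hypothetical):

```shell
# Values below max(512 KB, overlapSize) are transparently raised to the minimum.
zstd -q -T4 -B1MiB big.log
```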

### --zstd[=options]:
`zstd` provides 22 predefined regular compression levels plus the fast levels.
A compression level is translated internally into a number of specific parameters that actually control the behavior of the compressor.
(You can see the result of this translation with `--show-default-cparams`.)
These specific parameters can be overridden with advanced compression options.
The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be
taken from the selected or default compression level.
The list of available _options_:

- `strategy`=_strat_, `strat`=_strat_:
    Specify a strategy used by the match finder.

    There are 9 strategies numbered from 1 to 9, from fastest to strongest:
    1=`ZSTD_fast`, 2=`ZSTD_dfast`, 3=`ZSTD_greedy`,
    4=`ZSTD_lazy`, 5=`ZSTD_lazy2`, 6=`ZSTD_btlazy2`,
    7=`ZSTD_btopt`, 8=`ZSTD_btultra`, 9=`ZSTD_btultra2`.

- `windowLog`=_wlog_, `wlog`=_wlog_:
    Specify the maximum number of bits for a match distance.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also increases memory requirements for the compressor and decompressor.
    The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
    platforms and 31 (2 GiB) on 64-bit platforms.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.

- `hashLog`=_hlog_, `hlog`=_hlog_:
    Specify the maximum number of bits for a hash table.

    Bigger hash tables cause fewer collisions, which usually makes compression
    faster, but requires more memory during compression.

    The minimum _hlog_ is 6 (64 entries / 256 B) and the maximum is 30 (1B entries / 4 GiB).

- `chainLog`=_clog_, `clog`=_clog_:
    Specify the maximum number of bits for the secondary search structure,
    whose form depends on the selected `strategy`.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also slows down compression speed and increases memory requirements for
    compression.
    This option is ignored for the `ZSTD_fast` `strategy`, which only has the primary hash table.

    The minimum _clog_ is 6 (64 entries / 256 B) and the maximum is 29 (512M entries / 2 GiB) on 32-bit platforms
    and 30 (1B entries / 4 GiB) on 64-bit platforms.

- `searchLog`=_slog_, `slog`=_slog_:
    Specify the maximum number of searches in a hash chain or a binary tree
    using logarithmic scale.

    More searches increase the chance to find a match, which usually increases
    compression ratio but decreases compression speed.

    The minimum _slog_ is 1 and the maximum is 'windowLog' - 1.

- `minMatch`=_mml_, `mml`=_mml_:
    Specify the minimum searched length of a match in a hash table.

    Larger search lengths usually decrease compression ratio but improve
    decompression speed.

    The minimum _mml_ is 3 and the maximum is 7.

- `targetLength`=_tlen_, `tlen`=_tlen_:
    The impact of this field varies depending on the selected strategy.

    For `ZSTD_btopt`, `ZSTD_btultra` and `ZSTD_btultra2`, it specifies
    the minimum match length that causes the match finder to stop searching.
    A larger `targetLength` usually improves compression ratio
    but decreases compression speed.

    For `ZSTD_fast`, it triggers ultra-fast mode when > 0.
    The value represents the amount of data skipped between match sampling.
    Impact is reversed: a larger `targetLength` increases compression speed
    but decreases compression ratio.

    For all other strategies, this field has no impact.

    The minimum _tlen_ is 0 and the maximum is 128 KiB.

- `overlapLog`=_ovlog_, `ovlog`=_ovlog_:
    Determine `overlapSize`, the amount of data reloaded from the previous job.
    This parameter is only available when multithreading is enabled.
    Reloading more data improves compression ratio, but decreases speed.

    The minimum _ovlog_ is 0, and the maximum is 9.
    1 means "no overlap", hence completely independent jobs.
    9 means "full overlap", meaning up to `windowSize` is reloaded from the previous job.
    Reducing _ovlog_ by 1 reduces the reloaded amount by a factor of 2.
    For example, 8 means "windowSize/2", and 6 means "windowSize/8".
    Value 0 is special and means "default": _ovlog_ is automatically determined by `zstd`.
    In that case, _ovlog_ ranges from 6 to 9, depending on the selected _strat_.

- `ldmHashLog`=_lhlog_, `lhlog`=_lhlog_:
    Specify the maximum size for a hash table used for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Bigger hash tables usually improve compression ratio at the expense of more
    memory during compression and a decrease in compression speed.

    The minimum _lhlog_ is 6 and the maximum is 30 (default: 20).

- `ldmMinMatch`=_lmml_, `lmml`=_lmml_:
    Specify the minimum searched length of a match for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Larger/very small values usually decrease compression ratio.

    The minimum _lmml_ is 4 and the maximum is 4096 (default: 64).

- `ldmBucketSizeLog`=_lblog_, `lblog`=_lblog_:
    Specify the size of each bucket for the hash table used for long distance
    matching.

    This option is ignored unless long distance matching is enabled.

    Larger bucket sizes improve collision resolution but decrease compression
    speed.

    The minimum _lblog_ is 1 and the maximum is 8 (default: 3).

- `ldmHashRateLog`=_lhrlog_, `lhrlog`=_lhrlog_:
    Specify the frequency of inserting entries into the long distance matching
    hash table.

    This option is ignored unless long distance matching is enabled.

    Larger values will improve compression speed. Deviating far from the
    default value will likely result in a decrease in compression ratio.

    The default value is `wlog - lhlog`.

### Example
The following parameters set advanced compression options to something
similar to predefined level 19 for files bigger than 256 KB:

`--zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6`
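
The same override can be applied on the command line, and the parameters a predefined level would use can be inspected for comparison (`data.bin` is hypothetical):

```shell
zstd -q -f --zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6 data.bin
zstd -19 --show-default-cparams -f data.bin   # also compresses; prints level 19's translated parameters
```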

SEE ALSO
--------
`zstdgrep`(1), `zstdless`(1), `gzip`(1), `xz`(1)

The Zstandard format is specified in Y. Collet, "Zstandard Compression and the 'application/zstd' Media Type", https://www.ietf.org/rfc/rfc8878.txt, Internet RFC 8878 (February 2021).

BUGS
----
Report bugs at: https://github.com/facebook/zstd/issues

AUTHOR
------
Yann Collet