zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
============================================================================

SYNOPSIS
--------

`zstd` [<OPTIONS>] [-|<INPUT-FILE>] [-o <OUTPUT-FILE>]

`zstdmt` is equivalent to `zstd -T0`

`unzstd` is equivalent to `zstd -d`

`zstdcat` is equivalent to `zstd -dcf`

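These equivalences can be checked directly in a shell. A minimal sketch, assuming `zstd` and its `unzstd` companion link are installed (`data.txt` is a placeholder name):

```shell
printf 'hello zstd\n' > data.txt
zstd -q -f data.txt -o data.txt.zst     # compress
zstd -d -q -f -c data.txt.zst > a.txt   # explicit decompress
unzstd -q -f -c data.txt.zst > b.txt    # unzstd = zstd -d
cmp a.txt b.txt && echo identical
```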

DESCRIPTION
-----------
`zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip`(1) and `xz`(1).
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
from fast modes at > 200 MB/s per core,
to strong modes with excellent compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core,
which remains roughly stable at all compression settings.

`zstd` command line syntax is generally similar to gzip,
but features the following few differences:

- Source files are preserved by default.
It's possible to remove them automatically by using the `--rm` command.
- When compressing a single file, `zstd` displays progress notifications
and result summary by default.
Use `-q` to turn them off.
- `zstd` displays a short help page when the command line is invalid.
Use `-q` to turn it off.
- `zstd` does not accept input from the console,
though it does accept `stdin` when it's not the console.
- `zstd` does not store the input's filename or attributes, only its contents.

`zstd` processes each _file_ according to the selected operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal: it will display an error message and skip the file.
Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal.

Unless `--stdout` or `-o` is specified, _files_ are written to a new file
whose name is derived from the source _file_ name:

* When compressing, the suffix `.zst` is appended to the source filename to
get the target filename.
* When decompressing, the `.zst` suffix is removed from the source filename to
get the target filename.

### Concatenation with .zst Files
It is possible to concatenate multiple `.zst` files. `zstd` will decompress
such a concatenated file as if it were a single `.zst` file.
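For example, a quick sketch of frame concatenation (file names are placeholders):

```shell
printf 'part one\n' > a.txt
printf 'part two\n' > b.txt
zstd -q -f a.txt -o a.zst
zstd -q -f b.txt -o b.zst
cat a.zst b.zst > joined.zst    # two frames back to back
zstd -d -q -f -c joined.zst     # prints both parts in order
```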

OPTIONS
-------

### Integer Suffixes and Special Values

In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.

* `KiB`:
Multiply the integer by 1,024 (2\^10).
`Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
* `MiB`:
Multiply the integer by 1,048,576 (2\^20).
`Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.
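For instance, the following two invocations are equivalent (a sketch; `file.zst` is a placeholder):

```shell
zstd -d -q -f --memory=1024KiB file.zst -o file
zstd -d -q -f --memory=1MiB    file.zst -o file
```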

### Operation Mode

If multiple operation mode options are given,
the last one takes effect.

* `-z`, `--compress`:
Compress.
This is the default operation mode when no operation mode option is specified
and no other operation mode is implied from the command name
(for example, `unzstd` implies `--decompress`).
* `-d`, `--decompress`, `--uncompress`:
Decompress.
* `-t`, `--test`:
Test the integrity of compressed _files_.
This option is equivalent to `--decompress --stdout > /dev/null`, where
decompressed data is discarded and checksummed for errors.
No files are created or removed.
* `-b#`:
Benchmark file(s) using compression level _#_.
See _BENCHMARK_ below for a description of this operation.
* `--train FILES`:
Use _FILES_ as a training set to create a dictionary.
The training set should contain a lot of small files (> 100).
See _DICTIONARY BUILDER_ below for a description of this operation.
* `-l`, `--list`:
Display information related to a zstd compressed file, such as size, ratio, and checksum.
Some of these fields may not be available.
This command's output can be augmented with the `-v` modifier.
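For example (a sketch; `file.txt` is a placeholder):

```shell
zstd -q -f file.txt -o file.txt.zst
zstd -l file.txt.zst      # summary: frames, sizes, ratio, checksum
zstd -l -v file.txt.zst   # augmented output with more header details
```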

### Operation Modifiers

* `-#`:
selects `#` compression level \[1-19\] (default: 3).
Higher compression levels *generally* produce higher compression ratio at the expense of speed and memory.
A rough rule of thumb is that compression speed is expected to be divided by 2 every 2 levels.
Technically, each level is mapped to a set of advanced parameters (that can also be modified individually, see below).
Because the compressor's behavior highly depends on the content to compress, there's no guarantee of a smooth progression from one level to another.
* `--ultra`:
unlocks high compression levels 20+ (maximum 22), using a lot more memory.
Note that decompression will also require more memory when using these levels.
* `--fast[=#]`:
switch to ultra-fast compression levels.
If `=#` is not present, it defaults to `1`.
The higher the value, the faster the compression speed,
at the cost of some compression ratio.
This setting overwrites compression level if one was set previously.
Similarly, if a compression level is set after `--fast`, it overrides it.
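
    Trading ratio for speed and vice versa can be sketched as follows (`file` is a placeholder):

    ```shell
    zstd -q -f --fast=3 file -o fast.zst   # favor speed
    zstd -q -f -19      file -o high.zst   # favor ratio (slower)
    # high.zst is usually the smaller of the two
    ```
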
* `-T#`, `--threads=#`:
Compress using `#` working threads (default: 1).
If `#` is 0, attempt to detect and use the number of physical CPU cores.
In all cases, the number of threads is capped to `ZSTDMT_NBWORKERS_MAX`,
which is either 64 in 32-bit mode, or 256 for 64-bit environments.
This modifier does nothing if `zstd` is compiled without multithread support.
* `--single-thread`:
Use a single thread for both I/O and compression.
As compression is serialized with I/O, this can be slightly slower.
Single-thread mode features significantly lower memory usage,
which can be useful for systems with limited amount of memory, such as 32-bit systems.

Note 1: this mode is the only available one when multithread support is disabled.

Note 2: this mode is different from `-T1`, which spawns 1 compression thread in parallel with I/O.
Final compressed result is also slightly different from `-T1`.
* `--auto-threads={physical,logical} (default: physical)`:
When using a default amount of threads via `-T0`, choose the default based on the number
of detected physical or logical cores.
* `--adapt[=min=#,max=#]`:
`zstd` will dynamically adapt compression level to perceived I/O conditions.
Compression level adaptation can be observed live by using command `-v`.
Adaptation can be constrained between supplied `min` and `max` levels.
The feature works when combined with multi-threading and `--long` mode.
It does not work with `--single-thread`.
It sets window size to 8 MiB by default (can be changed manually, see `wlog`).
Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible.

_Note_: at the time of this writing, `--adapt` can remain stuck at low speed
when combined with multiple worker threads (>=2).
* `--long[=#]`:
enables long distance matching with `#` `windowLog`, if `#` is not
present it defaults to `27`.
This increases the window size (`windowLog`) and memory usage for both the
compressor and decompressor.
This setting is designed to improve the compression ratio for files with
long matches at a large distance.

Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
`--memory=windowSize` needs to be passed to the decompressor.
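
    For example, using a 30-bit window requires informing the decompressor (a sketch; `file` is a placeholder):

    ```shell
    zstd -q -f --long=30 file -o file.zst
    zstd -d -q -f --long=30 file.zst -o file.out
    # equivalently, raise the decoder's memory limit to the window size:
    zstd -d -q -f --memory=1GiB file.zst -o file.out
    ```
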
* `-D DICT`:
use `DICT` as the dictionary to compress or decompress FILE(s)
* `--patch-from FILE`:
Specify the file to be used as a reference point for zstd's diff engine.
This is effectively dictionary compression with some convenient parameter
selection, namely that _windowSize_ > _srcSize_.

Note: cannot use both this and `-D` together.

Note: `--long` mode will be automatically activated if _chainLog_ < _fileLog_
(_fileLog_ being the _windowLog_ required to cover the whole file). You
can also manually force it.

Note: for all levels, you can use `--patch-from` in `--single-thread` mode
to improve compression ratio at the cost of speed.

Note: for level 19, you can get increased compression ratio at the cost
of speed by specifying `--zstd=targetLength=` to be something large
(e.g. 4096), and by setting a large `--zstd=chainLog=`.
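
    A typical use compresses a new revision of a file against the previous one (a sketch; file names are placeholders):

    ```shell
    zstd -q -f --patch-from=v1.txt v2.txt -o v2.patch.zst
    # the same reference must be supplied to decompress:
    zstd -d -q -f --patch-from=v1.txt v2.patch.zst -o v2.out.txt
    ```
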
* `--rsyncable`:
`zstd` will periodically synchronize the compression state to make the
compressed file more rsync-friendly.
There is a negligible impact to compression ratio,
and a potential impact to compression speed, perceptible at higher speeds,
for example when combining `--rsyncable` with many parallel worker threads.
This feature does not work with `--single-thread`. You probably don't want
to use it with long range mode, since it will decrease the effectiveness of
the synchronization points, but your mileage may vary.
* `-C`, `--[no-]check`:
add integrity check computed from uncompressed data (default: enabled)
* `--[no-]content-size`:
enable / disable whether or not the original size of the file is placed in
the header of the compressed file. The default option is
`--content-size` (meaning that the original size will be placed in the header).
* `--no-dictID`:
do not store dictionary ID within frame header (dictionary compression).
The decoder will have to rely on implicit knowledge about which dictionary to use,
and it won't be able to check if it's correct.
* `-M#`, `--memory=#`:
Set a memory usage limit. By default, `zstd` uses 128 MiB for decompression
as the maximum amount of memory the decompressor is allowed to use, but you can
override this manually if need be in either direction (i.e. you can increase or
decrease it).

This is also used during compression when using `--patch-from=`. In this case,
this parameter overrides the maximum size allowed for a dictionary (128 MiB).

Additionally, this can be used to limit memory for dictionary training. This parameter
overrides the default limit of 2 GiB. zstd will load training samples up to the memory limit
and ignore the rest.
* `--stream-size=#`:
Sets the pledged source size of input coming from a stream. This value must be exact, as it
will be included in the produced frame header. Incorrect stream sizes will cause an error.
This information will be used to better optimize compression parameters, resulting in
better and potentially faster compression, especially for smaller source sizes.
* `--size-hint=#`:
When handling input from a stream, `zstd` must guess how large the source size
will be when optimizing compression parameters. If the stream size is relatively
small, this guess may be a poor one, resulting in a worse compression ratio than
expected. This feature allows for controlling the guess when needed.
Exact guesses result in better compression ratios. Overestimates result in slightly
degraded compression ratios, while underestimates may result in significant degradation.
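
    For example, pledging the exact size of piped input (a sketch; `file` is a placeholder):

    ```shell
    size=$(wc -c < file)
    cat file | zstd -q -f --stream-size="$size" -o file.zst
    ```
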
* `--target-compressed-block-size=#`:
Attempt to produce compressed blocks of approximately this size.
This will split larger blocks in order to approach this target.
This feature is notably useful for improved latency, when the receiver can leverage receiving early incomplete data.
This parameter defines a loose target: compressed blocks will target this size "on average", but individual blocks can still be larger or smaller.
Enabling this feature can decrease compression speed by up to ~10% at level 1.
Higher levels will see smaller relative speed regression, becoming invisible at higher settings.
* `-f`, `--force`:
disable input and output checks. Allows overwriting existing files, input
from console, output to stdout, operating on links, block devices, etc.
During decompression and when the output destination is stdout, pass
through unrecognized formats as-is.
* `-c`, `--stdout`:
write to standard output (even if it is the console); keep original files (disable `--rm`).
* `-o FILE`:
save result into `FILE`.
Note that this operation is in conflict with `-c`.
If both operations are present on the command line, the last expressed one wins.
* `--[no-]sparse`:
enable / disable sparse FS support,
to make files with many zeroes smaller on disk.
Creating sparse files may save disk space and speed up decompression by
reducing the amount of disk I/O.
default: enabled when output is into a file,
and disabled when output is stdout.
This setting overrides the default and can force sparse mode over stdout.
* `--[no-]pass-through`:
enable / disable passing through uncompressed files as-is. During
decompression when pass-through is enabled, unrecognized formats will be
copied as-is from the input to the output. By default, pass-through will
occur when the output destination is stdout and the force (`-f`) option is
set.
* `--rm`:
remove source file(s) after successful compression or decompression.
This command is silently ignored if output is `stdout`.
If used in combination with `-o`,
triggers a confirmation prompt (which can be silenced with `-f`), as this is a destructive operation.
* `-k`, `--keep`:
keep source file(s) after successful compression or decompression.
This is the default behavior.
* `-r`:
operate recursively on directories.
It selects all files in the named directory and all its subdirectories.
This can be useful both to reduce command line typing,
and to circumvent shell expansion limitations,
when there are a lot of files and naming breaks the maximum size of a command line.
* `--filelist FILE`:
read a list of files to process as content from `FILE`.
Format is compatible with `ls` output, with one file per line.
* `--output-dir-flat DIR`:
resulting files are stored into target `DIR` directory,
instead of the same directory as the origin file.
Be aware that this command can introduce name collision issues,
if multiple files, from different directories, end up having the same name.
Collision resolution ensures the first file with a given name will be present in `DIR`,
while in combination with `-f`, the last file will be present instead.
* `--output-dir-mirror DIR`:
similar to `--output-dir-flat`,
the output files are stored underneath target `DIR` directory,
but this option will replicate input directory hierarchy into output `DIR`.

If input directory contains "..", the files in this directory will be ignored.
If input directory is an absolute path (e.g. "/var/tmp/abc"),
it will be stored into "output-dir/var/tmp/abc".
If there are multiple input files or directories,
name collision resolution will follow the same rules as `--output-dir-flat`.
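
    For example (a sketch; paths are placeholders):

    ```shell
    mkdir -p src/sub
    printf 'data\n' > src/sub/f.txt
    zstd -q -r src --output-dir-mirror out
    # compressed files appear under out/, mirroring src's hierarchy
    ```
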
* `--format=FORMAT`:
compress and decompress in other formats. If compiled with
support, zstd can compress to or decompress from other compression algorithm
formats. Possibly available options are `zstd`, `gzip`, `xz`, `lzma`, and `lz4`.
If no such format is provided, `zstd` is the default.
* `-h`/`-H`, `--help`:
display help/long help and exit
* `-V`, `--version`:
display version number and immediately exit.
note that, since it exits, flags specified after `-V` are effectively ignored.
Advanced: `-vV` also displays supported formats.
`-vvV` also displays POSIX support.
`-qV` will only display the version number, suitable for machine reading.
* `-v`, `--verbose`:
verbose mode, display more information
* `-q`, `--quiet`:
suppress warnings, interactivity, and notifications.
specify twice to suppress errors too.
* `--no-progress`:
do not display the progress bar, but keep all other messages.
* `--show-default-cparams`:
shows the default compression parameters that will be used for a particular input file, based on the provided compression level and the input size.
If the provided file is not a regular file (e.g. a pipe), this flag will output the parameters used for inputs of unknown size.
* `--exclude-compressed`:
only compress files that are not already compressed.
* `--`:
All arguments after `--` are treated as files.

### gzip Operation Modifiers
When invoked via a `gzip` symlink, `zstd` will support further
options that intend to mimic the `gzip` behavior:

* `-n`, `--no-name`:
do not store the original filename and timestamps when compressing
a file. This is the default behavior and hence a no-op.
* `--best`:
alias to the option `-9`.

### Environment Variables
Employing environment variables to set parameters has security implications.
Therefore, this avenue is intentionally limited.
Only `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` are currently supported.
They set the default compression level and number of threads to use during compression, respectively.

`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
`ZSTD_CLEVEL` just replaces the default compression level (`3`).

`ZSTD_NBTHREADS` can be used to set the number of threads `zstd` will attempt to use during compression.
If the value of `ZSTD_NBTHREADS` is not a valid unsigned integer, it will be ignored with a warning message.
`ZSTD_NBTHREADS` has a default value of (`1`), and is capped at `ZSTDMT_NBWORKERS_MAX`
(64 in 32-bit mode, 256 in 64-bit environments).
`zstd` must be compiled with multithread support for this variable to have any effect.

They can both be overridden by corresponding command line arguments:
`-#` for compression level and `-T#` for number of compression threads.
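
For example (a sketch; `file` is a placeholder):

```shell
# Request level 19 and 2 threads via the environment:
ZSTD_CLEVEL=19 ZSTD_NBTHREADS=2 zstd -q -f file -o file.zst
# Command line still wins: this compresses at level 5, not 19.
ZSTD_CLEVEL=19 zstd -q -f -5 file -o file.zst
```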


ADVANCED COMPRESSION OPTIONS
----------------------------
`zstd` provides 22 predefined regular compression levels plus the fast levels.
A compression level is translated internally into multiple advanced parameters that control the behavior of the compressor
(one can observe the result of this translation with `--show-default-cparams`).
These advanced parameters can be overridden using advanced compression options.

### --zstd[=options]:
The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be
taken from the selected or default compression level.
The list of available _options_:

- `strategy`=_strat_, `strat`=_strat_:
Specify a strategy used by a match finder.

There are 9 strategies numbered from 1 to 9, from fastest to strongest:
1=`ZSTD_fast`, 2=`ZSTD_dfast`, 3=`ZSTD_greedy`,
4=`ZSTD_lazy`, 5=`ZSTD_lazy2`, 6=`ZSTD_btlazy2`,
7=`ZSTD_btopt`, 8=`ZSTD_btultra`, 9=`ZSTD_btultra2`.

- `windowLog`=_wlog_, `wlog`=_wlog_:
Specify the maximum number of bits for a match distance.

Higher values increase the chance to find a match, which usually
improves compression ratio.
It also increases memory requirements for the compressor and decompressor.
The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
platforms and 31 (2 GiB) on 64-bit platforms.

Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
`--memory=windowSize` needs to be passed to the decompressor.

- `hashLog`=_hlog_, `hlog`=_hlog_:
Specify the maximum number of bits for a hash table.

Bigger hash tables cause fewer collisions, which usually makes compression
faster but requires more memory during compression.

The minimum _hlog_ is 6 (64 entries / 256 B) and the maximum is 30 (1B entries / 4 GiB).

- `chainLog`=_clog_, `clog`=_clog_:
Specify the maximum number of bits for the secondary search structure,
whose form depends on the selected `strategy`.

Higher numbers of bits increase the chance to find a match, which usually
improves compression ratio.
It also slows down compression speed and increases memory requirements for
compression.
This option is ignored for the `ZSTD_fast` `strategy`, which only has the primary hash table.

The minimum _clog_ is 6 (64 entries / 256 B) and the maximum is 29 (512M entries / 2 GiB) on 32-bit platforms
and 30 (1B entries / 4 GiB) on 64-bit platforms.

- `searchLog`=_slog_, `slog`=_slog_:
Specify the maximum number of searches in a hash chain or a binary tree
using logarithmic scale.

More searches increase the chance to find a match, which usually increases
compression ratio but decreases compression speed.

The minimum _slog_ is 1 and the maximum is 'windowLog' - 1.

- `minMatch`=_mml_, `mml`=_mml_:
Specify the minimum searched length of a match in a hash table.

Larger search lengths usually decrease compression ratio but improve
decompression speed.

The minimum _mml_ is 3 and the maximum is 7.

- `targetLength`=_tlen_, `tlen`=_tlen_:
The impact of this field varies depending on the selected strategy.

For `ZSTD_btopt`, `ZSTD_btultra` and `ZSTD_btultra2`, it specifies
the minimum match length that causes the match finder to stop searching.
A larger `targetLength` usually improves compression ratio
but decreases compression speed.

For `ZSTD_fast`, it triggers ultra-fast mode when > 0.
The value represents the amount of data skipped between match sampling.
Impact is reversed: a larger `targetLength` increases compression speed
but decreases compression ratio.

For all other strategies, this field has no impact.

The minimum _tlen_ is 0 and the maximum is 128 KiB.

- `overlapLog`=_ovlog_, `ovlog`=_ovlog_:
Determine `overlapSize`, the amount of data reloaded from the previous job.
This parameter is only available when multithreading is enabled.
Reloading more data improves compression ratio, but decreases speed.

The minimum _ovlog_ is 0, and the maximum is 9.
1 means "no overlap", hence completely independent jobs.
9 means "full overlap", meaning up to `windowSize` is reloaded from the previous job.
Reducing _ovlog_ by 1 reduces the reloaded amount by a factor 2.
For example, 8 means "windowSize/2", and 6 means "windowSize/8".
Value 0 is special and means "default": _ovlog_ is automatically determined by `zstd`.
In that case, _ovlog_ will range from 6 to 9, depending on the selected _strat_.

- `ldmHashLog`=_lhlog_, `lhlog`=_lhlog_:
Specify the maximum size for a hash table used for long distance matching.

This option is ignored unless long distance matching is enabled.

Bigger hash tables usually improve compression ratio at the expense of more
memory during compression and a decrease in compression speed.

The minimum _lhlog_ is 6 and the maximum is 30 (default: 20).

- `ldmMinMatch`=_lmml_, `lmml`=_lmml_:
Specify the minimum searched length of a match for long distance matching.

This option is ignored unless long distance matching is enabled.

Larger/very small values usually decrease compression ratio.

The minimum _lmml_ is 4 and the maximum is 4096 (default: 64).

- `ldmBucketSizeLog`=_lblog_, `lblog`=_lblog_:
Specify the size of each bucket for the hash table used for long distance
matching.

This option is ignored unless long distance matching is enabled.

Larger bucket sizes improve collision resolution but decrease compression
speed.

The minimum _lblog_ is 1 and the maximum is 8 (default: 3).

- `ldmHashRateLog`=_lhrlog_, `lhrlog`=_lhrlog_:
Specify the frequency of inserting entries into the long distance matching
hash table.

This option is ignored unless long distance matching is enabled.

Larger values will improve compression speed. Deviating far from the
default value will likely result in a decrease in compression ratio.

The default value is `wlog - lhlog`.

### Example
The following parameters set advanced compression options to something
similar to predefined level 19 for files bigger than 256 KB:

`--zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6`

### -B#:
Specify the size of each compression job.
This parameter is only available when multi-threading is enabled.
Each compression job is run in parallel, so this value indirectly impacts the number of active threads.
Default job size varies depending on compression level (generally `4 * windowSize`).
`-B#` makes it possible to manually select a custom size.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 512 KB, or `overlapSize`, whichever is larger.
Different job sizes will lead to non-identical compressed frames.


DICTIONARY BUILDER
------------------
`zstd` offers _dictionary_ compression,
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then, during compression and decompression, reference the same dictionary,
using command `-D dictionaryFileName`.
Compression of small files similar to the sample set will be greatly improved.
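A typical round-trip can be sketched as follows (sample paths and the dictionary name are placeholders):

```shell
# Train a dictionary from a set of small samples:
zstd -q --train samples/* -o dictionaryName
# Compress and decompress with the same dictionary:
zstd -q -f -D dictionaryName file -o file.zst
zstd -q -f -D dictionaryName -d file.zst -o file.restored
```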
519 | |
520 | * `--train FILEs`: |
521 | Use FILEs as training set to create a dictionary. |
522 | The training set should ideally contain a lot of samples (> 100), |
523 | and weight typically 100x the target dictionary size |
524 | (for example, ~10 MB for a 100 KB dictionary). |
525 | `--train` can be combined with `-r` to indicate a directory rather than listing all the files, |
526 | which can be useful to circumvent shell expansion limits. |
527 | |
528 | Since dictionary compression is mostly effective for small files, |
529 | the expectation is that the training set will only contain small files. |
530 | In the case where some samples happen to be large, |
531 | only the first 128 KiB of these samples will be used for training. |
532 | |
533 | `--train` supports multithreading if `zstd` is compiled with threading support (default). |
534 | Additional advanced parameters can be specified with `--train-fastcover`. |
535 | The legacy dictionary builder can be accessed with `--train-legacy`. |
536 | The slower cover dictionary builder can be accessed with `--train-cover`. |
537 | Default `--train` is equivalent to `--train-fastcover=d=8,steps=4`. |
538 | |
539 | * `-o FILE`: |
540 | Dictionary saved into `FILE` (default name: dictionary). |
541 | * `--maxdict=#`: |
542 | Limit dictionary to specified size (default: 112640 bytes). |
543 | As usual, quantities are expressed in bytes by default, |
544 | and it's possible to employ suffixes (like `KB` or `MB`) |
545 | to specify larger values. |
546 | * `-#`: |
547 | Use `#` compression level during training (optional). |
548 | Will generate statistics more tuned for selected compression level, |
549 | resulting in a _small_ compression ratio improvement for this level. |
550 | * `-B#`: |
551 | Split input files into blocks of size # (default: no split) |
552 | * `-M#`, `--memory=#`: |
553 | Limit the amount of sample data loaded for training (default: 2 GB). |
554 | Note that the default (2 GB) is also the maximum. |
555 | This parameter can be useful in situations where the training set size |
556 | is not well controlled and could be potentially very large. |
557 | Since speed of the training process is directly correlated to |
558 | the size of the training sample set, |
559 | a smaller sample set leads to faster training. |
560 | |
561 | In situations where the training set is larger than maximum memory, |
562 | the CLI will randomly select samples among the available ones, |
563 | up to the maximum allowed memory budget. |
564 | This is meant to improve dictionary relevance |
565 | by mitigating the potential impact of clustering, |
566 | such as selecting only files from the beginning of a list |
567 | sorted by modification date, or sorted by alphabetical order. |
568 | The randomization process is deterministic, so |
569 | training of the same list of files with the same parameters |
570 | will lead to the creation of the same dictionary. |
571 | |
572 | * `--dictID=#`: |
573 | A dictionary ID is a locally unique ID. |
574 | The decoder will use this value to verify it is using the right dictionary. |
575 | By default, zstd will create a 4-bytes random number ID. |
576 | It's possible to provide an explicit number ID instead. |
577 | It's up to the dictionary manager to not assign twice the same ID to |
578 | 2 different dictionaries. |
579 | Note that short numbers have an advantage: |
580 | an ID < 256 will only need 1 byte in the compressed frame header, |
581 | and an ID < 65536 will only need 2 bytes. |
582 | This compares favorably to 4 bytes default. |
583 | |
584 | Note that RFC8878 reserves IDs less than 32768 and greater than or equal to 2\^31, so they should not be used in public. |
585 | |
* `--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]]`:
    Select parameters for the default dictionary builder algorithm named cover.
    If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
    If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
    If _steps_ is not specified, then the default value of 40 is used.
    If _split_ is not specified or _split_ <= 0, then the default value of 100 is used.
    Requires that _d_ <= _k_.
    If the _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.

    Selects segments of size _k_ with the highest score to put in the dictionary.
    The score of a segment is computed by the sum of the frequencies of all the
    subsegments of size _d_.
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
    Good values for _k_ vary widely based on the input data, but a safe range is
    [2 * _d_, 2000].
    If _split_ is 100, all input samples are used for both training and testing
    to find the optimal _d_ and _k_ to build the dictionary.
    Supports multithreading if `zstd` is compiled with threading support.
    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
    it in size until the compression ratio of the truncated dictionary is at most
    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.

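The scoring rule above can be illustrated with a rough Python sketch (not the cover implementation): the score of each candidate segment of size _k_ is the sum of the corpus-wide frequencies of its subsegments of size _d_.

```python
# Illustrative sketch of cover-style segment scoring (hypothetical helper,
# not zstd's code): score each k-byte segment by summing the frequencies
# of its d-byte subsegments across the whole input.
from collections import Counter

def segment_scores(data: bytes, k: int, d: int):
    """Return {offset: score} for every k-byte segment (requires d <= k)."""
    # Frequency of every d-byte subsegment over the whole corpus.
    freq = Counter(data[i:i+d] for i in range(len(data) - d + 1))
    scores = {}
    for start in range(len(data) - k + 1):
        seg = data[start:start+k]
        scores[start] = sum(freq[seg[j:j+d]] for j in range(k - d + 1))
    return scores

scores = segment_scores(b"abababcdcdcd", k=4, d=2)
best = max(scores, key=scores.get)  # offset of the highest-scoring segment
```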
    Examples:

    `zstd --train-cover FILEs`

    `zstd --train-cover=k=50,d=8 FILEs`

    `zstd --train-cover=d=8,steps=500 FILEs`

    `zstd --train-cover=k=50 FILEs`

    `zstd --train-cover=k=50,split=60 FILEs`

    `zstd --train-cover=shrink FILEs`

    `zstd --train-cover=shrink=2 FILEs`

* `--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#]`:
    Same as cover but with extra parameters _f_ and _accel_, and a different default value of _split_.
    If _split_ is not specified, then it tries _split_ = 75.
    If _f_ is not specified, then it tries _f_ = 20.
    Requires that 0 < _f_ < 32.
    If _accel_ is not specified, then it tries _accel_ = 1.
    Requires that 0 < _accel_ <= 10.
    Requires that _d_ = 6 or _d_ = 8.

    _f_ is the log of the size of the array that keeps track of the frequency of subsegments of size _d_.
    A subsegment is hashed to an index in the range [0, 2^_f_ - 1].
    It is possible for 2 different subsegments to be hashed to the same index, in which case they are treated as the same subsegment when computing frequency.
    Using a higher _f_ reduces collisions but takes longer.

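The role of _f_ can be sketched in Python (illustrative only; the hash function here is arbitrary, not the one fastcover uses): subsegment counts go into an array of 2^_f_ counters, so distinct subsegments may collide and share a counter.

```python
# Sketch of a hashed frequency array of size 2**f (hypothetical helper,
# not fastcover's code). Smaller f => smaller table => more collisions.
import hashlib

def hashed_freqs(data: bytes, d: int, f: int):
    """Count d-byte subsegments in a table of 2**f counters; distinct
    subsegments hashing to the same slot share one count (a collision)."""
    table = [0] * (1 << f)
    for i in range(len(data) - d + 1):
        h = int.from_bytes(hashlib.blake2b(data[i:i+d], digest_size=8).digest(), "big")
        table[h & ((1 << f) - 1)] += 1
    return table

small = hashed_freqs(b"some sample data", d=6, f=4)   # 16 slots: many collisions
large = hashed_freqs(b"some sample data", d=6, f=20)  # 2^20 slots: few collisions
assert sum(small) == sum(large)  # total subsegment count is unchanged
```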
    Examples:

    `zstd --train-fastcover FILEs`

    `zstd --train-fastcover=d=8,f=15,accel=2 FILEs`

* `--train-legacy[=selectivity=#]`:
    Use legacy dictionary builder algorithm with the given dictionary
    _selectivity_ (default: 9).
    The smaller the _selectivity_ value, the denser the dictionary,
    improving its efficiency but reducing its achievable maximum size.
    `--train-legacy=s=#` is also accepted.

    Examples:

    `zstd --train-legacy FILEs`

    `zstd --train-legacy=selectivity=8 FILEs`


BENCHMARK
---------
The `zstd` CLI provides a benchmarking mode that can be used to easily find suitable compression parameters, or alternatively to benchmark a computer's performance.
Note that the results are highly dependent on the content being compressed.

* `-b#`:
    benchmark file(s) using compression level #
* `-e#`:
    benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive)
* `-d`:
    benchmark decompression speed only (requires providing already zstd-compressed content)
* `-i#`:
    minimum evaluation time, in seconds (default: 3s), benchmark mode only
* `-B#`, `--block-size=#`:
    cut file(s) into independent chunks of size # (default: no chunking)
* `--priority=rt`:
    set process priority to real-time (Windows)

**Output Format:** CompressionLevel#Filename: InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed

**Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.

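The repeat-until-minimum-time methodology can be mimicked in a few lines of Python (using `zlib` as a stand-in codec purely for illustration; this is not zstd's benchmark code):

```python
# Sketch of the benchmark methodology: compress the whole input in-memory,
# repeating the run until a minimum wall-clock time has elapsed so that
# small inputs still produce an accurate throughput figure.
import time
import zlib

def bench_speed(data: bytes, min_time: float = 1.0) -> float:
    """Repeatedly compress `data` for at least `min_time` seconds and
    report throughput in MB/s."""
    runs, start = 0, time.perf_counter()
    while (elapsed := time.perf_counter() - start) < min_time:
        zlib.compress(data, 1)
        runs += 1
    return runs * len(data) / elapsed / 1e6

speed = bench_speed(b"x" * 100_000, min_time=0.2)
```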

SEE ALSO
--------
`zstdgrep`(1), `zstdless`(1), `gzip`(1), `xz`(1)

The <zstandard> format is specified in Y. Collet, "Zstandard Compression and the 'application/zstd' Media Type", https://www.ietf.org/rfc/rfc8878.txt, Internet RFC 8878 (February 2021).

BUGS
----
Report bugs at: https://github.com/facebook/zstd/issues

AUTHOR
------
Yann Collet