| 1 | zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files |
| 2 | ============================================================================ |
| 3 | |
| 4 | SYNOPSIS |
| 5 | -------- |
| 6 | |
| 7 | `zstd` [<OPTIONS>] [-|<INPUT-FILE>] [-o <OUTPUT-FILE>] |
| 8 | |
| 9 | `zstdmt` is equivalent to `zstd -T0` |
| 10 | |
| 11 | `unzstd` is equivalent to `zstd -d` |
| 12 | |
| 13 | `zstdcat` is equivalent to `zstd -dcf` |
| 14 | |
| 15 | |
| 16 | DESCRIPTION |
| 17 | ----------- |
| 18 | `zstd` is a fast lossless compression algorithm and data compression tool, |
| 19 | with command line syntax similar to `gzip`(1) and `xz`(1). |
| 20 | It is based on the **LZ77** family, with further FSE & huff0 entropy stages. |
| 21 | `zstd` offers highly configurable compression speed, |
| 22 | from fast modes at > 200 MB/s per core, |
| 23 | to strong modes with excellent compression ratios. |
| 24 | It also features a very fast decoder, with speeds > 500 MB/s per core. |
| 25 | |
The `zstd` command line syntax is generally similar to that of `gzip`,
but features the following differences:
| 28 | |
| 29 | - Source files are preserved by default. |
    It's possible to remove them automatically by using the `--rm` option.
| 31 | - When compressing a single file, `zstd` displays progress notifications |
| 32 | and result summary by default. |
| 33 | Use `-q` to turn them off. |
- `zstd` displays a short help page when the command line is invalid.
| 35 | Use `-q` to turn it off. |
- `zstd` does not accept input from the console,
    though it does accept `stdin` when it's not the console.
| 38 | - `zstd` does not store the input's filename or attributes, only its contents. |
| 39 | |
| 40 | `zstd` processes each _file_ according to the selected operation mode. |
| 41 | If no _files_ are given or _file_ is `-`, `zstd` reads from standard input |
| 42 | and writes the processed data to standard output. |
| 43 | `zstd` will refuse to write compressed data to standard output |
| 44 | if it is a terminal: it will display an error message and skip the file. |
| 45 | Similarly, `zstd` will refuse to read compressed data from standard input |
| 46 | if it is a terminal. |
| 47 | |
| 48 | Unless `--stdout` or `-o` is specified, _files_ are written to a new file |
| 49 | whose name is derived from the source _file_ name: |
| 50 | |
| 51 | * When compressing, the suffix `.zst` is appended to the source filename to |
| 52 | get the target filename. |
| 53 | * When decompressing, the `.zst` suffix is removed from the source filename to |
  get the target filename.
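
For example, compressing `file.txt` (an assumed file name) produces `file.txt.zst`,
and `zstd -d file.txt.zst` regenerates `file.txt`.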
| 55 | |
| 56 | ### Concatenation with .zst Files |
It is possible to concatenate multiple `.zst` files. `zstd` will decompress
such a concatenated file as if it were a single `.zst` file.
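
For example, assuming `file1.zst` and `file2.zst` are two valid `.zst` files:

`cat file1.zst file2.zst > combined.zst`

`zstd -d combined.zst`

produces a single file, `combined`, containing the original content of `file1`
followed by the original content of `file2`.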
| 59 | |
| 60 | OPTIONS |
| 61 | ------- |
| 62 | |
| 63 | ### Integer Suffixes and Special Values |
| 64 | |
| 65 | In most places where an integer argument is expected, |
| 66 | an optional suffix is supported to easily indicate large integers. |
| 67 | There must be no space between the integer and the suffix. |
| 68 | |
| 69 | * `KiB`: |
| 70 | Multiply the integer by 1,024 (2\^10). |
| 71 | `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`. |
| 72 | * `MiB`: |
| 73 | Multiply the integer by 1,048,576 (2\^20). |
| 74 | `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`. |
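
For example, the following invocations are equivalent,
each setting a decompression memory limit of 134,217,728 bytes
(`file.zst` is an assumed input name):

`zstd -d --memory=128MiB file.zst`

`zstd -d --memory=128M file.zst`

`zstd -d --memory=134217728 file.zst`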
| 75 | |
| 76 | ### Operation Mode |
| 77 | |
| 78 | If multiple operation mode options are given, |
| 79 | the last one takes effect. |
| 80 | |
| 81 | * `-z`, `--compress`: |
| 82 | Compress. |
| 83 | This is the default operation mode when no operation mode option is specified |
| 84 | and no other operation mode is implied from the command name |
| 85 | (for example, `unzstd` implies `--decompress`). |
| 86 | * `-d`, `--decompress`, `--uncompress`: |
| 87 | Decompress. |
| 88 | * `-t`, `--test`: |
| 89 | Test the integrity of compressed _files_. |
    This option is equivalent to `--decompress --stdout > /dev/null`:
    decompressed data is discarded and checksummed for errors.
| 92 | No files are created or removed. |
| 93 | * `-b#`: |
| 94 | Benchmark file(s) using compression level _#_. |
| 95 | See _BENCHMARK_ below for a description of this operation. |
| 96 | * `--train FILES`: |
| 97 | Use _FILES_ as a training set to create a dictionary. |
| 98 | The training set should contain a lot of small files (> 100). |
| 99 | See _DICTIONARY BUILDER_ below for a description of this operation. |
| 100 | * `-l`, `--list`: |
| 101 | Display information related to a zstd compressed file, such as size, ratio, and checksum. |
| 102 | Some of these fields may not be available. |
| 103 | This command's output can be augmented with the `-v` modifier. |
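
    For example, `zstd -l -v file.zst` (an assumed file name) displays
    detailed frame information, such as compressed and decompressed sizes,
    window size, and whether a checksum is present.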
| 104 | |
| 105 | ### Operation Modifiers |
| 106 | |
| 107 | * `-#`: |
| 108 | selects `#` compression level \[1-19\] (default: 3) |
| 109 | * `--ultra`: |
| 110 | unlocks high compression levels 20+ (maximum 22), using a lot more memory. |
| 111 | Note that decompression will also require more memory when using these levels. |
| 112 | * `--fast[=#]`: |
| 113 | switch to ultra-fast compression levels. |
| 114 | If `=#` is not present, it defaults to `1`. |
| 115 | The higher the value, the faster the compression speed, |
| 116 | at the cost of some compression ratio. |
    This setting overrides the compression level if one was set previously.
    Similarly, if a compression level is set after `--fast`, it takes precedence.
| 119 | * `-T#`, `--threads=#`: |
| 120 | Compress using `#` working threads (default: 1). |
| 121 | If `#` is 0, attempt to detect and use the number of physical CPU cores. |
    In all cases, the number of threads is capped at `ZSTDMT_NBWORKERS_MAX`,
| 123 | which is either 64 in 32-bit mode, or 256 for 64-bit environments. |
| 124 | This modifier does nothing if `zstd` is compiled without multithread support. |
| 125 | * `--single-thread`: |
| 126 | Use a single thread for both I/O and compression. |
| 127 | As compression is serialized with I/O, this can be slightly slower. |
| 128 | Single-thread mode features significantly lower memory usage, |
| 129 | which can be useful for systems with limited amount of memory, such as 32-bit systems. |
| 130 | |
| 131 | Note 1: this mode is the only available one when multithread support is disabled. |
| 132 | |
| 133 | Note 2: this mode is different from `-T1`, which spawns 1 compression thread in parallel with I/O. |
| 134 | Final compressed result is also slightly different from `-T1`. |
* `--auto-threads={physical,logical}` (default: `physical`):
| 136 | When using a default amount of threads via `-T0`, choose the default based on the number |
| 137 | of detected physical or logical cores. |
| 138 | * `--adapt[=min=#,max=#]`: |
| 139 | `zstd` will dynamically adapt compression level to perceived I/O conditions. |
    Compression level adaptation can be observed live with the `-v` option.
| 141 | Adaptation can be constrained between supplied `min` and `max` levels. |
| 142 | The feature works when combined with multi-threading and `--long` mode. |
| 143 | It does not work with `--single-thread`. |
| 144 | It sets window size to 8 MiB by default (can be changed manually, see `wlog`). |
| 145 | Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible. |
| 146 | |
| 147 | _Note_: at the time of this writing, `--adapt` can remain stuck at low speed |
| 148 | when combined with multiple worker threads (>=2). |
| 149 | * `--long[=#]`: |
| 150 | enables long distance matching with `#` `windowLog`, if `#` is not |
| 151 | present it defaults to `27`. |
| 152 | This increases the window size (`windowLog`) and memory usage for both the |
| 153 | compressor and decompressor. |
| 154 | This setting is designed to improve the compression ratio for files with |
| 155 | long matches at a large distance. |
| 156 | |
| 157 | Note: If `windowLog` is set to larger than 27, `--long=windowLog` or |
| 158 | `--memory=windowSize` needs to be passed to the decompressor. |
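
    For example, the following round-trip uses a 30-bit window (1 GiB);
    since 30 exceeds 27, the flag must be repeated for decompression
    (the file name is illustrative):

    `zstd --long=30 big_file`

    `zstd -d --long=30 big_file.zst`

    Alternatively, `zstd -d --memory=1024MiB big_file.zst` achieves the same.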
| 159 | * `-D DICT`: |
    use `DICT` as the dictionary to compress or decompress FILE(s)
| 161 | * `--patch-from FILE`: |
| 162 | Specify the file to be used as a reference point for zstd's diff engine. |
| 163 | This is effectively dictionary compression with some convenient parameter |
| 164 | selection, namely that _windowSize_ > _srcSize_. |
| 165 | |
| 166 | Note: cannot use both this and `-D` together. |
| 167 | |
| 168 | Note: `--long` mode will be automatically activated if _chainLog_ < _fileLog_ |
| 169 | (_fileLog_ being the _windowLog_ required to cover the whole file). You |
| 170 | can also manually force it. |
| 171 | |
| 172 | Note: for all levels, you can use `--patch-from` in `--single-thread` mode |
| 173 | to improve compression ratio at the cost of speed. |
| 174 | |
    Note: for level 19, you can get increased compression ratio at the cost
    of speed by specifying `--zstd=targetLength=` to be something large
    (e.g. 4096), and by setting a large `--zstd=chainLog=`.
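
    As an illustration, creating and applying a patch might look like this
    (file names are assumptions):

    `zstd --patch-from=v1.bin v2.bin -o patch.zst`

    `zstd -d --patch-from=v1.bin patch.zst -o v2.bin.restored`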
| 178 | * `--rsyncable`: |
| 179 | `zstd` will periodically synchronize the compression state to make the |
| 180 | compressed file more rsync-friendly. |
| 181 | There is a negligible impact to compression ratio, |
| 182 | and a potential impact to compression speed, perceptible at higher speeds, |
| 183 | for example when combining `--rsyncable` with many parallel worker threads. |
| 184 | This feature does not work with `--single-thread`. You probably don't want |
| 185 | to use it with long range mode, since it will decrease the effectiveness of |
| 186 | the synchronization points, but your mileage may vary. |
| 187 | * `-C`, `--[no-]check`: |
| 188 | add integrity check computed from uncompressed data (default: enabled) |
| 189 | * `--[no-]content-size`: |
    enable / disable placing the original size of the file in the header of the
    compressed file. The default option is
    `--content-size` (meaning that the original size will be placed in the header).
| 193 | * `--no-dictID`: |
| 194 | do not store dictionary ID within frame header (dictionary compression). |
    The decoder will have to rely on implicit knowledge about which dictionary to use;
    it won't be able to check whether it's correct.
| 197 | * `-M#`, `--memory=#`: |
| 198 | Set a memory usage limit. By default, `zstd` uses 128 MiB for decompression |
| 199 | as the maximum amount of memory the decompressor is allowed to use, but you can |
| 200 | override this manually if need be in either direction (i.e. you can increase or |
| 201 | decrease it). |
| 202 | |
    This is also used during compression when used with `--patch-from=`. In this case,
    this parameter overrides the default maximum size allowed for a dictionary (128 MiB).
| 205 | |
| 206 | Additionally, this can be used to limit memory for dictionary training. This parameter |
| 207 | overrides the default limit of 2 GiB. zstd will load training samples up to the memory limit |
| 208 | and ignore the rest. |
| 209 | * `--stream-size=#`: |
| 210 | Sets the pledged source size of input coming from a stream. This value must be exact, as it |
| 211 | will be included in the produced frame header. Incorrect stream sizes will cause an error. |
| 212 | This information will be used to better optimize compression parameters, resulting in |
| 213 | better and potentially faster compression, especially for smaller source sizes. |
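
    A sketch of usage, assuming exactly 1,048,576 bytes arrive on `stdin`
    (the output name is illustrative):

    `head -c 1048576 /dev/urandom | zstd --stream-size=1048576 -o random.zst`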
| 214 | * `--size-hint=#`: |
| 215 | When handling input from a stream, `zstd` must guess how large the source size |
| 216 | will be when optimizing compression parameters. If the stream size is relatively |
    small, this guess may be a poor one, resulting in a worse compression ratio than
| 218 | expected. This feature allows for controlling the guess when needed. |
| 219 | Exact guesses result in better compression ratios. Overestimates result in slightly |
| 220 | degraded compression ratios, while underestimates may result in significant degradation. |
| 221 | * `-o FILE`: |
| 222 | save result into `FILE`. |
| 223 | * `-f`, `--force`: |
| 224 | disable input and output checks. Allows overwriting existing files, input |
| 225 | from console, output to stdout, operating on links, block devices, etc. |
    During decompression, when the output destination is stdout, unrecognized
    formats are passed through as-is.
| 228 | * `-c`, `--stdout`: |
| 229 | write to standard output (even if it is the console); keep original files unchanged. |
| 230 | * `--[no-]sparse`: |
| 231 | enable / disable sparse FS support, |
| 232 | to make files with many zeroes smaller on disk. |
| 233 | Creating sparse files may save disk space and speed up decompression by |
| 234 | reducing the amount of disk I/O. |
| 235 | default: enabled when output is into a file, |
| 236 | and disabled when output is stdout. |
    This setting overrides the default and can force sparse mode over stdout.
* `--[no-]pass-through`:
| 239 | enable / disable passing through uncompressed files as-is. During |
| 240 | decompression when pass-through is enabled, unrecognized formats will be |
| 241 | copied as-is from the input to the output. By default, pass-through will |
| 242 | occur when the output destination is stdout and the force (`-f`) option is |
| 243 | set. |
| 244 | * `--rm`: |
| 245 | remove source file(s) after successful compression or decompression. |
    This option is silently ignored if output is `stdout`.
| 247 | If used in combination with `-o`, |
| 248 | triggers a confirmation prompt (which can be silenced with `-f`), as this is a destructive operation. |
| 249 | * `-k`, `--keep`: |
| 250 | keep source file(s) after successful compression or decompression. |
| 251 | This is the default behavior. |
| 252 | * `-r`: |
| 253 | operate recursively on directories. |
| 254 | It selects all files in the named directory and all its subdirectories. |
    This can be useful both to reduce command line typing
    and to circumvent shell expansion limitations,
    when there are so many files that naming them all would exceed the maximum command line size.
* `--filelist FILE`:
| 259 | read a list of files to process as content from `FILE`. |
| 260 | Format is compatible with `ls` output, with one file per line. |
| 261 | * `--output-dir-flat DIR`: |
    resulting files are stored into the target `DIR` directory,
    instead of the same directory as the origin file.
    Be aware that this option can introduce name collisions
    if multiple files from different directories end up having the same name.
    Collision resolution ensures that the first file with a given name will be present in `DIR`;
    in combination with `-f`, the last file will be present instead.
| 268 | * `--output-dir-mirror DIR`: |
| 269 | similar to `--output-dir-flat`, |
| 270 | the output files are stored underneath target `DIR` directory, |
| 271 | but this option will replicate input directory hierarchy into output `DIR`. |
| 272 | |
    If an input directory contains "..", the files in that directory will be ignored.
    If an input directory is an absolute path (e.g. "/var/tmp/abc"),
    it will be stored under "output-dir/var/tmp/abc".
| 276 | If there are multiple input files or directories, |
| 277 | name collision resolution will follow the same rules as `--output-dir-flat`. |
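
    For example, given an input file `dir1/subdir/file.txt` (names are illustrative),
    `zstd -r dir1 --output-dir-mirror out` creates `out/dir1/subdir/file.txt.zst`,
    whereas `--output-dir-flat out` would create `out/file.txt.zst`.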
| 278 | * `--format=FORMAT`: |
| 279 | compress and decompress in other formats. If compiled with |
| 280 | support, zstd can compress to or decompress from other compression algorithm |
| 281 | formats. Possibly available options are `zstd`, `gzip`, `xz`, `lzma`, and `lz4`. |
| 282 | If no such format is provided, `zstd` is the default. |
| 283 | * `-h`/`-H`, `--help`: |
| 284 | display help/long help and exit |
| 285 | * `-V`, `--version`: |
| 286 | display version number and exit. |
| 287 | Advanced: `-vV` also displays supported formats. |
| 288 | `-vvV` also displays POSIX support. |
| 289 | `-q` will only display the version number, suitable for machine reading. |
| 290 | * `-v`, `--verbose`: |
| 291 | verbose mode, display more information |
| 292 | * `-q`, `--quiet`: |
| 293 | suppress warnings, interactivity, and notifications. |
| 294 | specify twice to suppress errors too. |
| 295 | * `--no-progress`: |
| 296 | do not display the progress bar, but keep all other messages. |
| 297 | * `--show-default-cparams`: |
| 298 | shows the default compression parameters that will be used for a particular input file, based on the provided compression level and the input size. |
| 299 | If the provided file is not a regular file (e.g. a pipe), this flag will output the parameters used for inputs of unknown size. |
| 300 | * `--`: |
    All arguments after `--` are treated as files.
| 302 | |
| 303 | |
| 304 | ### gzip Operation Modifiers |
| 305 | When invoked via a `gzip` symlink, `zstd` will support further |
| 306 | options that intend to mimic the `gzip` behavior: |
| 307 | |
| 308 | * `-n`, `--no-name`: |
| 309 | do not store the original filename and timestamps when compressing |
| 310 | a file. This is the default behavior and hence a no-op. |
| 311 | * `--best`: |
| 312 | alias to the option `-9`. |
| 313 | |
| 314 | |
| 315 | ### Environment Variables |
| 316 | |
| 317 | Employing environment variables to set parameters has security implications. |
| 318 | Therefore, this avenue is intentionally limited. |
| 319 | Only `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` are currently supported. |
| 320 | They set the compression level and number of threads to use during compression, respectively. |
| 321 | |
| 322 | `ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range). |
| 323 | If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message. |
| 324 | `ZSTD_CLEVEL` just replaces the default compression level (`3`). |
| 325 | |
| 326 | `ZSTD_NBTHREADS` can be used to set the number of threads `zstd` will attempt to use during compression. |
| 327 | If the value of `ZSTD_NBTHREADS` is not a valid unsigned integer, it will be ignored with a warning message. |
`ZSTD_NBTHREADS` defaults to `1`, and is capped at `ZSTDMT_NBWORKERS_MAX`
(64 in 32-bit mode, 256 in 64-bit environments).
| 329 | `zstd` must be compiled with multithread support for this to have any effect. |
| 330 | |
| 331 | They can both be overridden by corresponding command line arguments: |
| 332 | `-#` for compression level and `-T#` for number of compression threads. |
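
For example (values are arbitrary):

`ZSTD_CLEVEL=19 ZSTD_NBTHREADS=4 zstd file.txt`

compresses `file.txt` at level 19 using 4 threads, while

`ZSTD_CLEVEL=19 zstd -7 file.txt`

compresses at level 7, since the command line argument takes precedence.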
| 333 | |
| 334 | |
| 335 | DICTIONARY BUILDER |
| 336 | ------------------ |
| 337 | `zstd` offers _dictionary_ compression, |
| 338 | which greatly improves efficiency on small files and messages. |
| 339 | It's possible to train `zstd` with a set of samples, |
| 340 | the result of which is saved into a file called a `dictionary`. |
| 341 | Then, during compression and decompression, reference the same dictionary, |
| 342 | using command `-D dictionaryFileName`. |
| 343 | Compression of small files similar to the sample set will be greatly improved. |
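
A typical workflow looks like this (file and dictionary names are illustrative):

`zstd --train samples/* -o mydict`

`zstd -D mydict message.json`

`zstd -D mydict -d message.json.zst`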
| 344 | |
| 345 | * `--train FILEs`: |
| 346 | Use FILEs as training set to create a dictionary. |
| 347 | The training set should ideally contain a lot of samples (> 100), |
    and weigh typically 100x the target dictionary size
| 349 | (for example, ~10 MB for a 100 KB dictionary). |
| 350 | `--train` can be combined with `-r` to indicate a directory rather than listing all the files, |
| 351 | which can be useful to circumvent shell expansion limits. |
| 352 | |
| 353 | Since dictionary compression is mostly effective for small files, |
| 354 | the expectation is that the training set will only contain small files. |
| 355 | In the case where some samples happen to be large, |
| 356 | only the first 128 KiB of these samples will be used for training. |
| 357 | |
| 358 | `--train` supports multithreading if `zstd` is compiled with threading support (default). |
| 359 | Additional advanced parameters can be specified with `--train-fastcover`. |
| 360 | The legacy dictionary builder can be accessed with `--train-legacy`. |
| 361 | The slower cover dictionary builder can be accessed with `--train-cover`. |
| 362 | Default `--train` is equivalent to `--train-fastcover=d=8,steps=4`. |
| 363 | |
| 364 | * `-o FILE`: |
| 365 | Dictionary saved into `FILE` (default name: dictionary). |
| 366 | * `--maxdict=#`: |
| 367 | Limit dictionary to specified size (default: 112640 bytes). |
| 368 | As usual, quantities are expressed in bytes by default, |
| 369 | and it's possible to employ suffixes (like `KB` or `MB`) |
| 370 | to specify larger values. |
| 371 | * `-#`: |
| 372 | Use `#` compression level during training (optional). |
| 373 | Will generate statistics more tuned for selected compression level, |
| 374 | resulting in a _small_ compression ratio improvement for this level. |
| 375 | * `-B#`: |
| 376 | Split input files into blocks of size # (default: no split) |
| 377 | * `-M#`, `--memory=#`: |
| 378 | Limit the amount of sample data loaded for training (default: 2 GB). |
| 379 | Note that the default (2 GB) is also the maximum. |
| 380 | This parameter can be useful in situations where the training set size |
| 381 | is not well controlled and could be potentially very large. |
| 382 | Since speed of the training process is directly correlated to |
| 383 | the size of the training sample set, |
| 384 | a smaller sample set leads to faster training. |
| 385 | |
| 386 | In situations where the training set is larger than maximum memory, |
| 387 | the CLI will randomly select samples among the available ones, |
| 388 | up to the maximum allowed memory budget. |
| 389 | This is meant to improve dictionary relevance |
| 390 | by mitigating the potential impact of clustering, |
| 391 | such as selecting only files from the beginning of a list |
| 392 | sorted by modification date, or sorted by alphabetical order. |
| 393 | The randomization process is deterministic, so |
| 394 | training of the same list of files with the same parameters |
| 395 | will lead to the creation of the same dictionary. |
| 396 | |
| 397 | * `--dictID=#`: |
| 398 | A dictionary ID is a locally unique ID. |
| 399 | The decoder will use this value to verify it is using the right dictionary. |
    By default, zstd will create a 4-byte random number ID.
| 401 | It's possible to provide an explicit number ID instead. |
    It's up to the dictionary manager not to assign the same ID to
    two different dictionaries.
| 404 | Note that short numbers have an advantage: |
| 405 | an ID < 256 will only need 1 byte in the compressed frame header, |
| 406 | and an ID < 65536 will only need 2 bytes. |
    This compares favorably to the 4-byte default.
| 408 | |
| 409 | Note that RFC8878 reserves IDs less than 32768 and greater than or equal to 2\^31, so they should not be used in public. |
| 410 | |
* `--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]]`:
    Select parameters for the cover dictionary builder algorithm.
| 413 | If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8. |
| 414 | If _k_ is not specified, then it tries _steps_ values in the range [50, 2000]. |
| 415 | If _steps_ is not specified, then the default value of 40 is used. |
| 416 | If _split_ is not specified or split <= 0, then the default value of 100 is used. |
| 417 | Requires that _d_ <= _k_. |
| 418 | If _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used. |
| 419 | If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used. |
| 420 | |
    Selects segments of size _k_ with the highest score to put in the dictionary.
| 422 | The score of a segment is computed by the sum of the frequencies of all the |
| 423 | subsegments of size _d_. |
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
| 426 | Good values for _k_ vary widely based on the input data, but a safe range is |
| 427 | [2 * _d_, 2000]. |
| 428 | If _split_ is 100, all input samples are used for both training and testing |
| 429 | to find optimal _d_ and _k_ to build dictionary. |
| 430 | Supports multithreading if `zstd` is compiled with threading support. |
    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
    it in size until the compression ratio of the truncated dictionary is at most
    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.
| 434 | |
| 435 | Examples: |
| 436 | |
| 437 | `zstd --train-cover FILEs` |
| 438 | |
| 439 | `zstd --train-cover=k=50,d=8 FILEs` |
| 440 | |
| 441 | `zstd --train-cover=d=8,steps=500 FILEs` |
| 442 | |
| 443 | `zstd --train-cover=k=50 FILEs` |
| 444 | |
| 445 | `zstd --train-cover=k=50,split=60 FILEs` |
| 446 | |
| 447 | `zstd --train-cover=shrink FILEs` |
| 448 | |
| 449 | `zstd --train-cover=shrink=2 FILEs` |
| 450 | |
* `--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#]`:
    Same as cover but with extra parameters _f_ and _accel_, and a different default value of _split_.
| 453 | If _split_ is not specified, then it tries _split_ = 75. |
| 454 | If _f_ is not specified, then it tries _f_ = 20. |
| 455 | Requires that 0 < _f_ < 32. |
| 456 | If _accel_ is not specified, then it tries _accel_ = 1. |
| 457 | Requires that 0 < _accel_ <= 10. |
| 458 | Requires that _d_ = 6 or _d_ = 8. |
| 459 | |
    _f_ is the log of the size of the array that keeps track of the frequency of subsegments of size _d_.
    The subsegment is hashed to an index in the range [0, 2^_f_ - 1].
    It is possible for 2 different subsegments to be hashed to the same index, in which case they are considered the same subsegment when computing frequency.
    Using a higher _f_ reduces collisions but takes longer.
| 464 | |
| 465 | Examples: |
| 466 | |
| 467 | `zstd --train-fastcover FILEs` |
| 468 | |
| 469 | `zstd --train-fastcover=d=8,f=15,accel=2 FILEs` |
| 470 | |
| 471 | * `--train-legacy[=selectivity=#]`: |
| 472 | Use legacy dictionary builder algorithm with the given dictionary |
| 473 | _selectivity_ (default: 9). |
| 474 | The smaller the _selectivity_ value, the denser the dictionary, |
| 475 | improving its efficiency but reducing its achievable maximum size. |
| 476 | `--train-legacy=s=#` is also accepted. |
| 477 | |
| 478 | Examples: |
| 479 | |
| 480 | `zstd --train-legacy FILEs` |
| 481 | |
| 482 | `zstd --train-legacy=selectivity=8 FILEs` |
| 483 | |
| 484 | |
| 485 | BENCHMARK |
| 486 | --------- |
| 487 | |
| 488 | * `-b#`: |
| 489 | benchmark file(s) using compression level # |
| 490 | * `-e#`: |
| 491 | benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive) |
| 492 | * `-i#`: |
| 493 | minimum evaluation time, in seconds (default: 3s), benchmark mode only |
| 494 | * `-B#`, `--block-size=#`: |
| 495 | cut file(s) into independent chunks of size # (default: no chunking) |
| 496 | * `--priority=rt`: |
| 497 | set process priority to real-time |
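
For example, `zstd -b1 -e5 -i10 file.txt` (an assumed input file) benchmarks
compression levels 1 through 5 on `file.txt`, spending at least 10 seconds
evaluating each level.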
| 498 | |
| 499 | **Output Format:** CompressionLevel#Filename: InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed |
| 500 | |
| 501 | **Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy. |
| 502 | |
| 503 | ADVANCED COMPRESSION OPTIONS |
| 504 | ---------------------------- |
| 505 | ### -B#: |
| 506 | Specify the size of each compression job. |
| 507 | This parameter is only available when multi-threading is enabled. |
Each compression job is run in parallel, so this value indirectly impacts the number of active threads.
| 509 | Default job size varies depending on compression level (generally `4 * windowSize`). |
| 510 | `-B#` makes it possible to manually select a custom size. |
| 511 | Note that job size must respect a minimum value which is enforced transparently. |
This minimum is either 512 KB, or `overlapSize`, whichever is larger.
| 513 | Different job sizes will lead to non-identical compressed frames. |
| 514 | |
| 515 | ### --zstd[=options]: |
| 516 | `zstd` provides 22 predefined regular compression levels plus the fast levels. |
| 517 | This compression level is translated internally into a number of specific parameters that actually control the behavior of the compressor. |
| 518 | (You can see the result of this translation with `--show-default-cparams`.) |
| 519 | These specific parameters can be overridden with advanced compression options. |
| 520 | The _options_ are provided as a comma-separated list. |
| 521 | You may specify only the options you want to change and the rest will be |
| 522 | taken from the selected or default compression level. |
| 523 | The list of available _options_: |
| 524 | |
| 525 | - `strategy`=_strat_, `strat`=_strat_: |
| 526 | Specify a strategy used by a match finder. |
| 527 | |
| 528 | There are 9 strategies numbered from 1 to 9, from fastest to strongest: |
| 529 | 1=`ZSTD_fast`, 2=`ZSTD_dfast`, 3=`ZSTD_greedy`, |
| 530 | 4=`ZSTD_lazy`, 5=`ZSTD_lazy2`, 6=`ZSTD_btlazy2`, |
| 531 | 7=`ZSTD_btopt`, 8=`ZSTD_btultra`, 9=`ZSTD_btultra2`. |
| 532 | |
| 533 | - `windowLog`=_wlog_, `wlog`=_wlog_: |
| 534 | Specify the maximum number of bits for a match distance. |
| 535 | |
    A higher number of bits increases the chance to find a match, which usually
    improves the compression ratio.
| 538 | It also increases memory requirements for the compressor and decompressor. |
| 539 | The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit |
| 540 | platforms and 31 (2 GiB) on 64-bit platforms. |
| 541 | |
| 542 | Note: If `windowLog` is set to larger than 27, `--long=windowLog` or |
| 543 | `--memory=windowSize` needs to be passed to the decompressor. |
| 544 | |
| 545 | - `hashLog`=_hlog_, `hlog`=_hlog_: |
| 546 | Specify the maximum number of bits for a hash table. |
| 547 | |
    Bigger hash tables cause fewer collisions, which usually makes compression
    faster, but they require more memory during compression.
| 550 | |
| 551 | The minimum _hlog_ is 6 (64 entries / 256 B) and the maximum is 30 (1B entries / 4 GiB). |
| 552 | |
| 553 | - `chainLog`=_clog_, `clog`=_clog_: |
| 554 | Specify the maximum number of bits for the secondary search structure, |
| 555 | whose form depends on the selected `strategy`. |
| 556 | |
    Higher numbers of bits increase the chance to find a match, which usually
    improves compression ratio.
| 559 | It also slows down compression speed and increases memory requirements for |
| 560 | compression. |
| 561 | This option is ignored for the `ZSTD_fast` `strategy`, which only has the primary hash table. |
| 562 | |
| 563 | The minimum _clog_ is 6 (64 entries / 256 B) and the maximum is 29 (512M entries / 2 GiB) on 32-bit platforms |
| 564 | and 30 (1B entries / 4 GiB) on 64-bit platforms. |
| 565 | |
| 566 | - `searchLog`=_slog_, `slog`=_slog_: |
| 567 | Specify the maximum number of searches in a hash chain or a binary tree |
| 568 | using logarithmic scale. |
| 569 | |
    More searches increase the chance to find a match, which usually increases
    the compression ratio but decreases compression speed.
| 572 | |
    The minimum _slog_ is 1 and the maximum is `windowLog` - 1.
| 574 | |
| 575 | - `minMatch`=_mml_, `mml`=_mml_: |
| 576 | Specify the minimum searched length of a match in a hash table. |
| 577 | |
| 578 | Larger search lengths usually decrease compression ratio but improve |
| 579 | decompression speed. |
| 580 | |
| 581 | The minimum _mml_ is 3 and the maximum is 7. |
| 582 | |
| 583 | - `targetLength`=_tlen_, `tlen`=_tlen_: |
    The impact of this field varies depending on the selected strategy.
| 585 | |
    For `ZSTD_btopt`, `ZSTD_btultra` and `ZSTD_btultra2`, it specifies
    the minimum match length that causes the match finder to stop searching.
| 588 | A larger `targetLength` usually improves compression ratio |
| 589 | but decreases compression speed. |
| 590 | |
| 591 | For `ZSTD_fast`, it triggers ultra-fast mode when > 0. |
| 592 | The value represents the amount of data skipped between match sampling. |
| 593 | Impact is reversed: a larger `targetLength` increases compression speed |
| 594 | but decreases compression ratio. |
| 595 | |
| 596 | For all other strategies, this field has no impact. |
| 597 | |
| 598 | The minimum _tlen_ is 0 and the maximum is 128 KiB. |
| 599 | |
| 600 | - `overlapLog`=_ovlog_, `ovlog`=_ovlog_: |
    Determine `overlapSize`, the amount of data reloaded from the previous job.
| 602 | This parameter is only available when multithreading is enabled. |
| 603 | Reloading more data improves compression ratio, but decreases speed. |
| 604 | |
| 605 | The minimum _ovlog_ is 0, and the maximum is 9. |
| 606 | 1 means "no overlap", hence completely independent jobs. |
| 607 | 9 means "full overlap", meaning up to `windowSize` is reloaded from previous job. |
    Reducing _ovlog_ by 1 reduces the reloaded amount by a factor of 2.
| 609 | For example, 8 means "windowSize/2", and 6 means "windowSize/8". |
    Value 0 is special and means "default": _ovlog_ is automatically determined by `zstd`,
    in which case it will range from 6 to 9, depending on the selected _strat_.
| 612 | |
| 613 | - `ldmHashLog`=_lhlog_, `lhlog`=_lhlog_: |
| 614 | Specify the maximum size for a hash table used for long distance matching. |
| 615 | |
| 616 | This option is ignored unless long distance matching is enabled. |
| 617 | |
| 618 | Bigger hash tables usually improve compression ratio at the expense of more |
| 619 | memory during compression and a decrease in compression speed. |
| 620 | |
| 621 | The minimum _lhlog_ is 6 and the maximum is 30 (default: 20). |
| 622 | |
| 623 | - `ldmMinMatch`=_lmml_, `lmml`=_lmml_: |
| 624 | Specify the minimum searched length of a match for long distance matching. |
| 625 | |
| 626 | This option is ignored unless long distance matching is enabled. |
| 627 | |
| 628 | Larger/very small values usually decrease compression ratio. |
| 629 | |
| 630 | The minimum _lmml_ is 4 and the maximum is 4096 (default: 64). |
| 631 | |
| 632 | - `ldmBucketSizeLog`=_lblog_, `lblog`=_lblog_: |
| 633 | Specify the size of each bucket for the hash table used for long distance |
| 634 | matching. |
| 635 | |
| 636 | This option is ignored unless long distance matching is enabled. |
| 637 | |
| 638 | Larger bucket sizes improve collision resolution but decrease compression |
| 639 | speed. |
| 640 | |
| 641 | The minimum _lblog_ is 1 and the maximum is 8 (default: 3). |
| 642 | |
| 643 | - `ldmHashRateLog`=_lhrlog_, `lhrlog`=_lhrlog_: |
| 644 | Specify the frequency of inserting entries into the long distance matching |
| 645 | hash table. |
| 646 | |
| 647 | This option is ignored unless long distance matching is enabled. |
| 648 | |
| 649 | Larger values will improve compression speed. Deviating far from the |
| 650 | default value will likely result in a decrease in compression ratio. |
| 651 | |
| 652 | The default value is `wlog - lhlog`. |
| 653 | |
| 654 | ### Example |
The following parameters set advanced compression options to something
similar to predefined level 19 for files bigger than 256 KB:
| 657 | |
`--zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6`
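
As a usage sketch, applied to an assumed input file this becomes:

`zstd --zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6 file.txt`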
| 659 | |
| 660 | SEE ALSO |
| 661 | -------- |
| 662 | `zstdgrep`(1), `zstdless`(1), `gzip`(1), `xz`(1) |
| 663 | |
| 664 | The <zstandard> format is specified in Y. Collet, "Zstandard Compression and the 'application/zstd' Media Type", https://www.ietf.org/rfc/rfc8878.txt, Internet RFC 8878 (February 2021). |
| 665 | |
| 666 | BUGS |
| 667 | ---- |
| 668 | Report bugs at: https://github.com/facebook/zstd/issues |
| 669 | |
| 670 | AUTHOR |
| 671 | ------ |
| 672 | Yann Collet |