Programs and scripts for automated testing of Zstandard
=======================================================

This directory contains the following programs and scripts:
- `datagen` : Synthetic, parameterizable data generator for tests
- `fullbench` : Precisely measures the speed of each zstd inner function
- `fuzzer` : Test tool to check zstd integrity on the target platform
- `paramgrill` : Parameter tester for zstd
- `test-zstd-speed.py` : Script for testing zstd speed differences between commits
- `test-zstd-versions.py` : Compatibility test between zstd versions stored on GitHub (v0.1+)
- `zstreamtest` : Fuzzer test tool for the zstd streaming API
- `legacy` : Test tool for decoding legacy zstd frames
- `decodecorpus` : Tool to generate valid Zstandard frames, for verifying decoder implementations


#### `test-zstd-versions.py` - script for testing zstd interoperability between versions

This script creates a `versionsTest` directory into which the zstd repository is cloned.
It then compiles all tagged (released) versions of zstd
and checks interoperability between those versions.

#### `automated_benchmarking.py` - script for benchmarking zstd PRs against dev

This script benchmarks facebook:dev and the changes from pull requests made to zstd,
then compares each pull request against facebook:dev to detect regressions.
It currently runs on a dedicated desktop machine for every pull request made to the zstd repo,
but it can also be run on any machine via the command line interface.

The script's main modes of usage are:
- `fastmode` runs a minimal single build comparison (between facebook:dev and facebook:release)
- `onetime` pulls all current pull requests from the zstd repo and compares facebook:dev to each of them once
- `continuous` continuously fetches pull requests from the zstd repo and runs benchmarks against facebook:dev

Example usage:
```
python automated_benchmarking.py
```

```
usage: automated_benchmarking.py [-h] [--directory DIRECTORY]
                                 [--levels LEVELS] [--iterations ITERATIONS]
                                 [--emails EMAILS] [--frequency FREQUENCY]
                                 [--mode MODE] [--dict DICT]

optional arguments:
  -h, --help            show this help message and exit
  --directory DIRECTORY
                        directory with files to benchmark
  --levels LEVELS       levels to test e.g. ('1,2,3')
  --iterations ITERATIONS
                        number of benchmark iterations to run
  --emails EMAILS       email addresses of people who will be alerted upon
                        regression. Only for continuous mode
  --frequency FREQUENCY
                        specifies the number of seconds to wait before each
                        successive check for new PRs in continuous mode
  --mode MODE           'fastmode', 'onetime', 'current', or 'continuous' (see
                        README.md for details)
  --dict DICT           filename of dictionary to use (when set, this
                        dictionary will be used to compress the files provided
                        inside --directory)
```

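As a hedged illustration of how the options above combine, the following Python sketch assembles a one-time benchmark command line. Only the flag names come from the usage text; the `./corpus` directory, the level list, and the iteration count are hypothetical values chosen for the example, and `build_benchmark_command` is a helper invented here. The resulting list could be passed to `subprocess.run` to launch the script.

```python
import shlex


def build_benchmark_command(directory, levels, iterations, mode):
    """Assemble an argv list for automated_benchmarking.py.

    Only flags listed in the usage text are used; every value is a
    placeholder supplied by the caller.
    """
    return [
        "python", "automated_benchmarking.py",
        "--directory", directory,
        "--levels", ",".join(str(level) for level in levels),
        "--iterations", str(iterations),
        "--mode", mode,
    ]


# Hypothetical values: a local ./corpus directory, levels 1 to 3, 5 iterations.
command = build_benchmark_command("./corpus", [1, 2, 3], 5, "onetime")
print(shlex.join(command))
```
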
#### `test-zstd-speed.py` - script for testing zstd speed difference between commits

DEPRECATED

This script creates a `speedTest` directory into which the zstd repository is cloned.
It then compiles all branches of zstd and runs a speed benchmark on a given list of files (the `testFileNames` parameter).
After `sleepTime` seconds (an optional parameter, default 300), the script checks the repository for new commits.
If a new commit is found, it is compiled and a speed benchmark is run for that commit.
The results of the speed benchmark are compared to the previous results.
If the compression or decompression speed for any zstd level is lower than `lowerLimit` (an optional parameter, default 0.98), the speed benchmark is rerun.
If the second results are also lower than `lowerLimit`, a warning e-mail is sent to the recipients in the list (the `emails` parameter).

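The rerun-before-warning logic described above can be sketched in Python as follows. This is a minimal sketch, not the script's actual code: it assumes `lowerLimit` is a ratio applied to the previous speed, and `find_regressions`, `should_send_warning`, and the `run_benchmark` callable are hypothetical names introduced for illustration. The sample speeds are made up.

```python
def find_regressions(previous, current, lower_limit=0.98):
    """Return the levels whose speed fell below lower_limit * previous speed.

    `previous` and `current` map a compression level to a measured speed.
    Treating lowerLimit as a ratio to the previous result is an assumption.
    """
    return [
        level
        for level, speed in current.items()
        if level in previous and speed < lower_limit * previous[level]
    ]


def should_send_warning(previous, run_benchmark, lower_limit=0.98):
    """Benchmark once, and rerun once if a regression shows up.

    `run_benchmark` is a hypothetical callable returning {level: speed}.
    A warning is due only when the second run also shows a regression.
    """
    first = run_benchmark()
    if not find_regressions(previous, first, lower_limit):
        return False
    second = run_benchmark()
    return bool(find_regressions(previous, second, lower_limit))


# Made-up measurements: a noisy first run followed by a normal second run.
previous = {1: 350.0, 3: 200.0}
runs = iter([{1: 300.0, 3: 200.0}, {1: 349.0, 3: 201.0}])
print(should_send_warning(previous, lambda: next(runs)))  # False
```
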
Additional remarks:
- To ensure accurate speed results, run the script on a "stable" target system with no other jobs running in parallel
- Running the script on virtual machines can lead to large variations in speed results
- The speed benchmark does not start until the computer's load average is lower than `maxLoadAvg` (an optional parameter, default 0.75)
- The script sends e-mails using `mutt`; if `mutt` is not available, it sends e-mails without attachments using `mail`; if neither is available, it only prints a warning


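The load-average gate and the mailer fallback from the remarks above might look roughly like this in Python. It is a sketch under stated assumptions: the 1-minute load average is used, the 10-second polling interval is a hypothetical value, and `wait_for_low_load` and `pick_mailer` are illustrative names, not functions from the script. `os.getloadavg` is only available on Unix-like systems.

```python
import os
import shutil
import time


def wait_for_low_load(max_load_avg=0.75, poll_seconds=10,
                      get_load=None, sleep=time.sleep):
    """Block until the 1-minute load average drops below max_load_avg.

    The polling interval is a hypothetical value. `get_load` and `sleep`
    can be replaced, which keeps the function easy to test.
    """
    get_load = get_load or os.getloadavg  # Unix-like systems only
    while get_load()[0] >= max_load_avg:
        sleep(poll_seconds)


def pick_mailer(which=shutil.which):
    """Prefer mutt, fall back to mail, else return None (warn only)."""
    for program in ("mutt", "mail"):
        if which(program):
            return program
    return None


mailer = pick_mailer()
print(mailer or "no mailer found, only a warning would be printed")
```
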
Example usage with two test files, one e-mail address, and an additional message:
```
./test-zstd-speed.py "silesia.tar calgary.tar" "email@gmail.com" --message "tested on my laptop" --sleepTime 60
```

To run the script in the background, use:
```
nohup ./test-zstd-speed.py testFileNames emails &
```

The full list of parameters:
```
positional arguments:
  testFileNames         file names list for speed benchmark
  emails                list of e-mail addresses to send warnings

optional arguments:
  -h, --help            show this help message and exit
  --message MESSAGE     attach an additional message to e-mail
  --lowerLimit LOWERLIMIT
                        send email if speed is lower than given limit
  --maxLoadAvg MAXLOADAVG
                        maximum load average to start testing
  --lastCLevel LASTCLEVEL
                        last compression level for testing
  --sleepTime SLEEPTIME
                        frequency of repository checking in seconds
```

#### `decodecorpus` - tool to generate Zstandard frames for decoder testing
Command line tool to generate test .zst files.

This tool generates .zst files with checksums
and can optionally output the corresponding uncompressed data for
extra verification.

Example:
```
./decodecorpus -ptestfiles -otestfiles -n10000 -s5
```
will generate 10,000 sample .zst files using a seed of 5 in the `testfiles` directory,
with the zstd checksum field set,
as well as the 10,000 original files for more detailed comparison of decompression results.

```
./decodecorpus -t -T1mn
```
will choose a random seed and, for 1 minute,
generate random test frames and ensure that the
zstd library correctly decompresses them in both simple and streaming modes.

#### `paramgrill` - tool for generating compression table parameters and optimizing parameters on file given constraints

Full list of arguments:
```
 -T#          : set level 1 speed objective
 -B#          : cut input into blocks of size # (default : single block)
 -S           : benchmarks a single run (example command: -Sl3w10h12)
    w# - windowLog
    h# - hashLog
    c# - chainLog
    s# - searchLog
    l# - minMatch
    t# - targetLength
    S# - strategy
    L# - level
 --zstd=      : Single run, parameter selection syntax same as zstdcli with more parameters
                    (Added forceAttachDictionary / fadt)
                    When invoked with --optimize, this represents the sample to exceed.
 --optimize=  : find parameters to maximize compression ratio given parameters
                    Can use all --zstd= commands to constrain the type of solution found in addition to the following constraints
    cSpeed=   : Minimum compression speed
    dSpeed=   : Minimum decompression speed
    cMem=     : Maximum compression memory
    lvl=      : Searches for solutions which are strictly better than that compression lvl in ratio and cSpeed,
    stc=      : When invoked with lvl=, represents percentage slack in ratio/cSpeed allowed for a solution to be considered (Default 100%)
              : In normal operation, represents percentage slack in choosing viable starting strategy selection in choosing the default parameters
                    (Lower value will begin with stronger strategies) (Default 90%)
    speedRatio=   (accepts decimals)
              : determines value of gains in speed vs gains in ratio
                    when determining overall winner (default 5 (1% ratio = 5% speed)).
    tries=    : Maximum number of random restarts on a single strategy before switching (Default 5)
                    Higher values will make optimizer run longer, more chances to find better solution.
    memLog    : Limits the log of the size of each memotable (1 per strategy). Will use hash tables when state space is larger than max size.
                    Setting memLog = 0 turns off memoization
 --display=   : specify which parameters are included in the output
                    can use all --zstd parameter names and 'cParams' as a shorthand for all parameters used in ZSTD_compressionParameters
                    (Default: display all params available)
 -P#          : generated sample compressibility (when no file is provided)
 -t#          : Caps runtime of operation in seconds (default: 99999 seconds (about 27 hours))
 -v           : Prints Benchmarking output
 -D           : Next argument dictionary file
 -s           : Benchmark all files separately
 -q           : Quiet, repeat for more quiet
                    -q    Prints parameters + results whenever a new best is found
                    -qq   Only prints parameters whenever a new best is found, prints final parameters + results
                    -qqq  Only print final parameters + results
                    -qqqq Only prints final parameter set in the form --zstd=
 -v           : Verbose, cancels quiet, repeat for more volume
                    -v    Prints all candidate parameters and results
```
Any inputs afterwards are treated as files to benchmark.
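
To make the compact `-S` parameter string easier to read, the following Python sketch expands a string such as `l3w10h12` using the letter table from the argument list. It is an illustration only, not paramgrill's own parser; `PARAM_NAMES` and `parse_single_run_params` are names invented for this example.

```python
import re

# Letter-to-parameter table copied from the -S entry in the argument list.
PARAM_NAMES = {
    "w": "windowLog",
    "h": "hashLog",
    "c": "chainLog",
    "s": "searchLog",
    "l": "minMatch",
    "t": "targetLength",
    "S": "strategy",
    "L": "level",
}


def parse_single_run_params(spec):
    """Split a compact string such as 'l3w10h12' into named parameters."""
    tokens = re.findall(r"([A-Za-z])(\d+)", spec)
    # Reject input that is not a plain sequence of letter+number pairs.
    if "".join(letter + digits for letter, digits in tokens) != spec:
        raise ValueError(f"unrecognized text in {spec!r}")
    params = {}
    for letter, digits in tokens:
        if letter not in PARAM_NAMES:
            raise ValueError(f"unknown parameter letter {letter!r}")
        params[PARAM_NAMES[letter]] = int(digits)
    return params


print(parse_single_run_params("l3w10h12"))
# {'minMatch': 3, 'windowLog': 10, 'hashLog': 12}
```

Read this way, the example command `-Sl3w10h12` from the argument list requests a single run with `minMatch` 3, `windowLog` 10, and `hashLog` 12.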