Decompressor Errata
===================

This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry contains:
1. The last affected decompressor version.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (hexadecimal string when it can be short enough, link to golden file otherwise).
5. A description of the bug.

The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


No sequence using the 2-bytes format
------------------------------------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst

The zstd decoder incorrectly expected FSE tables when a block contained 0 sequences
and the value 0 was encoded using the 2-bytes format.
Instead, it should immediately end the sequence section and move on to the next block.

This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-bytes format is inefficient
(the 1-byte format is always used in this case).


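To illustrate the 2-bytes case above, here is a minimal Python sketch of how the sequence count can be read, assuming the `Number_of_Sequences` layout described in the zstd format specification. The function names `read_number_of_sequences` and `has_fse_tables` are hypothetical and are not part of the zstd library.

```python
def read_number_of_sequences(section: bytes) -> tuple[int, int]:
    """Return (number_of_sequences, header_bytes_consumed).

    Assumes the Number_of_Sequences layout from the zstd format specification.
    """
    byte0 = section[0]
    if byte0 < 128:
        # 1-byte format; a value of 0 means the block has no sequences.
        return byte0, 1
    if byte0 < 255:
        # 2-bytes format; byte0 == 128 and byte1 == 0 also encodes 0.
        return ((byte0 - 128) << 8) + section[1], 2
    # 3-bytes format.
    return section[1] + (section[2] << 8) + 0x7F00, 3


def has_fse_tables(section: bytes) -> bool:
    """Compression modes and tables follow only when there is at least one sequence."""
    count, _ = read_number_of_sequences(section)
    return count > 0
```

With this sketch, both `bytes([0x00])` and `bytes([0x80, 0x00])` decode to 0 sequences, so neither should be followed by FSE tables.
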
Compressed block with a size of exactly 128 KB
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.

This type of block was never generated by the reference compressor.

These blocks were disallowed by the spec up until spec version 0.3.2, when the following restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).


Compressed block with 0 literals and 0 sequences
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 2000 1500 0000 00`

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encode literals as `Raw_Literals_Block` with no literals and have no sequences.

This type of block was never generated by the reference compressor.

Additionally, these blocks were disallowed by the spec up until spec version 0.3.2, when the following restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).


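The example frame for the 0 literals and 0 sequences case is short enough to decode by hand. The Python sketch below does so, assuming the frame header, block header, and literals header layouts from the zstd format specification. The helper name `describe_frame` is hypothetical, and the fixed offsets only hold for this particular frame (descriptor `0x20`, so a 1-byte `Frame_Content_Size` follows).

```python
BLOCK_TYPES = ("Raw_Block", "RLE_Block", "Compressed_Block", "Reserved")


def describe_frame(frame: bytes) -> dict:
    """Decode the fields of the 11-byte example frame (offsets are specific to it)."""
    assert frame[:4] == bytes.fromhex("28b52ffd")  # zstd magic number, little-endian
    assert frame[4] == 0x20  # Single_Segment_Flag set, 1-byte Frame_Content_Size
    content_size = frame[5]

    # Block_Header: 3 bytes little-endian = Last_Block (1 bit),
    # Block_Type (2 bits), Block_Size (21 bits).
    header = int.from_bytes(frame[6:9], "little")
    block_size = header >> 3
    block = frame[9:9 + block_size]

    # 1-byte literals header: type in the low 2 bits, size in the top 5 bits.
    literals_header = block[0]
    return {
        "content_size": content_size,
        "last_block": bool(header & 1),
        "block_type": BLOCK_TYPES[(header >> 1) & 3],
        "block_size": block_size,
        "literals_type": literals_header & 3,  # 0 is Raw_Literals_Block
        "regenerated_size": literals_header >> 3,
        "number_of_sequences": block[1],  # 1-byte format
    }


fields = describe_frame(bytes.fromhex("28b5 2ffd 2000 1500 0000 00"))
```

Here `Block_Size` is 2 while the decompressed size is 0, so the block violates the old "strictly less than the decompressed size" restriction quoted above.
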
First block is RLE block
------------------------

**Last affected version**: v1.4.3

**Affected decompressor component(s)**: CLI only

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`

The zstd CLI decompressor rejected frames whose first block was an RLE block with a `Block_Size` of 131072, when the frame contained more than one block.
This only affected the zstd CLI, and not the library.

The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.

The compressor currently works around this limitation by explicitly avoiding producing RLE blocks as the first
block.

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535


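The block layout of the RLE example frame can be checked with a short Python sketch. It assumes the 3-byte `Block_Header` layout from the zstd format specification. The function name `walk_blocks` is hypothetical, and the starting offset of 9 is specific to this frame (4-byte magic number, 1-byte descriptor `0xa0`, 4-byte `Frame_Content_Size`).

```python
BLOCK_TYPES = ("Raw_Block", "RLE_Block", "Compressed_Block", "Reserved")


def walk_blocks(data: bytes, offset: int) -> list[tuple[str, int, bool]]:
    """List (Block_Type, Block_Size, Last_Block) for each block from `offset` on."""
    blocks = []
    while True:
        if offset + 3 > len(data):
            raise ValueError("truncated block header")
        header = int.from_bytes(data[offset:offset + 3], "little")
        last_block = bool(header & 1)
        block_type = (header >> 1) & 3
        block_size = header >> 3
        blocks.append((BLOCK_TYPES[block_type], block_size, last_block))
        # An RLE block stores a single byte, repeated Block_Size times on output.
        stored = 1 if block_type == 1 else block_size
        offset += 3 + stored
        if last_block:
            return blocks


frame = bytes.fromhex("28b5 2ffd a001 0002 0002 0010 000b 0000 00")
blocks = walk_blocks(frame, 9)
```

For this frame, `blocks` contains a first RLE block of 131072 bytes that is not the last block, followed by a final RLE block of 1 byte, matching the description above.
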
Tiny FSE Table & Block
----------------------

**Last affected version**: v1.3.4

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never

**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`

The zstd library rejected blocks of type `Compressed_Block` in which the last table with type `FSE_Compressed_Mode` started less than 4 bytes before the end of the block.

In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) at which
the last table with type `FSE_Compressed_Mode` starts. If `Block_Content - Last_Table_Offset < 4`, then
the buggy zstd decompressor would reject the block. This occurs when the last serialized table is 2 bytes
and the bitstream size is 1 byte.

For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte, e.g. there is only one sequence and it fits in 1 byte.

The total `Block_Content` is `5` bytes, and `Last_Table_Offset` is `2`.

See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682

Magicless format
----------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library

**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)

**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`

v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)

If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data. Simply prepend the ZSTD magic number
`0xFD2FB528` (little-endian) to your data and decompress using the standard-format
decoder.
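
The prepend step of this workaround can be sketched in Python as follows. This is only an illustration: the function name `add_magic` is hypothetical, and the decompression step itself is left to whichever standard-format zstd decoder you have available.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # written little-endian, i.e. bytes 28 b5 2f fd


def add_magic(magicless_frame: bytes) -> bytes:
    """Prepend the zstd magic number so a standard-format decoder accepts the frame."""
    return struct.pack("<I", ZSTD_MAGIC) + magicless_frame


# The example frame from this entry begins with bytes that look like a legacy magic number.
example = bytes.fromhex("27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59")
recovered = add_magic(example)
```

The resulting `recovered` bytes start with `28 b5 2f fd` followed by the original data unchanged, and can then be passed to a standard-format decoder.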