# Zstandard Seekable Format

The seekable format splits compressed data into a series of independent "frames",
each compressed individually,
so that decompression of a section in the middle of an archive
only requires zstd to decompress at most a frame's worth of extra data,
instead of the entire archive.

The frames are appended, so that the decompression of the entire payload
still regenerates the original content, using any compliant zstd decoder.

On top of that, the seekable format generates a jump table,
which makes it possible to jump directly to the position of the relevant frame
when requesting only a segment of the data.
The jump table is simply ignored by zstd decoders unaware of the seekable format.

The format is delivered with an API to create seekable archives
and to retrieve arbitrary segments inside the archive.
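
The idea of independent frames plus a jump table can be illustrated with a small, self-contained sketch.
It is not the seekable API: it uses Python's `zlib` as a stand-in for zstd,
keeps the jump table as a separate Python list rather than embedding it in the archive,
and the function names (`compress_seekable`, `read_range`, `decompress_all`) are hypothetical.

```python
import zlib

def compress_seekable(data: bytes, max_frame_size: int):
    """Compress data as independent frames; return (payload, jump_table).

    jump_table holds one (compressed_size, decompressed_size) pair per frame.
    """
    frames, table = [], []
    for start in range(0, len(data), max_frame_size):
        chunk = data[start:start + max_frame_size]
        frame = zlib.compress(chunk)  # zlib stands in for zstd here
        frames.append(frame)
        table.append((len(frame), len(chunk)))
    return b"".join(frames), table

def read_range(payload: bytes, table, offset: int, length: int) -> bytes:
    """Return data[offset:offset+length], decompressing only the frames it touches."""
    out = bytearray()
    c_pos = d_pos = 0  # running compressed / decompressed positions
    end = offset + length
    for c_size, d_size in table:
        if d_pos >= end:
            break  # every remaining frame lies past the requested range
        if d_pos + d_size > offset:
            frame = zlib.decompress(payload[c_pos:c_pos + c_size])
            lo = max(offset - d_pos, 0)
            hi = min(end - d_pos, d_size)
            out += frame[lo:hi]
        c_pos += c_size
        d_pos += d_size
    return bytes(out)

def decompress_all(payload: bytes) -> bytes:
    """Decode the concatenated frames sequentially, without the jump table."""
    out = bytearray()
    while payload:
        d = zlib.decompressobj()
        out += d.decompress(payload)
        out += d.flush()
        payload = d.unused_data  # bytes belonging to the following frames
    return bytes(out)
```

`read_range` walks the table to find the compressed position of each frame it needs,
while `decompress_all` shows that the concatenated frames still decode to the original content without the table.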

### Maximum Frame Size parameter

When creating a seekable archive, the main parameter is the maximum frame size.

At compression time, the user can manually select the boundaries between segments,
but they don't have to: long segments will be automatically split
when larger than the selected maximum frame size.

Small frame sizes reduce decompression cost when requesting small segments,
because the decoder must decompress an entire frame
even to recover a single byte from it.
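
As a rough illustration of that cost, the sketch below counts how many bytes must be decompressed to serve a request.
It assumes every frame holds exactly `frame_size` bytes of original data (the last frame of a real archive may be shorter);
`bytes_decompressed` is a hypothetical helper, not part of the seekable API.

```python
def bytes_decompressed(offset: int, length: int, frame_size: int) -> int:
    """Decompressed bytes the decoder must produce to serve [offset, offset+length).

    Assumes fixed-size frames of frame_size bytes of original data.
    """
    if length <= 0:
        return 0
    first = offset // frame_size                # frame holding the first byte
    last = (offset + length - 1) // frame_size  # frame holding the last byte
    return (last - first + 1) * frame_size

# Cost of reading a single byte for a few frame sizes.
for frame_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    cost = bytes_decompressed(offset=10_000_000, length=1, frame_size=frame_size)
    print(f"{frame_size:>8} B frames: {cost} bytes decompressed for 1 byte requested")
```

A request that straddles a frame boundary touches two frames, so even a 2-byte read can cost two full frames.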

A good rule of thumb is to select a maximum frame size roughly equivalent
to the access pattern when it's known.
For example, if the application tends to request 4 KB blocks,
then it's a good idea to set a maximum frame size in the vicinity of 4 KB.

But small frame sizes also reduce compression ratio,
and increase the size of the jump table, which grows with the number of frames,
so there is a balance to find.
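
To get a feel for the jump table side of that balance, the following sketch estimates its size for a given input.
The constants are assumptions: 8 bytes per frame entry (compressed size plus decompressed size, without the optional per-frame checksum)
and 17 bytes of fixed framing; check them against the format specification before relying on the numbers.
The loss in compression ratio is not modeled, since it depends on the data.

```python
ENTRY_BYTES = 8   # assumed: 4-byte compressed size + 4-byte decompressed size
FIXED_BYTES = 17  # assumed: fixed header and footer around the entries

def jump_table_bytes(total_size: int, frame_size: int) -> int:
    """Estimated jump table size for total_size bytes of input, one entry per frame."""
    n_frames = -(-total_size // frame_size)  # ceiling division
    return FIXED_BYTES + n_frames * ENTRY_BYTES

total = 1 << 30  # 1 GiB of original data
for frame_size in (512, 4096, 1 << 20):
    size = jump_table_bytes(total, frame_size)
    print(f"{frame_size:>8} B frames: {size} B of jump table "
          f"({100 * size / total:.4f}% of the original size)")
```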

In general, try to avoid really tiny frame sizes (<1 KB),
which would have a large negative impact on compression ratio.