1 | LZMA compression |
2 | ---------------- |
3 | Version: 9.35 |
4 | |
5 | This file describes the LZMA encoding and decoding functions written in C. |
6 | |
7 | LZMA is an improved version of the well-known LZ77 compression algorithm. |
8 | It was designed to maximize the compression ratio while keeping |
9 | decompression fast and the memory requirements for decompression |
10 | low. |
11 | |
12 | Note: see also the LZMA Specification (lzma-specification.txt in the LZMA SDK). |
13 | |
14 | You can also look at the source code for LZMA encoding and decoding: |
15 | C/Util/Lzma/LzmaUtil.c |
16 | |
17 | |
18 | LZMA compressed file format |
19 | --------------------------- |
20 | Offset Size Description |
21 |   0     1   Special LZMA properties (lc, lp, pb in encoded form) |
22 |   1     4   Dictionary size (little endian) |
23 |   5     8   Uncompressed size (little endian). -1 means unknown size |
24 |  13         Compressed data |
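As an illustration of the layout above, here is a minimal sketch that splits the 13-byte header into its fields. The names LzmaHeaderInfo and parse_lzma_header are hypothetical helpers, not part of the LZMA SDK. The sketch assumes the properties byte is encoded as (pb * 5 + lp) * 9 + lc, which is how the SDK decoder is understood to interpret it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the LZMA SDK. */
typedef struct {
    unsigned lc, lp, pb;      /* decoded from the properties byte (offset 0) */
    uint32_t dict_size;       /* offsets 1..4, little endian */
    uint64_t unpack_size;     /* offsets 5..12, little endian */
    int unpack_size_known;    /* 0 if the size field is -1 (all bytes 0xFF) */
} LzmaHeaderInfo;

/* Parses the 13-byte header. Returns 0 on success, or -1 if the
   properties byte is out of range (assumed limit: 9 * 5 * 5 = 225). */
int parse_lzma_header(const unsigned char h[13], LzmaHeaderInfo *out)
{
    unsigned d = h[0];
    int i;

    if (d >= 9 * 5 * 5)
        return -1;

    /* Assumed encoding: d = (pb * 5 + lp) * 9 + lc */
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);

    out->unpack_size = 0;
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);

    out->unpack_size_known = (out->unpack_size != UINT64_MAX);
    return 0;
}
```

Under that assumption, a properties byte of 0x5D decodes to lc=3, lp=0, pb=2.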
25 | |
26 | |
27 | |
28 | ANSI-C LZMA Decoder |
29 | ~~~~~~~~~~~~~~~~~~~ |
30 | |
31 | Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58. |
32 | If you want to use the old interfaces, you can download a previous version of the LZMA SDK |
33 | from the sourceforge.net site. |
34 | |
35 | To use the ANSI-C LZMA Decoder, you need the following files: |
36 | 1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h |
37 | |
38 | See the example code: |
39 | C/Util/Lzma/LzmaUtil.c |
40 | |
41 | |
42 | Memory requirements for LZMA decoding |
43 | ------------------------------------- |
44 | |
45 | The LZMA decoding function uses no more than 200-400 bytes of stack |
46 | for local variables. |
47 | |
48 | The LZMA Decoder uses a dictionary buffer and an internal state structure. |
49 | The internal state structure consumes |
50 |   state_size = (4 + (1.5 << (lc + lp))) KB |
51 | With the default settings (lc=3, lp=0), state_size = 16 KB. |
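The state_size formula can be checked with a few lines of integer arithmetic. This is only a sketch: lzma_state_size_bytes is a hypothetical helper name, and "1.5 << n" is read as 1.5 KB (1536 bytes) multiplied by 2^n.

```c
#include <stddef.h>

/* Approximate decoder state size in bytes, following
   state_size = (4 + (1.5 << (lc + lp))) KB from the text above. */
size_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
    /* 4 KB fixed part, plus 1.5 KB (1536 bytes) shifted left by (lc + lp). */
    return (size_t)4 * 1024 + ((size_t)1536 << (lc + lp));
}
```

For lc=3 and lp=0 this gives 4096 + 1536 * 8 = 16384 bytes, which matches the 16 KB figure quoted for the default settings.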
52 | |
53 | |
54 | How To decompress data |
55 | ---------------------- |
56 | |
57 | LZMA Decoder (ANSI-C version) now supports 2 interfaces: |
58 | 1) Single-call Decompressing |
59 | 2) Multi-call State Decompressing (zlib-like interface) |
60 | |
61 | You must supply an external allocator. |
62 | Example: |
63 | void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); } |
64 | void SzFree(void *p, void *address) { p = p; free(address); } |
65 | ISzAlloc alloc = { SzAlloc, SzFree }; |
66 | |
67 | The statement p = p; is there only to suppress unused-parameter compiler warnings. |
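The first argument of the allocator callbacks can also carry context. The sketch below counts live blocks. It uses DemoAlloc, a stand-in with the same shape as the two-function table in the example above (the real ISzAlloc comes from 7zTypes.h). It also assumes that the caller passes a pointer to the table itself as p, which is how the SDK's allocation macros are understood to work in this version. All names here are hypothetical.

```c
#include <stdlib.h>

/* Stand-in with the same shape as the allocator table shown above. */
typedef struct DemoAlloc {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} DemoAlloc;

/* Allocator with bookkeeping. The table is the first member, so a pointer
   to the table can be cast back to a pointer to the whole structure. */
typedef struct {
    DemoAlloc table;
    size_t live_blocks;
} CountingAlloc;

static void *CountingAlloc_Alloc(void *p, size_t size)
{
    CountingAlloc *self = (CountingAlloc *)p;
    void *mem = malloc(size);
    if (mem != NULL)
        self->live_blocks++;
    return mem;
}

static void CountingAlloc_Free(void *p, void *address)
{
    CountingAlloc *self = (CountingAlloc *)p;
    if (address != NULL)
        self->live_blocks--;
    free(address);
}

/* Fills in the function table and resets the counter. */
void CountingAlloc_Init(CountingAlloc *a)
{
    a->table.Alloc = CountingAlloc_Alloc;
    a->table.Free = CountingAlloc_Free;
    a->live_blocks = 0;
}
```

A non-zero live_blocks after the decoder structures have been freed would point to a leak.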
68 | |
69 | |
70 | Single-call Decompressing |
71 | ------------------------- |
72 | When to use: RAM->RAM decompressing |
73 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h |
74 | Compile defines: no defines |
75 | Memory Requirements: |
76 | - Input buffer: compressed size |
77 | - Output buffer: uncompressed size |
78 | - LZMA Internal Structures: state_size (16 KB for default settings) |
79 | |
80 | Interface: |
81 | int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, |
82 | const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, |
83 | ELzmaStatus *status, ISzAlloc *alloc); |
84 | In: |
85 | dest - output data |
86 | destLen - output data size |
87 | src - input data |
88 | srcLen - input data size |
89 | propData - LZMA properties (5 bytes) |
90 | propSize - size of propData buffer (5 bytes) |
91 | finishMode - It has meaning only if the decoding reaches output limit (*destLen). |
92 | LZMA_FINISH_ANY - Decode just destLen bytes. |
93 | LZMA_FINISH_END - Stream must be finished after (*destLen). |
94 | You can use LZMA_FINISH_END, when you know that |
95 | current output buffer covers last bytes of stream. |
96 | alloc - Memory allocator. |
97 | |
98 | Out: |
99 | destLen - processed output size |
100 | srcLen - processed input size |
101 | |
102 | Output: |
103 |   SZ_OK |
104 |     status: |
105 |       LZMA_STATUS_FINISHED_WITH_MARK |
106 |       LZMA_STATUS_NOT_FINISHED |
107 |       LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK |
108 |   SZ_ERROR_DATA - Data error |
109 |   SZ_ERROR_MEM - Memory allocation error |
110 |   SZ_ERROR_UNSUPPORTED - Unsupported properties |
111 |   SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src). |
112 | |
113 | If the LZMA decoder sees the end marker before reaching the output limit, it returns an OK result, |
114 | and the output value of destLen will be less than the output buffer size limit. |
115 | |
116 | You can use multiple checks to test data integrity after full decompression: |
117 | 1) Check the result and the "status" variable. |
118 | 2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize. |
119 | 3) Check that output(srcLen) = compressedSize, if you know the real compressedSize. |
120 | You must use the correct finish mode in that case. |
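The three checks can be folded into one helper. The sketch below uses stand-in constants (DEMO_SZ_OK and DemoStatus) in place of SZ_OK and ELzmaStatus so that it compiles on its own. Real code would take those from 7zTypes.h and LzmaDec.h, and the numeric values here are placeholders. The function name is hypothetical.

```c
#include <stddef.h>

/* Stand-ins for SZ_OK and the status values listed above. */
enum { DEMO_SZ_OK = 0 };

typedef enum {
    DEMO_STATUS_FINISHED_WITH_MARK,
    DEMO_STATUS_NOT_FINISHED,
    DEMO_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} DemoStatus;

/* Returns 1 if all three integrity checks pass, 0 otherwise.
   destLenOut and srcLenOut are the values written back by the decoder;
   unpackSize and packSize are the sizes known from elsewhere. */
int demo_decode_is_complete(int res, DemoStatus status,
                            size_t destLenOut, size_t unpackSize,
                            size_t srcLenOut, size_t packSize)
{
    if (res != DEMO_SZ_OK)
        return 0;                    /* check 1: result code */
    if (status == DEMO_STATUS_NOT_FINISHED)
        return 0;                    /* check 1: status */
    if (destLenOut != unpackSize)
        return 0;                    /* check 2: all output was produced */
    if (srcLenOut != packSize)
        return 0;                    /* check 3: all input was consumed */
    return 1;
}
```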
121 | |
122 | |
123 | Multi-call State Decompressing (zlib-like interface) |
124 | ---------------------------------------------------- |
125 | |
126 | When to use: file->file decompressing |
127 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h |
128 | |
129 | Memory Requirements: |
130 | - Buffer for input stream: any size (for example, 16 KB) |
131 | - Buffer for output stream: any size (for example, 16 KB) |
132 | - LZMA Internal Structures: state_size (16 KB for default settings) |
133 | - LZMA dictionary (dictionary size is encoded in LZMA properties header) |
134 | |
135 | 1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) into header: |
136 | unsigned char header[LZMA_PROPS_SIZE + 8]; |
137 | ReadFile(inFile, header, sizeof(header)); |
138 | |
139 | 2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties |
140 | |
141 | CLzmaDec state; |
142 | LzmaDec_Construct(&state); |
143 | res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc); |
144 | if (res != SZ_OK) |
145 | return res; |
146 | |
147 | 3) Initialize the LzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop |
148 | |
149 | LzmaDec_Init(&state); |
150 | for (;;) |
151 | { |
152 |   ... |
153 |   int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen, |
154 |       const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status); |
155 |   ... |
156 | } |
157 | |
158 | |
159 | 4) Free all allocated structures |
160 | LzmaDec_Free(&state, &g_Alloc); |
161 | |
162 | See the example code: |
163 | C/Util/Lzma/LzmaUtil.c |
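The buffer bookkeeping around the decode call in step 3 can be sketched without the SDK. In the block below, demo_decode_to_buf is a stand-in that only copies bytes. It is NOT an LZMA decoder. It merely follows the same in/out convention for the length arguments: on entry they hold the available sizes, and on return they hold the bytes produced and consumed. The real call also takes the decoder state and further arguments. All names and buffer sizes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the decode call: copies as many bytes as fit. */
static int demo_decode_to_buf(unsigned char *dest, size_t *destLen,
                              const unsigned char *src, size_t *srcLen)
{
    size_t n = (*destLen < *srcLen) ? *destLen : *srcLen;
    memcpy(dest, src, n);
    *destLen = n;   /* bytes produced */
    *srcLen = n;    /* bytes consumed */
    return 0;
}

enum { IN_BUF = 8, OUT_BUF = 5 };   /* deliberately tiny, to force many passes */

/* Streams inTotal bytes from `in` to `out` through small fixed buffers.
   Returns the number of bytes written, or (size_t)-1 on error. */
size_t demo_stream(const unsigned char *in, size_t inTotal, unsigned char *out)
{
    unsigned char inBuf[IN_BUF], outBuf[OUT_BUF];
    size_t inPos = 0, inSize = 0, readPos = 0, written = 0;

    for (;;)
    {
        size_t inProcessed, outProcessed;

        if (inPos == inSize)   /* input buffer drained: refill it */
        {
            inSize = inTotal - readPos;
            if (inSize > IN_BUF)
                inSize = IN_BUF;
            memcpy(inBuf, in + readPos, inSize);
            readPos += inSize;
            inPos = 0;
        }

        inProcessed = inSize - inPos;   /* input available */
        outProcessed = OUT_BUF;         /* output space available */
        if (demo_decode_to_buf(outBuf, &outProcessed,
                               inBuf + inPos, &inProcessed) != 0)
            return (size_t)-1;

        inPos += inProcessed;
        memcpy(out + written, outBuf, outProcessed);   /* "write" the output */
        written += outProcessed;

        if (inProcessed == 0 && outProcessed == 0)
            return written;             /* no progress: input is exhausted */
    }
}
```

With a real decoder the exit test would also look at the return code, the status value and the known uncompressed size. C/Util/Lzma/LzmaUtil.c shows the actual loop.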
164 | |
165 | |
166 | How To compress data |
167 | -------------------- |
168 | |
169 | Compile files: |
170 | 7zTypes.h |
171 | Threads.h |
172 | LzmaEnc.h |
173 | LzmaEnc.c |
174 | LzFind.h |
175 | LzFind.c |
176 | LzFindMt.h |
177 | LzFindMt.c |
178 | LzHash.h |
179 | |
180 | Memory Requirements: |
181 | - (dictSize * 11.5 + 6 MB) + state_size |
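As a worked instance of the formula above, the following sketch estimates encoder memory in bytes. The helper name lzma_enc_mem_estimate is hypothetical, and state_size is passed in by the caller.

```c
#include <stdint.h>

/* Rough encoder memory estimate in bytes:
   (dictSize * 11.5 + 6 MB) + state_size. */
uint64_t lzma_enc_mem_estimate(uint64_t dict_size, uint64_t state_size)
{
    /* dictSize * 11.5 is computed in integers as dictSize * 23 / 2. */
    return dict_size * 23 / 2 + 6 * 1024 * 1024 + state_size;
}
```

For a 16 MB dictionary and a 16 KB state this comes to 199245824 bytes, roughly 190 MB.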
182 | |
183 | The LZMA Encoder can use two memory allocators: |
184 | 1) alloc - for small arrays. |
185 | 2) allocBig - for big arrays. |
186 | |
187 | For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for |
188 | better compression speed. Note that Windows has a poor implementation of |
189 | Large RAM Pages. |
190 | It's OK to use the same allocator for alloc and allocBig. |
191 | |
192 | |
193 | Single-call Compression with callbacks |
194 | -------------------------------------- |
195 | |
196 | See the example code: |
197 | C/Util/Lzma/LzmaUtil.c |
198 | |
199 | When to use: file->file compressing |
200 | |
201 | 1) You must implement callback structures for the interfaces: |
202 | ISeqInStream |
203 | ISeqOutStream |
204 | ICompressProgress |
205 | ISzAlloc |
206 | |
207 | static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); } |
208 | static void SzFree(void *p, void *address) { p = p; MyFree(address); } |
209 | static ISzAlloc g_Alloc = { SzAlloc, SzFree }; |
210 | |
211 | CFileSeqInStream inStream; |
212 | CFileSeqOutStream outStream; |
213 | |
214 | inStream.funcTable.Read = MyRead; |
215 | inStream.file = inFile; |
216 | outStream.funcTable.Write = MyWrite; |
217 | outStream.file = outFile; |
218 | |
219 | |
220 | 2) Create the CLzmaEncHandle object: |
221 | |
222 | CLzmaEncHandle enc; |
223 | |
224 | enc = LzmaEnc_Create(&g_Alloc); |
225 | if (enc == 0) |
226 | return SZ_ERROR_MEM; |
227 | |
228 | |
229 | 3) Initialize the CLzmaEncProps properties: |
230 | CLzmaEncProps props; |
231 | LzmaEncProps_Init(&props); |
232 | |
233 | Then you can change some properties in that structure. |
234 | |
235 | 4) Send LZMA properties to LZMA Encoder |
236 | |
237 | res = LzmaEnc_SetProps(enc, &props); |
238 | |
239 | 5) Write encoded properties to header |
240 | |
241 | Byte header[LZMA_PROPS_SIZE + 8]; |
242 | size_t headerSize = LZMA_PROPS_SIZE; |
243 | UInt64 fileSize; |
244 | int i; |
245 | |
246 | res = LzmaEnc_WriteProperties(enc, header, &headerSize); |
247 | fileSize = MyGetFileLength(inFile); |
248 | for (i = 0; i < 8; i++) |
249 | header[headerSize++] = (Byte)(fileSize >> (8 * i)); |
250 | MyWriteFileAndCheck(outFile, header, headerSize); |
251 | |
252 | 6) Call encoding function: |
253 | res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable, |
254 | NULL, &g_Alloc, &g_Alloc); |
255 | |
256 | 7) Destroy LZMA Encoder Object |
257 | LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc); |
258 | |
259 | |
260 | If a callback function returns an error code, LzmaEnc_Encode also returns that code, |
261 | or it can return a code like SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS. |
262 | |
263 | |
264 | Single-call RAM->RAM Compression |
265 | -------------------------------- |
266 | |
267 | Single-call RAM->RAM Compression is similar to Compression with callbacks, |
268 | but you provide pointers to buffers instead of pointers to stream callbacks: |
269 | |
270 | SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, |
271 | const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, |
272 | ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig); |
273 | |
274 | Return code: |
275 | SZ_OK - OK |
276 | SZ_ERROR_MEM - Memory allocation error |
277 | SZ_ERROR_PARAM - Incorrect parameter |
278 | SZ_ERROR_OUTPUT_EOF - output buffer overflow |
279 | SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version) |
280 | |
281 | |
282 | |
283 | Defines |
284 | ------- |
285 | |
286 | _LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code. |
287 | |
288 | _LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for |
289 | some structures will be doubled in that case. |
290 | |
291 | _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit. |
292 | |
293 | _LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type. |
294 | |
295 | |
296 | _7ZIP_PPMD_SUPPPORT - Define it if you want to support the PPMD method in the ANSI-C .7z decoder. |
297 | |
298 | |
299 | C++ LZMA Encoder/Decoder |
300 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
301 | The C++ LZMA code uses COM-like interfaces, so if you want to use it, |
302 | you may want to study the basics of COM/OLE. |
303 | The C++ LZMA code is just a wrapper over the ANSI-C code. |
304 | |
305 | |
306 | C++ Notes |
307 | ~~~~~~~~~ |
308 | If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling), |
309 | you must check that you handle the "new" operator correctly. |
310 | 7-Zip can be compiled with MSVC 6.0, whose "new" operator does not throw an exception. |
311 | So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator: |
312 | void * operator new(size_t size) |
313 | { |
314 |   void *p = ::malloc(size); |
315 |   if (p == 0) |
316 |     throw CNewException(); |
317 |   return p; |
318 | } |
319 | If you use a version of MSVC whose "new" operator throws an exception, you can compile without |
320 | "NewHandler.cpp", so the standard exception will be used. Some code in |
321 | 7-Zip catches any exception in internal code and converts it to an HRESULT code, |
322 | so you don't need to catch CNewException if you call the COM interfaces of 7-Zip. |
323 | |
324 | --- |
325 | |
326 | http://www.7-zip.org |
327 | http://www.7-zip.org/sdk.html |
328 | http://www.7-zip.org/support.html |