LZMA compression
----------------
Version: 9.35

This file describes the LZMA encoding and decoding functions written in C.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was designed to maximize the compression ratio while keeping
decompression fast and its memory requirements low.

Note: you can also read the LZMA Specification (lzma-specification.txt from the LZMA SDK).

You can also look at the source code for LZMA encoding and decoding:
  C/Util/Lzma/LzmaUtil.c

LZMA compressed file format
---------------------------
Offset  Size  Description
  0       1   Special LZMA properties (lc, lp, pb in encoded form)
  1       4   Dictionary size (little endian)
  5       8   Uncompressed size (little endian). -1 (all bytes 0xFF) means unknown size
 13           Compressed data
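As an illustration of the table above, the following sketch parses the 13-byte header without the SDK. It assumes the properties byte packs the three values as (pb * 5 + lp) * 9 + lc, which is how LzmaDec.c decodes it; LzmaHeaderInfo and parse_lzma_header are hypothetical names. In real code, LzmaDec_Allocate (or LzmaProps_Decode) decodes the properties for you.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of the SDK. */
typedef struct {
    unsigned lc, lp, pb;   /* decoded from the properties byte (offset 0) */
    uint32_t dictSize;     /* offsets 1..4, little endian */
    uint64_t unpackSize;   /* offsets 5..12, little endian */
    int unpackSizeKnown;   /* 0 if the size field is -1 (all bytes 0xFF) */
} LzmaHeaderInfo;

/* Returns 0 on success, -1 if the properties byte is out of range. */
static int parse_lzma_header(const unsigned char h[13], LzmaHeaderInfo *info)
{
    unsigned d = h[0];
    int i;

    /* Assumption: d = (pb * 5 + lp) * 9 + lc, with lc < 9, lp < 5, pb < 5 */
    if (d >= 9 * 5 * 5)
        return -1;
    info->lc = d % 9;
    d /= 9;
    info->lp = d % 5;
    info->pb = d / 5;

    info->dictSize = 0;
    for (i = 0; i < 4; i++)
        info->dictSize |= (uint32_t)h[1 + i] << (8 * i);

    info->unpackSize = 0;
    for (i = 0; i < 8; i++)
        info->unpackSize |= (uint64_t)h[5 + i] << (8 * i);
    info->unpackSizeKnown = (info->unpackSize != UINT64_MAX);
    return 0;
}
```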

ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that the interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of the LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder, you need the following files:
  LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h

See the example code:
  C/Util/Lzma/LzmaUtil.c

Memory requirements for LZMA decoding
-------------------------------------

The LZMA decoding function uses no more than 200-400 bytes of stack
for local variables.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + 1.5 * (1 << (lc + lp))) KB
With the default settings (lc=3, lp=0), state_size = 16 KB.
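The formula can be checked with a few lines of C. The second function is an assumption about where the formula comes from: in LzmaDec.c the state is mostly an array of 1846 + (768 << (lc + lp)) probability counters of 2 bytes each (4 bytes with _LZMA_PROB32), so 1846 * 2 bytes is roughly the 4 KB term and 768 * 2 bytes is the 1.5 KB term. Both function names are hypothetical, and neither count includes the dictionary buffer.

```c
#include <stddef.h>

/* Formula from the text: state_size = (4 + 1.5 * 2^(lc + lp)) KB. */
static double lzma_state_kb_formula(unsigned lc, unsigned lp)
{
    return 4.0 + 1.5 * (double)(1u << (lc + lp));
}

/* Assumption: 1846 + (768 << (lc + lp)) probability counters
   (LZMA_BASE_SIZE and LZMA_LIT_SIZE in LzmaDec.c).
   probBytes is 2 by default, 4 when _LZMA_PROB32 is defined. */
static size_t lzma_state_bytes(unsigned lc, unsigned lp, size_t probBytes)
{
    return ((size_t)1846 + ((size_t)768 << (lc + lp))) * probBytes;
}
```

For the defaults (lc=3, lp=0) the byte count is 15980, which the text rounds to 16 KB; with _LZMA_PROB32 it doubles, as the Defines section notes.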

How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) supports two interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must provide an external allocator.
Example:
  void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
  void SzFree(void *p, void *address) { p = p; free(address); }
  ISzAlloc alloc = { SzAlloc, SzFree };

The statement "p = p;" is only there to suppress "unused parameter" compiler warnings.
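The allocator callbacks receive the allocator object itself as their first argument, so an allocator can carry its own state instead of ignoring p. The sketch below assumes the 9.x layout of ISzAlloc (two function pointers, Alloc and Free, each taking void *p first) and redefines it locally as ISzAllocMirror so it compiles on its own; CountingAlloc and its functions are hypothetical names. In real code, include 7zTypes.h and pass &a.vt wherever an ISzAlloc * is expected.

```c
#include <stdlib.h>
#include <stddef.h>

/* Local stand-in for ISzAlloc from 7zTypes.h (assumed 9.x layout). */
typedef struct {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} ISzAllocMirror;

/* Allocator that counts live blocks. vt must be the first member so the
   void *p passed to the callbacks can be cast back to CountingAlloc. */
typedef struct {
    ISzAllocMirror vt;
    size_t liveBlocks;
} CountingAlloc;

static void *CountingAlloc_Alloc(void *p, size_t size)
{
    CountingAlloc *self = (CountingAlloc *)p;
    void *block;
    if (size == 0)
        return NULL;              /* avoid implementation-defined malloc(0) */
    block = malloc(size);
    if (block != NULL)
        self->liveBlocks++;
    return block;
}

static void CountingAlloc_Free(void *p, void *address)
{
    CountingAlloc *self = (CountingAlloc *)p;
    if (address != NULL) {
        free(address);
        self->liveBlocks--;
    }
}

static void CountingAlloc_Init(CountingAlloc *a)
{
    a->vt.Alloc = CountingAlloc_Alloc;
    a->vt.Free = CountingAlloc_Free;
    a->liveBlocks = 0;
}
```

A nonzero liveBlocks after LzmaDec_Free or LzmaEnc_Destroy would point to a leak in the calling code.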

Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest       - output data
    destLen    - output data size
    src        - input data
    srcLen     - input data size
    propData   - LZMA properties (5 bytes)
    propSize   - size of propData buffer (5 bytes)
    finishMode - has meaning only if the decoding reaches the output limit (*destLen).
        LZMA_FINISH_ANY - Decode just destLen bytes.
        LZMA_FINISH_END - The stream must be finished after (*destLen).
                          You can use LZMA_FINISH_END when you know that
                          the current output buffer covers the last bytes of the stream.
    alloc      - Memory allocator.

  Out:
    destLen    - processed output size
    srcLen     - processed input size
    status     - decoding status (see below)

  Return code:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - The decoder needs more bytes in the input buffer (src).

If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK,
and the output value of destLen will be less than the output buffer size limit.

You can use multiple checks to test data integrity after full decompression:
  1) Check the result code and the "status" variable.
  2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
  3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
     You must use the correct finish mode in that case.
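The three checks can be combined into one helper. This is a hypothetical sketch, not an SDK function: the caller maps the LzmaDecode result and the status value onto two flags, because which status values count as acceptable depends on whether your streams carry an end marker. A size passed as -1 is treated as unknown and its check is skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper combining checks 1-3 above.
   Returns 1 if every applicable check passes, 0 otherwise. */
static int lzma_integrity_ok(int resultOk, int statusOk,
                             size_t destLen, int64_t knownUncompressedSize,
                             size_t srcLen, int64_t knownCompressedSize)
{
    /* 1) result code and status */
    if (!resultOk || !statusOk)
        return 0;
    /* 2) produced exactly the expected number of bytes */
    if (knownUncompressedSize >= 0 &&
        (uint64_t)destLen != (uint64_t)knownUncompressedSize)
        return 0;
    /* 3) consumed exactly the expected number of bytes */
    if (knownCompressedSize >= 0 &&
        (uint64_t)srcLen != (uint64_t)knownCompressedSize)
        return 0;
    return 1;
}
```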

Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h

Memory Requirements:
  - Buffer for input stream: any size (for example, 16 KB)
  - Buffer for output stream: any size (for example, 16 KB)
  - LZMA Internal Structures: state_size (16 KB for default settings)
  - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little-endian) into header:

    unsigned char header[LZMA_PROPS_SIZE + 8];
    ReadFile(inFile, header, sizeof(header)); /* placeholder: read exactly sizeof(header) bytes */

2) Allocate the CLzmaDec structures (state + dictionary) using the LZMA properties:

    CLzmaDec state;
    SRes res;
    LzmaDec_Constr(&state);
    res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
    if (res != SZ_OK)
      return res;

3) Call LzmaDec_Init before decoding each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop:

    LzmaDec_Init(&state);
    for (;;)
    {
      /* ... refill inBuf when inPos == inSize ... */
      SizeT inProcessed = inSize - inPos;
      SizeT outProcessed = OUT_BUF_SIZE;
      ELzmaStatus status;
      res = LzmaDec_DecodeToBuf(&state, outBuf, &outProcessed,
          inBuf + inPos, &inProcessed, LZMA_FINISH_ANY, &status);
      inPos += inProcessed;
      /* ... write outProcessed bytes; stop on error, or when
             inProcessed == 0 && outProcessed == 0 ... */
    }
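The in/out convention of destLen and srcLen is the part of this loop that is easiest to get wrong. The sketch below drives a hypothetical stand-in decoder (fake_decode_to_buf, which only copies bytes) through the same buffer bookkeeping: refill the input buffer when it is empty, pass only the unprocessed bytes, and advance by the consumed count. It is not the SDK's loop; a real loop would also check the ELzmaStatus value and the known uncompressed size before deciding that the stream ended correctly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LzmaDec_DecodeToBuf: it only copies bytes, but
   *destLen and *srcLen follow the same convention (capacity on input,
   bytes produced / consumed on output). */
static int fake_decode_to_buf(unsigned char *dest, size_t *destLen,
                              const unsigned char *src, size_t *srcLen)
{
    size_t n = (*destLen < *srcLen) ? *destLen : *srcLen;
    memcpy(dest, src, n);
    *destLen = n;
    *srcLen = n;
    return 0;
}

/* Streams `in` through small buffers into `out`.
   Returns the number of output bytes, or (size_t)-1 on error. */
static size_t stream_copy(const unsigned char *in, size_t inTotal,
                          unsigned char *out, size_t outCap)
{
    enum { IN_BUF_SIZE = 16, OUT_BUF_SIZE = 8 };
    unsigned char inBuf[IN_BUF_SIZE], outBuf[OUT_BUF_SIZE];
    size_t inFed = 0, inPos = 0, inSize = 0, outTotal = 0;

    for (;;) {
        size_t inProcessed, outProcessed;
        if (inPos == inSize) {            /* refill, like a ReadFile call */
            inSize = inTotal - inFed;
            if (inSize > IN_BUF_SIZE)
                inSize = IN_BUF_SIZE;
            memcpy(inBuf, in + inFed, inSize);
            inFed += inSize;
            inPos = 0;
        }
        inProcessed = inSize - inPos;     /* only the unprocessed part */
        outProcessed = OUT_BUF_SIZE;
        if (fake_decode_to_buf(outBuf, &outProcessed,
                               inBuf + inPos, &inProcessed) != 0)
            return (size_t)-1;
        inPos += inProcessed;
        if (outTotal + outProcessed > outCap)
            return (size_t)-1;            /* destination too small */
        memcpy(out + outTotal, outBuf, outProcessed);  /* "write" output */
        outTotal += outProcessed;
        if (inProcessed == 0 && outProcessed == 0)
            break;                        /* no progress: input exhausted */
    }
    return outTotal;
}
```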

4) Free all allocated structures:

    LzmaDec_Free(&state, &g_Alloc);

See the example code:
  C/Util/Lzma/LzmaUtil.c

How To compress data
--------------------

Compile files:
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size
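For sizing, the formula translates directly into code. The sketch takes state_size as a parameter (about 16 KB for the default lc/lp); lzma_enc_mem_estimate is a hypothetical name, and the result is only as accurate as the formula itself.

```c
#include <stdint.h>

/* Rough encoder memory estimate from the text:
   (dictSize * 11.5 + 6 MB) + state_size, in bytes. */
static uint64_t lzma_enc_mem_estimate(uint32_t dictSize, uint64_t stateBytes)
{
    const uint64_t MB = (uint64_t)1 << 20;
    /* dictSize * 11.5 computed in integers (rounded down) as dictSize * 23 / 2 */
    uint64_t dictPart = (uint64_t)dictSize * 23 / 2;
    return dictPart + 6 * MB + stateBytes;
}
```

For a 16 MB dictionary this gives 190 MB plus state_size.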

The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

See the example code:
  C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

1) Implement the callback structures for these interfaces:
     ISeqInStream
     ISeqOutStream
     ICompressProgress
     ISzAlloc

  static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
  static void SzFree(void *p, void *address) { p = p; MyFree(address); }
  static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;
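Callback objects in this SDK follow a "function table as first member" pattern: the SDK calls the function with the interface pointer, and the implementation casts it back to its own struct. The sketch below shows a memory-backed input stream and output stream, assuming the 9.x interfaces are SRes (*Read)(void *p, void *buf, size_t *size) and size_t (*Write)(void *p, const void *buf, size_t size) with SRes being int and 0 meaning SZ_OK. The *Mirror typedefs and the Mem* names are hypothetical stand-ins so the code compiles without 7zTypes.h.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for ISeqInStream / ISeqOutStream (assumed 9.x layout). */
typedef struct { int (*Read)(void *p, void *buf, size_t *size); } SeqInStreamMirror;
typedef struct { size_t (*Write)(void *p, const void *buf, size_t size); } SeqOutStreamMirror;

typedef struct {
    SeqInStreamMirror vt;        /* must be the first member */
    const unsigned char *data;
    size_t size, pos;
} MemInStream;

typedef struct {
    SeqOutStreamMirror vt;       /* must be the first member */
    unsigned char *data;
    size_t cap, pos;
} MemOutStream;

static int MemIn_Read(void *p, void *buf, size_t *size)
{
    MemInStream *s = (MemInStream *)p;
    size_t left = s->size - s->pos;
    if (*size > left)
        *size = left;            /* *size == 0 on return means end of stream */
    memcpy(buf, s->data + s->pos, *size);
    s->pos += *size;
    return 0;                    /* SZ_OK */
}

static size_t MemOut_Write(void *p, const void *buf, size_t size)
{
    MemOutStream *s = (MemOutStream *)p;
    size_t room = s->cap - s->pos;
    if (size > room)
        size = room;             /* a short count signals a write error */
    memcpy(s->data + s->pos, buf, size);
    s->pos += size;
    return size;
}

static void MemIn_Init(MemInStream *s, const void *data, size_t size)
{
    s->vt.Read = MemIn_Read;
    s->data = (const unsigned char *)data;
    s->size = size;
    s->pos = 0;
}

static void MemOut_Init(MemOutStream *s, void *data, size_t cap)
{
    s->vt.Write = MemOut_Write;
    s->data = (unsigned char *)data;
    s->cap = cap;
    s->pos = 0;
}
```

With the real headers, &in.vt and &out.vt would be passed where LzmaEnc_Encode expects ISeqInStream * and ISeqOutStream *.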

2) Create a CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize the CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

Then you can change some properties in that structure.

4) Pass the properties to the LZMA Encoder:

  res = LzmaEnc_SetProps(enc, &props);

5) Write the encoded properties and the uncompressed size to the header:

  Byte header[LZMA_PROPS_SIZE + 8];
  SizeT headerSize = LZMA_PROPS_SIZE;
  UInt64 fileSize;
  int i;

  res = LzmaEnc_WriteProperties(enc, header, &headerSize);
  fileSize = MyGetFileLength(inFile);
  for (i = 0; i < 8; i++)
    header[headerSize++] = (Byte)(fileSize >> (8 * i));
  MyWriteFileAndCheck(outFile, header, headerSize);
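The header bytes can also be assembled by hand, which shows what LzmaEnc_WriteProperties and the size loop produce. This sketch assumes the properties byte is (pb * 5 + lp) * 9 + lc (so the default lc=3, lp=0, pb=2 gives 0x5D) and that the dictionary size is stored little endian in bytes 1..4; build_lzma_header and HDR_PROPS_SIZE are hypothetical names. LzmaEnc_WriteProperties may round the stored dictionary size up (to a value of the form 2^n or 3 * 2^n), so real code should keep calling it rather than this helper.

```c
#include <stdint.h>

#define HDR_PROPS_SIZE 5   /* same value as LZMA_PROPS_SIZE in the SDK */

/* Hypothetical helper: fills the 13-byte .lzma header.
   Pass (uint64_t)-1 as fileSize for "unknown size".
   Returns 0, or -1 if lc/lp/pb are out of range. */
static int build_lzma_header(unsigned char header[HDR_PROPS_SIZE + 8],
                             unsigned lc, unsigned lp, unsigned pb,
                             uint32_t dictSize, uint64_t fileSize)
{
    int i;
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;
    header[0] = (unsigned char)((pb * 5 + lp) * 9 + lc);
    for (i = 0; i < 4; i++)          /* dictionary size, little endian */
        header[1 + i] = (unsigned char)(dictSize >> (8 * i));
    for (i = 0; i < 8; i++)          /* uncompressed size, little endian */
        header[HDR_PROPS_SIZE + i] = (unsigned char)(fileSize >> (8 * i));
    return 0;
}
```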

6) Call the encoding function:

  res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
      NULL, &g_Alloc, &g_Alloc);

7) Destroy the LZMA Encoder object:

  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.

Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

  SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
      const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
      ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - Output buffer overflow
  SZ_ERROR_THREAD     - Errors in multithreading functions (only for Mt version)


Defines
-------

_LZMA_SIZE_OPT - Enable some optimizations in the LZMA Decoder to get smaller executable code.

_LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for
               some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use the size_t type.


_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support the PPMD method in the ANSI-C .7z decoder.

C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces, so if you want to use it,
you may want to study the basics of COM/OLE.
The C++ LZMA code is just a wrapper over the ANSI-C code.

C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must make sure that your code handles the "new" operator correctly.
7-Zip can be compiled with MSVC 6.0, whose "new" operator doesn't throw an exception on failure.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:

  void * operator new(size_t size)
  {
    void *p = ::malloc(size);
    if (p == 0)
      throw CNewException();
    return p;
  }

If you use an MSVC version whose "new" operator throws an exception, you can compile without
"NewHandler.cpp", so the standard exception will be used. Note that some code of
7-Zip catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException if you call the COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html