| 1 | LZMA compression |
| 2 | ---------------- |
| 3 | Version: 9.35 |
| 4 | |
| 5 | This file describes the LZMA encoding and decoding functions written in C. |
| 6 | |
| 7 | LZMA is an improved version of the well-known LZ77 compression algorithm. |
| 8 | It was designed to maximize the compression ratio while |
| 9 | keeping decompression fast and the memory requirements for |
| 10 | decompression low. |
| 11 | |
| 12 | Note: see also the LZMA Specification (lzma-specification.txt in the LZMA SDK). |
| 13 | |
| 14 | You can also look at the source code for LZMA encoding and decoding: |
| 15 | C/Util/Lzma/LzmaUtil.c |
| 16 | |
| 17 | |
| 18 | LZMA compressed file format |
| 19 | --------------------------- |
| 20 | Offset Size Description |
| 21 | 0 1 Special LZMA properties (lc, lp, pb in encoded form) |
| 22 | 1 4 Dictionary size (little endian) |
| 23 | 5 8 Uncompressed size (little endian). -1 means unknown size |
| 24 | 13 Compressed data |
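The layout above can be read with a few lines of C. This is a minimal sketch, not SDK code: the names LzmaHeaderInfo, ParseLzmaHeader and DEMO_LZMA_HEADER_SIZE are hypothetical, and the unpacking of the properties byte (props = (pb * 5 + lp) * 9 + lc) is taken from the LZMA Specification rather than from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LZMA_HEADER_SIZE 13  /* 1 props byte + 4 dictionary bytes + 8 size bytes */

/* Hypothetical helper type, not part of the SDK. */
typedef struct {
    unsigned lc, lp, pb;        /* unpacked from the properties byte (offset 0) */
    uint32_t dictSize;          /* offsets 1..4, little endian */
    uint64_t uncompressedSize;  /* offsets 5..12, little endian */
    int sizeKnown;              /* 0 when the size field is -1 (all 0xFF bytes) */
} LzmaHeaderInfo;

/* Returns 0 on success, -1 if the properties byte is out of range. */
static int ParseLzmaHeader(const unsigned char *h, LzmaHeaderInfo *info)
{
    unsigned d;
    int i;

    assert(h != NULL && info != NULL);

    /* Assumption from the LZMA Specification: props = (pb * 5 + lp) * 9 + lc */
    d = h[0];
    if (d >= 9 * 5 * 5)
        return -1;
    info->lc = d % 9;
    d /= 9;
    info->lp = d % 5;
    info->pb = d / 5;

    info->dictSize = 0;
    for (i = 0; i < 4; i++)
        info->dictSize |= (uint32_t)h[1 + i] << (8 * i);

    info->uncompressedSize = 0;
    for (i = 0; i < 8; i++)
        info->uncompressedSize |= (uint64_t)h[5 + i] << (8 * i);

    info->sizeKnown = (info->uncompressedSize != UINT64_MAX);
    return 0;
}
```

Pass the first DEMO_LZMA_HEADER_SIZE bytes of the file to ParseLzmaHeader; the compressed data starts right after them.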
| 25 | |
| 26 | |
| 27 | |
| 28 | ANSI-C LZMA Decoder |
| 29 | ~~~~~~~~~~~~~~~~~~~ |
| 30 | |
| 31 | Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58. |
| 32 | If you want to use the old interfaces, you can download a previous version of the LZMA SDK |
| 33 | from the sourceforge.net site. |
| 34 | |
| 35 | To use the ANSI-C LZMA Decoder you need the following files: |
| 36 | 1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h |
| 37 | |
| 38 | See the example code: |
| 39 | C/Util/Lzma/LzmaUtil.c |
| 40 | |
| 41 | |
| 42 | Memory requirements for LZMA decoding |
| 43 | ------------------------------------- |
| 44 | |
| 45 | Stack usage of the LZMA decoding function for local variables is no |
| 46 | larger than 200-400 bytes. |
| 47 | |
| 48 | The LZMA Decoder uses a dictionary buffer and an internal state structure. |
| 49 | The internal state structure consumes |
| 50 | state_size = (4 + (1.5 << (lc + lp))) KB |
| 51 | by default (lc=3, lp=0), state_size = 16 KB. |
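The state_size formula can be evaluated with integer arithmetic by working in bytes instead of KB. The sketch below is only an illustration: LzmaStateSizeBytes is a hypothetical name, and the limits lc <= 8 and lp <= 4 are assumptions taken from the LZMA Specification.

```c
#include <assert.h>
#include <stddef.h>

/* state_size in bytes: (4 + 1.5 * 2^(lc + lp)) KB = 4096 + 1536 * 2^(lc + lp) */
static size_t LzmaStateSizeBytes(unsigned lc, unsigned lp)
{
    assert(lc <= 8 && lp <= 4);  /* assumed limits, see the LZMA Specification */
    return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

With the default settings (lc=3, lp=0) this gives 4096 + 1536 * 8 = 16384 bytes, which matches the 16 KB stated above.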
| 52 | |
| 53 | |
| 54 | How To decompress data |
| 55 | ---------------------- |
| 56 | |
| 57 | LZMA Decoder (ANSI-C version) now supports 2 interfaces: |
| 58 | 1) Single-call Decompressing |
| 59 | 2) Multi-call State Decompressing (zlib-like interface) |
| 60 | |
| 61 | You must provide an external allocator: |
| 62 | Example: |
| 63 | void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); } |
| 64 | void SzFree(void *p, void *address) { p = p; free(address); } |
| 65 | ISzAlloc alloc = { SzAlloc, SzFree }; |
| 66 | |
| 67 | You can use the "p = p;" statement to suppress unused-parameter compiler warnings. |
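Some newer compilers may warn about the self-assignment itself, so a cast to void is a common alternative. The sketch below shows that variant. DemoSzAlloc is a stand-in with the same two-callback shape as the ISzAlloc example above, declared here only so the snippet compiles without 7zTypes.h; all Demo* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for ISzAlloc; real code should include 7zTypes.h instead. */
typedef struct {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} DemoSzAlloc;

/* (void)p marks the parameter as intentionally unused. */
static void *DemoAlloc(void *p, size_t size) { (void)p; return malloc(size); }
static void DemoFree(void *p, void *address) { (void)p; free(address); }

static DemoSzAlloc g_DemoAlloc = { DemoAlloc, DemoFree };

/* Allocates and frees one block through the callback table.
   Returns 1 on success, 0 if the allocation failed. */
static int DemoAllocRoundTrip(size_t size)
{
    void *mem;
    assert(size > 0);
    mem = g_DemoAlloc.Alloc(&g_DemoAlloc, size);
    if (mem == NULL)
        return 0;
    g_DemoAlloc.Free(&g_DemoAlloc, mem);
    return 1;
}
```

The first argument lets an allocator carry its own context (for example, a memory pool); the malloc-based version simply ignores it.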
| 68 | |
| 69 | |
| 70 | Single-call Decompressing |
| 71 | ------------------------- |
| 72 | When to use: RAM->RAM decompressing |
| 73 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h |
| 74 | Compile defines: no defines |
| 75 | Memory Requirements: |
| 76 | - Input buffer: compressed size |
| 77 | - Output buffer: uncompressed size |
| 78 | - LZMA Internal Structures: state_size (16 KB for default settings) |
| 79 | |
| 80 | Interface: |
| 81 | int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, |
| 82 | const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, |
| 83 | ELzmaStatus *status, ISzAlloc *alloc); |
| 84 | In: |
| 85 | dest - output data |
| 86 | destLen - output data size |
| 87 | src - input data |
| 88 | srcLen - input data size |
| 89 | propData - LZMA properties (5 bytes) |
| 90 | propSize - size of propData buffer (5 bytes) |
| 91 | finishMode - It has meaning only if the decoding reaches output limit (*destLen). |
| 92 | LZMA_FINISH_ANY - Decode just destLen bytes. |
| 93 | LZMA_FINISH_END - Stream must be finished after (*destLen). |
| 94 | You can use LZMA_FINISH_END, when you know that |
| 95 | current output buffer covers last bytes of stream. |
| 96 | alloc - Memory allocator. |
| 97 | |
| 98 | Out: |
| 99 | destLen - processed output size |
| 100 | srcLen - processed input size |
| 101 | |
| 102 | Output: |
| 103 | SZ_OK |
| 104 | status: |
| 105 | LZMA_STATUS_FINISHED_WITH_MARK |
| 106 | LZMA_STATUS_NOT_FINISHED |
| 107 | LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK |
| 108 | SZ_ERROR_DATA - Data error |
| 109 | SZ_ERROR_MEM - Memory allocation error |
| 110 | SZ_ERROR_UNSUPPORTED - Unsupported properties |
| 111 | SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src). |
| 112 | |
| 113 | If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK, |
| 114 | and the output value of destLen will be less than the output buffer size limit. |
| 115 | |
| 116 | You can use multiple checks to test data integrity after full decompression: |
| 117 | 1) Check the result code and the "status" variable. |
| 118 | 2) Check that the output value of destLen equals uncompressedSize, if you know the real uncompressedSize. |
| 119 | 3) Check that the output value of srcLen equals compressedSize, if you know the real compressedSize. |
| 120 | You must use the correct finish mode in that case. |
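The three checks can be combined in one helper. This is a sketch under stated assumptions: the DEMO_* constants are placeholders for SZ_OK and the LZMA_STATUS_* values (real code takes them from 7zTypes.h and LzmaDec.h), and DemoDecodeIsComplete is a hypothetical name. It assumes that the whole compressed buffer was passed as src and that both sizes are known.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholders: the numeric values here are arbitrary, not the SDK's. */
enum { DEMO_SZ_OK = 0 };
typedef enum {
    DEMO_STATUS_FINISHED_WITH_MARK,
    DEMO_STATUS_NOT_FINISHED,
    DEMO_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} DemoStatus;

/* Returns 1 if all three integrity checks pass, 0 otherwise. */
static int DemoDecodeIsComplete(int res, DemoStatus status,
                                size_t destLenOut, size_t uncompressedSize,
                                size_t srcLenOut, size_t compressedSize)
{
    /* Assumes the whole compressed buffer was passed as src. */
    assert(srcLenOut <= compressedSize);

    /* 1) result code and status */
    if (res != DEMO_SZ_OK)
        return 0;
    if (status != DEMO_STATUS_FINISHED_WITH_MARK &&
        status != DEMO_STATUS_MAYBE_FINISHED_WITHOUT_MARK)
        return 0;

    /* 2) all expected output was produced */
    if (destLenOut != uncompressedSize)
        return 0;

    /* 3) all input was consumed */
    if (srcLenOut != compressedSize)
        return 0;

    return 1;
}
```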
| 121 | |
| 122 | |
| 123 | Multi-call State Decompressing (zlib-like interface) |
| 124 | ---------------------------------------------------- |
| 125 | |
| 126 | When to use: file->file decompressing |
| 127 | Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h |
| 128 | |
| 129 | Memory Requirements: |
| 130 | - Buffer for input stream: any size (for example, 16 KB) |
| 131 | - Buffer for output stream: any size (for example, 16 KB) |
| 132 | - LZMA Internal Structures: state_size (16 KB for default settings) |
| 133 | - LZMA dictionary (dictionary size is encoded in LZMA properties header) |
| 134 | |
| 135 | 1) Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little-endian) into header: |
| 136 | unsigned char header[LZMA_PROPS_SIZE + 8]; |
| 137 | ReadFile(inFile, header, sizeof(header)); |
| 138 | |
| 139 | 2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties |
| 140 | |
| 141 | CLzmaDec state; |
| 142 | LzmaDec_Constr(&state); |
| 143 | res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc); |
| 144 | if (res != SZ_OK) |
| 145 | return res; |
| 146 | |
| 147 | 3) Initialize the LzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop |
| 148 | |
| 149 | LzmaDec_Init(&state); |
| 150 | for (;;) |
| 151 | { |
| 152 | ... |
| 153 | res = LzmaDec_DecodeToBuf(&state, dest, &destLen, |
| 154 | src, &srcLen, finishMode); |
| 155 | ... |
| 156 | } |
| 157 | |
| 158 | |
| 159 | 4) Free all allocated structures |
| 160 | LzmaDec_Free(&state, &g_Alloc); |
| 161 | |
| 162 | See the example code: |
| 163 | C/Util/Lzma/LzmaUtil.c |
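The buffer bookkeeping inside the step 3 loop can be sketched without the SDK. In the sketch below, DemoDecodeToBuf is a stand-in that only copies bytes. It follows the same convention as LzmaDec_DecodeToBuf: *destLen and *srcLen are limits on entry and processed counts on return. All Demo* names are hypothetical, and memory buffers replace the files. A real loop would also pick finishMode and check the decoder status.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_IN_BUF_SIZE 16   /* tiny on purpose; the text suggests e.g. 16 KB */
#define DEMO_OUT_BUF_SIZE 8

/* Stand-in decoder: copies min(*destLen, *srcLen) bytes, then reports how
   much output it produced and how much input it consumed. Returns 0. */
static int DemoDecodeToBuf(unsigned char *dest, size_t *destLen,
                           const unsigned char *src, size_t *srcLen)
{
    size_t n = (*destLen < *srcLen) ? *destLen : *srcLen;
    memcpy(dest, src, n);
    *destLen = n;
    *srcLen = n;
    return 0;
}

/* Streams inSize bytes from in to out through two small buffers.
   Returns the number of bytes written, or (size_t)-1 on error. */
static size_t DemoStreamLoop(const unsigned char *in, size_t inSize,
                             unsigned char *out, size_t outCap)
{
    unsigned char inBuf[DEMO_IN_BUF_SIZE], outBuf[DEMO_OUT_BUF_SIZE];
    size_t inRead = 0;              /* bytes taken from in so far */
    size_t inPos = 0, inAvail = 0;  /* unconsumed window inside inBuf */
    size_t written = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        size_t srcLen, destLen;

        if (inPos == inAvail) {     /* input buffer drained: refill it */
            inAvail = inSize - inRead;
            if (inAvail > DEMO_IN_BUF_SIZE)
                inAvail = DEMO_IN_BUF_SIZE;
            memcpy(inBuf, in + inRead, inAvail);
            inRead += inAvail;
            inPos = 0;
        }

        srcLen = inAvail - inPos;   /* limit in, consumed count out */
        destLen = DEMO_OUT_BUF_SIZE;
        if (DemoDecodeToBuf(outBuf, &destLen, inBuf + inPos, &srcLen) != 0)
            return (size_t)-1;
        inPos += srcLen;

        if (written + destLen > outCap)
            return (size_t)-1;
        memcpy(out + written, outBuf, destLen);  /* flush the output buffer */
        written += destLen;

        if (srcLen == 0 && destLen == 0)  /* no progress: input exhausted */
            return written;
    }
}
```

The output buffer is deliberately smaller than the input buffer, so one refill takes several decoder calls to consume. That is the case the inPos / inAvail bookkeeping exists for.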
| 164 | |
| 165 | |
| 166 | How To compress data |
| 167 | -------------------- |
| 168 | |
| 169 | Compile files: |
| 170 | 7zTypes.h |
| 171 | Threads.h |
| 172 | LzmaEnc.h |
| 173 | LzmaEnc.c |
| 174 | LzFind.h |
| 175 | LzFind.c |
| 176 | LzFindMt.h |
| 177 | LzFindMt.c |
| 178 | LzHash.h |
| 179 | |
| 180 | Memory Requirements: |
| 181 | - (dictSize * 11.5 + 6 MB) + state_size |
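As a worked example of this estimate, the sketch below evaluates the formula in bytes. DemoEncoderMemoryBytes is a hypothetical helper, and treating 1 MB as 2^20 bytes is an assumption; state_size is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* (dictSize * 11.5 + 6 MB) + state_size, in bytes. */
static uint64_t DemoEncoderMemoryBytes(uint32_t dictSize, uint64_t stateSize)
{
    const uint64_t sixMB = (uint64_t)6 << 20;  /* assumes 1 MB = 2^20 bytes */
    assert(dictSize > 0);
    /* dictSize * 11.5 is computed as dictSize * 23 / 2 to stay in integers */
    return (uint64_t)dictSize * 23 / 2 + sixMB + stateSize;
}
```

For a 16 MB dictionary and the default 16 KB state, this gives 199245824 bytes, about 190 MB.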
| 182 | |
| 183 | The LZMA Encoder can use two memory allocators: |
| 184 | 1) alloc - for small arrays. |
| 185 | 2) allocBig - for big arrays. |
| 186 | |
| 187 | For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for |
| 188 | better compression speed. Note that Windows has a poor implementation of |
| 189 | Large RAM Pages. |
| 190 | It's OK to use the same allocator for alloc and allocBig. |
| 191 | |
| 192 | |
| 193 | Single-call Compression with callbacks |
| 194 | -------------------------------------- |
| 195 | |
| 196 | See the example code: |
| 197 | C/Util/Lzma/LzmaUtil.c |
| 198 | |
| 199 | When to use: file->file compressing |
| 200 | |
| 201 | 1) You must implement callback structures for these interfaces: |
| 202 | ISeqInStream |
| 203 | ISeqOutStream |
| 204 | ICompressProgress |
| 205 | ISzAlloc |
| 206 | |
| 207 | static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); } |
| 208 | static void SzFree(void *p, void *address) { p = p; MyFree(address); } |
| 209 | static ISzAlloc g_Alloc = { SzAlloc, SzFree }; |
| 210 | |
| 211 | CFileSeqInStream inStream; |
| 212 | CFileSeqOutStream outStream; |
| 213 | |
| 214 | inStream.funcTable.Read = MyRead; |
| 215 | inStream.file = inFile; |
| 216 | outStream.funcTable.Write = MyWrite; |
| 217 | outStream.file = outFile; |
| 218 | |
| 219 | |
| 220 | 2) Create a CLzmaEncHandle object: |
| 221 | |
| 222 | CLzmaEncHandle enc; |
| 223 | |
| 224 | enc = LzmaEnc_Create(&g_Alloc); |
| 225 | if (enc == 0) |
| 226 | return SZ_ERROR_MEM; |
| 227 | |
| 228 | |
| 229 | 3) Initialize the CLzmaEncProps properties: |
| 230 | CLzmaEncProps props; |
| 231 | LzmaEncProps_Init(&props); |
| 232 | |
| 233 | Then you can change some properties in that structure. |
| 234 | |
| 235 | 4) Send LZMA properties to LZMA Encoder |
| 236 | |
| 237 | res = LzmaEnc_SetProps(enc, &props); |
| 238 | |
| 239 | 5) Write encoded properties to header |
| 240 | |
| 241 | Byte header[LZMA_PROPS_SIZE + 8]; |
| 242 | size_t headerSize = LZMA_PROPS_SIZE; |
| 243 | UInt64 fileSize; |
| 244 | int i; |
| 245 | |
| 246 | res = LzmaEnc_WriteProperties(enc, header, &headerSize); |
| 247 | fileSize = MyGetFileLength(inFile); |
| 248 | for (i = 0; i < 8; i++) |
| 249 | header[headerSize++] = (Byte)(fileSize >> (8 * i)); |
| 250 | MyWriteFileAndCheck(outFile, header, headerSize); |
| 251 | |
| 252 | 6) Call encoding function: |
| 253 | res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable, |
| 254 | NULL, &g_Alloc, &g_Alloc); |
| 255 | |
| 256 | 7) Destroy LZMA Encoder Object |
| 257 | LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc); |
| 258 | |
| 259 | |
| 260 | If a callback function returns an error code, LzmaEnc_Encode also returns that code, |
| 261 | or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS. |
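The header written in step 5 is the 5 property bytes followed by the uncompressed size as 8 little-endian bytes. The sketch below isolates that byte layout so it can be checked without the encoder. DemoBuildHeader and DEMO_PROPS_SIZE are hypothetical names. The property bytes are assumed to have been produced already, for example by LzmaEnc_WriteProperties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PROPS_SIZE 5  /* the LZMA properties occupy 5 bytes */

/* Builds the 13-byte header. Returns the number of bytes written. */
static size_t DemoBuildHeader(unsigned char *header,
                              const unsigned char *props, uint64_t fileSize)
{
    size_t pos = DEMO_PROPS_SIZE;
    int i;

    assert(header != NULL && props != NULL);
    memcpy(header, props, DEMO_PROPS_SIZE);

    /* Least significant byte first, same loop as in step 5 */
    for (i = 0; i < 8; i++)
        header[pos++] = (unsigned char)(fileSize >> (8 * i));
    return pos;
}
```

Passing UINT64_MAX as fileSize stores eight 0xFF bytes, the "-1 means unknown size" value from the file format table.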
| 262 | |
| 263 | |
| 264 | Single-call RAM->RAM Compression |
| 265 | -------------------------------- |
| 266 | |
| 267 | Single-call RAM->RAM Compression is similar to Compression with callbacks, |
| 268 | but you provide pointers to buffers instead of pointers to stream callbacks: |
| 269 | |
| 270 | SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, |
| 271 | const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, |
| 272 | ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig); |
| 273 | |
| 274 | Return code: |
| 275 | SZ_OK - OK |
| 276 | SZ_ERROR_MEM - Memory allocation error |
| 277 | SZ_ERROR_PARAM - Incorrect parameter |
| 278 | SZ_ERROR_OUTPUT_EOF - Output buffer overflow |
| 279 | SZ_ERROR_THREAD - Errors in multithreading functions (only for Mt version) |
| 280 | |
| 281 | |
| 282 | |
| 283 | Defines |
| 284 | ------- |
| 285 | |
| 286 | _LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code. |
| 287 | |
| 288 | _LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for |
| 289 | some structures will be doubled in that case. |
| 290 | |
| 291 | _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit. |
| 292 | |
| 293 | _LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type. |
| 294 | |
| 295 | |
| 296 | _7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in ANSI-C .7z decoder. |
| 297 | |
| 298 | |
| 299 | C++ LZMA Encoder/Decoder |
| 300 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
| 301 | The C++ LZMA code uses COM-like interfaces, so if you want to use it, |
| 302 | you may want to study the basics of COM/OLE. |
| 303 | The C++ LZMA code is just a wrapper over the ANSI-C code. |
| 304 | |
| 305 | |
| 306 | C++ Notes |
| 307 | ~~~~~~~~~ |
| 308 | If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling), |
| 309 | you must make sure that you handle the "new" operator correctly. |
| 310 | 7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator. |
| 311 | So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator: |
| 312 | void *operator new(size_t size) |
| 313 | { |
| 314 | void *p = ::malloc(size); |
| 315 | if (p == 0) |
| 316 | throw CNewException(); |
| 317 | return p; |
| 318 | } |
| 319 | If you use a version of MSVC that throws an exception from the "new" operator, you can compile without |
| 320 | "NewHandler.cpp", so the standard exception will be used. Some 7-Zip code |
| 321 | catches any exception in internal code and converts it to an HRESULT code, |
| 322 | so you don't need to catch CNewException if you call COM interfaces of 7-Zip. |
| 323 | |
| 324 | --- |
| 325 | |
| 326 | http://www.7-zip.org |
| 327 | http://www.7-zip.org/sdk.html |
| 328 | http://www.7-zip.org/support.html |