LZMA compression
----------------
Version: 9.35

This file describes the LZMA encoding and decoding functions written in C.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was designed to maximize the compression ratio while keeping
decompression fast and the memory requirements for decompression low.

Note: see also the LZMA Specification (lzma-specification.txt in the LZMA SDK).

For example source code for LZMA encoding and decoding, see:
  C/Util/Lzma/LzmaUtil.c


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data

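The 13-byte header layout above can be decoded with a few lines of C. The
following is a minimal sketch, not SDK code: the names ParseLzmaHeader and
LzmaHeaderInfo are hypothetical, and the packing of the properties byte as
(pb * 5 + lp) * 9 + lc is taken from the LZMA Specification rather than from
this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LZMA_HEADER_SIZE 13  /* 1 + 4 + 8 bytes, as in the table above */

/* Hypothetical container for the decoded header fields. */
typedef struct
{
  unsigned lc, lp, pb;   /* decoded from the properties byte */
  uint32_t dictSize;     /* bytes 1..4, little endian */
  uint64_t unpackSize;   /* bytes 5..12, little endian; all 0xFF (-1) = unknown */
} LzmaHeaderInfo;

/* Returns 0 on success, -1 if the properties byte is out of range.
   Assumes the byte is packed as (pb * 5 + lp) * 9 + lc (LZMA Specification). */
static int ParseLzmaHeader(const unsigned char *h, LzmaHeaderInfo *info)
{
  unsigned d, i;
  assert(h != NULL && info != NULL);

  d = h[0];
  if (d >= 9 * 5 * 5)
    return -1;
  info->lc = d % 9;
  d /= 9;
  info->lp = d % 5;
  info->pb = d / 5;

  info->dictSize = 0;
  for (i = 0; i < 4; i++)
    info->dictSize |= (uint32_t)h[1 + i] << (8 * i);

  info->unpackSize = 0;
  for (i = 0; i < 8; i++)
    info->unpackSize |= (uint64_t)h[5 + i] << (8 * i);
  return 0;
}
```

For a properties byte of 0x5D this yields lc=3, lp=0, pb=2. Compare unpackSize
with UINT64_MAX to detect the "unknown size" case.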


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Note that the interfaces of the ANSI-C code changed in LZMA SDK 4.58.
If you need the old interfaces, download an earlier version of the LZMA SDK
from sourceforge.net.

To use the ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h

For example code, see:
  C/Util/Lzma/LzmaUtil.c


Memory requirements for LZMA decoding
-------------------------------------

The LZMA decoding function uses no more than 200-400 bytes of stack
for local variables.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
With the default settings (lc=3, lp=0), state_size = 16 KB.

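The state_size formula can be evaluated in integer arithmetic by working in
bytes (4 KB = 4096, 1.5 KB = 1536). This is only a sketch of the estimate given
above. The function name is hypothetical, and the limits lc <= 8 and lp <= 4
are assumed from the LZMA Specification.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated decoder state size in bytes:
   state_size = (4 + (1.5 << (lc + lp))) KB */
static size_t LzmaStateSizeEstimate(unsigned lc, unsigned lp)
{
  assert(lc <= 8 && lp <= 4);  /* assumed valid ranges, see lead-in */
  return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

With the default lc=3, lp=0 the function returns 16384 bytes, which matches the
16 KB figure above.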

How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must provide an external allocator.
Example:
  void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
  void SzFree(void *p, void *address) { p = p; free(address); }
  ISzAlloc alloc = { SzAlloc, SzFree };

The p = p; statement only suppresses unused-parameter compiler warnings.

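The first parameter of the allocator callbacks does not have to be ignored. The
sketch below uses it to count live allocations. It rests on two assumptions
that should be checked against 7zTypes.h: that ISzAlloc has the two-member
shape mirrored here by the stand-in type DemoAlloc, and that the caller passes
the allocator pointer itself as p. All names in the block are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in with the same shape as the ISzAlloc shown above (assumption). */
typedef struct
{
  void *(*Alloc)(void *p, size_t size);
  void (*Free)(void *p, void *address);
} DemoAlloc;

/* The function table is the first member, so a pointer to it can be
   cast back to the enclosing CountingAlloc. */
typedef struct
{
  DemoAlloc vt;
  size_t liveBlocks;  /* blocks allocated and not yet freed */
} CountingAlloc;

static void *CountingAlloc_Alloc(void *p, size_t size)
{
  CountingAlloc *self = (CountingAlloc *)p;
  void *mem = malloc(size);
  if (mem != NULL)
    self->liveBlocks++;
  return mem;
}

static void CountingAlloc_Free(void *p, void *address)
{
  CountingAlloc *self = (CountingAlloc *)p;
  if (address == NULL)
    return;  /* freeing NULL is a no-op, as with free() */
  assert(self->liveBlocks > 0);
  self->liveBlocks--;
  free(address);
}

static void CountingAlloc_Init(CountingAlloc *a)
{
  a->vt.Alloc = CountingAlloc_Alloc;
  a->vt.Free = CountingAlloc_Free;
  a->liveBlocks = 0;
}
```

A non-zero liveBlocks after the decoder has been freed would point to a leak.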

Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches the output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END when you know that the
                           current output buffer covers the last bytes of the stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check the result code and the "status" variable.
    2) Check that the output value of destLen equals uncompressedSize, if you know the real uncompressedSize.
    3) Check that the output value of srcLen equals compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.

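The three integrity checks listed above can be combined into one predicate.
The sketch below is self-contained, so it uses stand-in constants (DEMO_OK,
DemoStatus) instead of the SDK's SZ_OK and ELzmaStatus values. Those names, the
DecodeOutcome structure and the function itself are hypothetical, and real code
would use the SDK definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the SDK result and status values (illustration only). */
enum { DEMO_OK = 0 };
typedef enum
{
  DEMO_FINISHED_WITH_MARK,
  DEMO_NOT_FINISHED,
  DEMO_MAYBE_FINISHED_WITHOUT_MARK
} DemoStatus;

/* What a single-call decode reports back to the caller. */
typedef struct
{
  int res;            /* return code of the decode call */
  DemoStatus status;  /* output value of "status" */
  size_t destLen;     /* processed output size */
  size_t srcLen;      /* processed input size */
} DecodeOutcome;

/* Returns 1 if all three checks pass, 0 otherwise.
   unpackSize and packSize are the sizes known from elsewhere. */
static int DecodeLooksComplete(const DecodeOutcome *r,
    size_t unpackSize, size_t packSize)
{
  assert(r != NULL);
  if (r->res != DEMO_OK)
    return 0;                          /* check 1: result code */
  if (r->status == DEMO_NOT_FINISHED)
    return 0;                          /* check 1: status */
  if (r->destLen != unpackSize)
    return 0;                          /* check 2: output size */
  if (r->srcLen != packSize)
    return 0;                          /* check 3: input size */
  return 1;
}
```

If one of the sizes is not known in advance, the corresponding comparison has
to be dropped, because the predicate above assumes both are available.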

Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little-endian) into header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate the CLzmaDec structures (state + dictionary) using the LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Initialize the LzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    res = LzmaDec_DecodeToBuf(&state, dest, &destLen,
        src, &srcLen, finishMode, &status);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For example code, see:
  C/Util/Lzma/LzmaUtil.c

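The buffer bookkeeping inside the loop of step 3 can be sketched without the
SDK. In the block below, CopyStep is a hypothetical stand-in for
LzmaDec_DecodeToBuf: it has the same in/out convention for the two length
parameters but only copies bytes. MemIn and MemOut are hypothetical memory
streams in place of files. A real decoder loop also has to choose the finish
mode and detect the end of the stream, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_IN_BUF = 8, DEMO_OUT_BUF = 5 };  /* tiny on purpose */

typedef struct { const unsigned char *data; size_t size, pos; } MemIn;
typedef struct { unsigned char *data; size_t cap, pos; } MemOut;

static size_t MemIn_Read(MemIn *in, unsigned char *buf, size_t n)
{
  size_t avail = in->size - in->pos;
  if (n > avail)
    n = avail;
  memcpy(buf, in->data + in->pos, n);
  in->pos += n;
  return n;
}

static int MemOut_Write(MemOut *out, const unsigned char *buf, size_t n)
{
  if (n > out->cap - out->pos)
    return -1;
  memcpy(out->data + out->pos, buf, n);
  out->pos += n;
  return 0;
}

/* Stand-in decoder step: on entry *destLen and *srcLen are the available
   sizes, on return they hold the number of bytes actually processed. */
static int CopyStep(unsigned char *dest, size_t *destLen,
    const unsigned char *src, size_t *srcLen)
{
  size_t n = (*srcLen < *destLen) ? *srcLen : *destLen;
  memcpy(dest, src, n);
  *srcLen = n;
  *destLen = n;
  return 0;
}

/* Refill the input buffer when it is drained, run one step, flush output. */
static int StreamLoop(MemIn *in, MemOut *out)
{
  unsigned char inBuf[DEMO_IN_BUF], outBuf[DEMO_OUT_BUF];
  size_t inPos = 0, inSize = 0;
  assert(in != NULL && out != NULL);
  for (;;)
  {
    size_t srcLen, destLen;
    int res;
    if (inPos == inSize)
    {
      inSize = MemIn_Read(in, inBuf, DEMO_IN_BUF);
      inPos = 0;
      if (inSize == 0)
        return 0;  /* input exhausted (sketch-only end condition) */
    }
    srcLen = inSize - inPos;
    destLen = DEMO_OUT_BUF;
    res = CopyStep(outBuf, &destLen, inBuf + inPos, &srcLen);
    if (res != 0)
      return res;
    inPos += srcLen;  /* the step may consume only part of the input */
    if (MemOut_Write(out, outBuf, destLen) != 0)
      return -1;
  }
}
```

The input and output buffers have different sizes so that a single input chunk
needs several steps, which is the situation the inPos/inSize bookkeeping
handles.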

How To compress data
--------------------

Compile files:
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

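The encoder memory estimate can be computed exactly in integer arithmetic by
writing dictSize * 11.5 as dictSize * 23 / 2. This is only a sketch of the
formula above, and the function name is hypothetical. stateSize is passed in
by the caller (16 KB for the default settings).

```c
#include <assert.h>
#include <stdint.h>

/* Estimated encoder memory in bytes:
   (dictSize * 11.5 + 6 MB) + state_size */
static uint64_t LzmaEncMemEstimate(uint32_t dictSize, uint64_t stateSize)
{
  uint64_t d = dictSize;  /* widen first so d * 23 cannot overflow */
  assert(dictSize > 0);
  return d * 23 / 2 + ((uint64_t)6 << 20) + stateSize;
}
```

For a 16 MB dictionary and a 16 KB state this gives 184 MB + 6 MB + 16 KB.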
The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

For example code, see:
  C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

1) Implement the callback structures for these interfaces:
  ISeqInStream
  ISeqOutStream
  ICompressProgress
  ISzAlloc

  static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
  static void SzFree(void *p, void *address) { p = p; MyFree(address); }
  static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create the CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize the CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send the LZMA properties to the LZMA Encoder:

  res = LzmaEnc_SetProps(enc, &props);

5) Write the encoded properties to the header:

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call the encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy the LZMA Encoder object:
      LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode returns that code too,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.

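The loop in step 5 that appends the 8-byte uncompressed size can be isolated
into a small helper. The sketch below is self-contained and uses plain C types
instead of the SDK's Byte and UInt64. The name AppendLe64 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value at buf[pos..pos+7], least significant byte first,
   and returns the position just past the written field. */
static size_t AppendLe64(unsigned char *buf, size_t pos, uint64_t value)
{
  int i;
  assert(buf != NULL);
  for (i = 0; i < 8; i++)
    buf[pos++] = (unsigned char)(value >> (8 * i));
  return pos;
}
```

Called with pos = 5 (the size of the encoded properties), the helper fills
bytes 5..12 and returns 13, the total header size. Passing UINT64_MAX produces
the eight 0xFF bytes that mean "unknown size".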

Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)



Defines
-------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.


_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in ANSI-C .7z decoder.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces, so to use it you may want to
study the basics of COM/OLE.
The C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
If you use a version of MSVC that throws an exception from the "new" operator,
you can compile without "NewHandler.cpp", so the standard exception will be used.
Some 7-Zip code catches any exception in internal code and converts it to an HRESULT code,
so you don't need to catch CNewException if you call the COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html