deps/libchdr/deps/lzma-22.01/lzma-specification.txt

   1 LZMA specification (DRAFT version)\r
   2 ----------------------------------\r
   3 \r
   4 Author: Igor Pavlov\r
   5 Date: 2015-06-14\r
   6 \r
   7 This specification defines the format of LZMA compressed data and lzma file format.\r
   8 \r
   9 Notation \r
  10 --------\r
  11 \r
  12 We use the syntax of C++ programming language.\r
  13 We use the following types in C++ code:\r
  14   unsigned - unsigned integer, at least 16 bits in size\r
  15   int      - signed integer, at least 16 bits in size\r
  16   UInt64   - 64-bit unsigned integer\r
  17   UInt32   - 32-bit unsigned integer\r
  18   UInt16   - 16-bit unsigned integer\r
  19   Byte     - 8-bit unsigned integer\r
  20   bool     - boolean type with two possible values: false, true\r
  21 \r
  22 \r
  23 lzma file format\r
  24 ================\r
  25 \r
  26 The lzma file contains the raw LZMA stream and the header with related properties.\r
  27 \r
  28 The files in that format use ".lzma" extension.\r
  29 \r
  30 The lzma file format layout:\r
  31 \r
  32 Offset Size Description\r
  33 \r
  34   0     1   LZMA model properties (lc, lp, pb) in encoded form\r
  35   1     4   Dictionary size (32-bit unsigned integer, little-endian)\r
  36   5     8   Uncompressed size (64-bit unsigned integer, little-endian)\r
  37  13         Compressed data (LZMA stream)\r
  38 \r
  39 LZMA properties:\r
  40 \r
  41     name  Range          Description\r
  42 \r
  43       lc  [0, 8]         the number of "literal context" bits\r
  44       lp  [0, 4]         the number of "literal pos" bits\r
  45       pb  [0, 4]         the number of "pos" bits\r
  46 dictSize  [0, 2^32 - 1]  the dictionary size \r
  47 \r
  48 The following code encodes LZMA properties:\r
  49 \r
  50 void EncodeProperties(Byte *properties)\r
  51 {\r
  52   properties[0] = (Byte)((pb * 5 + lp) * 9 + lc);\r
  53   Set_UInt32_LittleEndian(properties + 1, dictSize);\r
  54 }\r
  55 \r
  56 If the value of dictionary size in properties is smaller than (1 << 12),\r
  57 the LZMA decoder must set the dictionary size variable to (1 << 12).\r
  58 \r
  59 #define LZMA_DIC_MIN (1 << 12)\r
  60 \r
  61   unsigned lc, pb, lp;\r
  62   UInt32 dictSize;\r
  63   UInt32 dictSizeInProperties;\r
  64 \r
  65   void DecodeProperties(const Byte *properties)\r
  66   {\r
  67     unsigned d = properties[0];\r
  68     if (d >= (9 * 5 * 5))\r
  69       throw "Incorrect LZMA properties";\r
  70     lc = d % 9;\r
  71     d /= 9;\r
  72     pb = d / 5;\r
  73     lp = d % 5;\r
  74     dictSizeInProperties = 0;\r
  75     for (int i = 0; i < 4; i++)\r
  76       dictSizeInProperties |= (UInt32)properties[i + 1] << (8 * i);\r
  77     dictSize = dictSizeInProperties;\r
  78     if (dictSize < LZMA_DIC_MIN)\r
  79       dictSize = LZMA_DIC_MIN;\r
  80   }\r
  81 \r
  82 If "Uncompressed size" field contains ones in all 64 bits, it means that\r
  83 uncompressed size is unknown and there is the "end marker" in stream,\r
  84 that indicates the end of decoding point.\r
  85 In opposite case, if the value from "Uncompressed size" field is not\r
  86 equal to ((2^64) - 1), the LZMA stream decoding must be finished after\r
  87 specified number of bytes (Uncompressed size) is decoded. And if there \r
  88 is the "end marker", the LZMA decoder must read that marker also.\r
  89 \r
  90 \r
  91 The new scheme to encode LZMA properties\r
  92 ----------------------------------------\r
  93 \r
  94 If LZMA compression is used for some another format, it's recommended to\r
  95 use a new improved scheme to encode LZMA properties. That new scheme was\r
  96 used in xz format that uses the LZMA2 compression algorithm.\r
  97 The LZMA2 is a new compression algorithm that is based on the LZMA algorithm.\r
  98 \r
  99 The dictionary size in LZMA2 is encoded with just one byte and LZMA2 supports\r
 100 only reduced set of dictionary sizes:\r
 101   (2 << 11), (3 << 11),\r
 102   (2 << 12), (3 << 12),\r
 103   ...\r
 104   (2 << 30), (3 << 30),\r
 105   (2 << 31) - 1\r
 106 \r
 107 The dictionary size can be extracted from encoded value with the following code:\r
 108 \r
 109   dictSize = (p == 40) ? 0xFFFFFFFF : (((UInt32)2 | ((p) & 1)) << ((p) / 2 + 11));\r
 110 \r
 111 Also there is additional limitation (lc + lp <= 4) in LZMA2 for values of \r
 112 "lc" and "lp" properties:\r
 113 \r
 114   if (lc + lp > 4)\r
 115     throw "Unsupported properties: (lc + lp) > 4";\r
 116 \r
 117 There are some advantages for LZMA decoder with such (lc + lp) value\r
 118 limitation. It reduces the maximum size of tables allocated by decoder.\r
 119 And it reduces the complexity of initialization procedure, that can be \r
 120 important to keep high speed of decoding of big number of small LZMA streams.\r
 121 \r
 122 It's recommended to use that limitation (lc + lp <= 4) for any new format\r
 123 that uses LZMA compression. Note that the combinations of "lc" and "lp" \r
 124 parameters, where (lc + lp > 4), can provide significant improvement in \r
 125 compression ratio only in some rare cases.\r
 126 \r
 127 The LZMA properties can be encoded into two bytes in new scheme:\r
 128 \r
 129 Offset Size Description\r
 130 \r
 131   0     1   The dictionary size encoded with LZMA2 scheme\r
 132   1     1   LZMA model properties (lc, lp, pb) in encoded form\r
 133 \r
 134 \r
 135 The RAM usage \r
 136 =============\r
 137 \r
 138 The RAM usage for LZMA decoder is determined by the following parts:\r
 139 \r
 140 1) The Sliding Window (from 4 KiB to 4 GiB).\r
 141 2) The probability model counter arrays (arrays of 16-bit variables).\r
 142 3) Some additional state variables (about 10 variables of 32-bit integers).\r
 143 \r
 144 \r
 145 The RAM usage for Sliding Window\r
 146 --------------------------------\r
 147 \r
 148 There are two main scenarios of decoding:\r
 149 \r
 150 1) The decoding of full stream to one RAM buffer.\r
 151 \r
 152   If we decode full LZMA stream to one output buffer in RAM, the decoder \r
 153   can use that output buffer as sliding window. So the decoder doesn't \r
 154   need additional buffer allocated for sliding window.\r
 155 \r
 156 2) The decoding to some external storage.\r
 157 \r
 158   If we decode LZMA stream to external storage, the decoder must allocate\r
 159   the buffer for sliding window. The size of that buffer must be equal \r
 160   or larger than the value of dictionary size from properties of LZMA stream.\r
 161 \r
 162 In this specification we describe the code for decoding to some external\r
 163 storage. The optimized version of code for decoding of full stream to one\r
 164 output RAM buffer can require some minor changes in code.\r
 165 \r
 166 \r
 167 The RAM usage for the probability model counters\r
 168 ------------------------------------------------\r
 169 \r
 170 The size of the probability model counter arrays is calculated with the \r
 171 following formula:\r
 172 \r
 173 size_of_prob_arrays = 1846 + 768 * (1 << (lp + lc))\r
 174 \r
 175 Each probability model counter is 11-bit unsigned integer.\r
 176 If we use 16-bit integer variables (2-byte integers) for these probability \r
 177 model counters, the RAM usage required by probability model counter arrays \r
 178 can be estimated with the following formula:\r
 179 \r
 180   RAM = 4 KiB + 1.5 KiB * (1 << (lp + lc))\r
 181 \r
 182 For example, for default LZMA parameters (lp = 0 and lc = 3), the RAM usage is\r
 183 \r
 184   RAM_lc3_lp0 = 4 KiB + 1.5 KiB * 8 = 16 KiB\r
 185 \r
 186 The maximum RAM state usage is required for decoding the stream with lp = 4 \r
 187 and lc = 8:\r
 188 \r
 189   RAM_lc8_lp4 = 4 KiB + 1.5 KiB * 4096 = 6148 KiB\r
 190 \r
 191 If the decoder uses LZMA2's limited property condition \r
 192 (lc + lp <= 4), the RAM usage will be not larger than\r
 193 \r
 194   RAM_lc_lp_4 = 4 KiB + 1.5 KiB * 16 = 28 KiB\r
 195 \r
 196 \r
 197 The RAM usage for encoder\r
 198 -------------------------\r
 199 \r
 200 There are many variants for LZMA encoding code.\r
 201 These variants have different values for memory consumption.\r
 202 Note that memory consumption for LZMA Encoder can not be \r
 203 smaller than memory consumption of LZMA Decoder for same stream.\r
 204 \r
 205 The RAM usage required by modern effective implementation of \r
 206 LZMA Encoder can be estimated with the following formula:\r
 207 \r
 208   Encoder_RAM_Usage = 4 MiB + 11 * dictionarySize.\r
 209 \r
 210 But there are some modes of the encoder that require less memory.\r
 211 \r
 212 \r
 213 LZMA Decoding\r
 214 =============\r
 215 \r
 216 The LZMA compression algorithm uses LZ-based compression with Sliding Window\r
 217 and Range Encoding as entropy coding method.\r
 218 \r
 219 \r
 220 Sliding Window\r
 221 --------------\r
 222 \r
 223 LZMA uses Sliding Window compression similar to LZ77 algorithm.\r
 224 \r
 225 LZMA stream must be decoded to the sequence that consists\r
 226 of MATCHES and LITERALS:\r
 227   \r
 228   - a LITERAL is a 8-bit character (one byte).\r
 229     The decoder just puts that LITERAL to the uncompressed stream.\r
 230   \r
 231   - a MATCH is a pair of two numbers (DISTANCE-LENGTH pair).\r
 232     The decoder takes one byte exactly "DISTANCE" characters behind\r
 233     current position in the uncompressed stream and puts it to \r
 234     uncompressed stream. The decoder must repeat it "LENGTH" times.\r
 235 \r
 236 The "DISTANCE" can not be larger than dictionary size.\r
 237 And the "DISTANCE" can not be larger than the number of bytes in\r
 238 the uncompressed stream that were decoded before that match.\r
 239 \r
 240 In this specification we use cyclic buffer to implement Sliding Window\r
 241 for LZMA decoder:\r
 242 \r
 243 class COutWindow\r
 244 {\r
 245   Byte *Buf;\r
 246   UInt32 Pos;\r
 247   UInt32 Size;\r
 248   bool IsFull;\r
 249 \r
 250 public:\r
 251   unsigned TotalPos;\r
 252   COutStream OutStream;\r
 253 \r
 254   COutWindow(): Buf(NULL) {}\r
 255   ~COutWindow() { delete []Buf; }\r
 256  \r
 257   void Create(UInt32 dictSize)\r
 258   {\r
 259     Buf = new Byte[dictSize];\r
 260     Pos = 0;\r
 261     Size = dictSize;\r
 262     IsFull = false;\r
 263     TotalPos = 0;\r
 264   }\r
 265 \r
 266   void PutByte(Byte b)\r
 267   {\r
 268     TotalPos++;\r
 269     Buf[Pos++] = b;\r
 270     if (Pos == Size)\r
 271     {\r
 272       Pos = 0;\r
 273       IsFull = true;\r
 274     }\r
 275     OutStream.WriteByte(b);\r
 276   }\r
 277 \r
 278   Byte GetByte(UInt32 dist) const\r
 279   {\r
 280     return Buf[dist <= Pos ? Pos - dist : Size - dist + Pos];\r
 281   }\r
 282 \r
 283   void CopyMatch(UInt32 dist, unsigned len)\r
 284   {\r
 285     for (; len > 0; len--)\r
 286       PutByte(GetByte(dist));\r
 287   }\r
 288 \r
 289   bool CheckDistance(UInt32 dist) const\r
 290   {\r
 291     return dist <= Pos || IsFull;\r
 292   }\r
 293 \r
 294   bool IsEmpty() const\r
 295   {\r
 296     return Pos == 0 && !IsFull;\r
 297   }\r
 298 };\r
 299 \r
 300 \r
 301 In another implementation it's possible to use one buffer that contains \r
 302 Sliding Window and the whole data stream after uncompressing.\r
 303 \r
 304 \r
 305 Range Decoder\r
 306 -------------\r
 307 \r
 308 LZMA algorithm uses Range Encoding (1) as entropy coding method.\r
 309 \r
 310 LZMA stream contains just one very big number in big-endian encoding.\r
 311 LZMA decoder uses the Range Decoder to extract a sequence of binary\r
 312 symbols from that big number.\r
 313 \r
 314 The state of the Range Decoder:\r
 315 \r
 316 struct CRangeDecoder\r
 317 {\r
 318   UInt32 Range; \r
 319   UInt32 Code;\r
 320   InputStream *InStream;\r
 321 \r
 322   bool Corrupted;\r
 323 }\r
 324 \r
 325 The notes about UInt32 type for the "Range" and "Code" variables:\r
 326 \r
 327   It's possible to use 64-bit (unsigned or signed) integer type\r
 328   for the "Range" and the "Code" variables instead of 32-bit unsigned,\r
 329   but some additional code must be used to truncate the values to \r
 330   low 32-bits after some operations.\r
 331 \r
 332   If the programming language does not support 32-bit unsigned integer type \r
 333   (like in case of JAVA language), it's possible to use 32-bit signed integer, \r
 334   but some code must be changed. For example, it's required to change the code\r
 335   that uses comparison operations for UInt32 variables in this specification.\r
 336 \r
 337 The Range Decoder can be in some states that can be treated as \r
 338 "Corruption" in LZMA stream. The Range Decoder uses the variable "Corrupted":\r
 339 \r
 340   (Corrupted == false), if the Range Decoder has not detected any corruption.\r
 341   (Corrupted == true), if the Range Decoder has detected some corruption.\r
 342 \r
 343 The reference LZMA Decoder ignores the value of the "Corrupted" variable.\r
 344 So it continues to decode the stream, even if the corruption can be detected\r
 345 in the Range Decoder. To provide the full compatibility with output of the \r
 346 reference LZMA Decoder, another LZMA Decoder implementations must also \r
 347 ignore the value of the "Corrupted" variable.\r
 348 \r
 349 The LZMA Encoder is required to create only such LZMA streams, that will not \r
 350 lead the Range Decoder to states, where the "Corrupted" variable is set to true.\r
 351 \r
 352 The Range Decoder reads first 5 bytes from input stream to initialize\r
 353 the state:\r
 354 \r
 355 bool CRangeDecoder::Init()\r
 356 {\r
 357   Corrupted = false;\r
 358   Range = 0xFFFFFFFF;\r
 359   Code = 0;\r
 360 \r
 361   Byte b = InStream->ReadByte();\r
 362   \r
 363   for (int i = 0; i < 4; i++)\r
 364     Code = (Code << 8) | InStream->ReadByte();\r
 365   \r
 366   if (b != 0 || Code == Range)\r
 367     Corrupted = true;\r
 368   return b == 0;\r
 369 }\r
 370 \r
 371 The LZMA Encoder always writes ZERO in initial byte of compressed stream.\r
 372 That scheme allows to simplify the code of the Range Encoder in the \r
 373 LZMA Encoder. If initial byte is not equal to ZERO, the LZMA Decoder must\r
 374 stop decoding and report error.\r
 375 \r
 376 After the last bit of data was decoded by Range Decoder, the value of the\r
 377 "Code" variable must be equal to 0. The LZMA Decoder must check it by \r
 378 calling the IsFinishedOK() function:\r
 379 \r
 380   bool IsFinishedOK() const { return Code == 0; }\r
 381 \r
 382 If there is corruption in data stream, there is big probability that\r
 383 the "Code" value will be not equal to 0 in the Finish() function. So that\r
 384 check in the IsFinishedOK() function provides very good feature for \r
 385 corruption detection.\r
 386 \r
 387 The value of the "Range" variable before each bit decoding can not be smaller \r
 388 than ((UInt32)1 << 24). The Normalize() function keeps the "Range" value in \r
 389 described range.\r
 390 \r
 391 #define kTopValue ((UInt32)1 << 24)\r
 392 \r
 393 void CRangeDecoder::Normalize()\r
 394 {\r
 395   if (Range < kTopValue)\r
 396   {\r
 397     Range <<= 8;\r
 398     Code = (Code << 8) | InStream->ReadByte();\r
 399   }\r
 400 }\r
 401 \r
 402 Notes: if the size of the "Code" variable is larger than 32 bits, it's\r
 403 required to keep only low 32 bits of the "Code" variable after the change\r
 404 in Normalize() function.\r
 405 \r
 406 If the LZMA Stream is not corrupted, the value of the "Code" variable is\r
 407 always smaller than value of the "Range" variable.\r
 408 But the Range Decoder ignores some types of corruptions, so the value of\r
 409 the "Code" variable can be equal or larger than value of the "Range" variable\r
 410 for some "Corrupted" archives.\r
 411 \r
 412 \r
 413 LZMA uses Range Encoding only with binary symbols of two types:\r
 414   1) binary symbols with fixed and equal probabilities (direct bits)\r
 415   2) binary symbols with predicted probabilities\r
 416 \r
 417 The DecodeDirectBits() function decodes the sequence of direct bits:\r
 418 \r
 419 UInt32 CRangeDecoder::DecodeDirectBits(unsigned numBits)\r
 420 {\r
 421   UInt32 res = 0;\r
 422   do\r
 423   {\r
 424     Range >>= 1;\r
 425     Code -= Range;\r
 426     UInt32 t = 0 - ((UInt32)Code >> 31);\r
 427     Code += Range & t;\r
 428     \r
 429     if (Code == Range)\r
 430       Corrupted = true;\r
 431     \r
 432     Normalize();\r
 433     res <<= 1;\r
 434     res += t + 1;\r
 435   }\r
 436   while (--numBits);\r
 437   return res;\r
 438 }\r
 439 \r
 440 \r
 441 The Bit Decoding with Probability Model\r
 442 ---------------------------------------\r
 443 \r
 444 The task of Bit Probability Model is to estimate probabilities of binary\r
 445 symbols. And then it provides the Range Decoder with that information.\r
 446 The better prediction provides better compression ratio.\r
 447 The Bit Probability Model uses statistical data of previous decoded\r
 448 symbols.\r
 449 \r
 450 That estimated probability is presented as 11-bit unsigned integer value\r
 451 that represents the probability of symbol "0".\r
 452 \r
 453 #define kNumBitModelTotalBits 11\r
 454 \r
 455 Mathematical probabilities can be presented with the following formulas:\r
 456      probability(symbol_0) = prob / 2048.\r
 457      probability(symbol_1) =  1 - Probability(symbol_0) =  \r
 458                            =  1 - prob / 2048 =  \r
 459                            =  (2048 - prob) / 2048\r
 460 where the "prob" variable contains 11-bit integer probability counter.\r
 461 \r
 462 It's recommended to use 16-bit unsigned integer type, to store these 11-bit\r
 463 probability values:\r
 464 \r
 465 typedef UInt16 CProb;\r
 466 \r
 467 Each probability value must be initialized with value ((1 << 11) / 2),\r
 468 that represents the state, where probabilities of symbols 0 and 1 \r
 469 are equal to 0.5:\r
 470 \r
 471 #define PROB_INIT_VAL ((1 << kNumBitModelTotalBits) / 2)\r
 472 \r
 473 The INIT_PROBS macro is used to initialize the array of CProb variables:\r
 474 \r
 475 #define INIT_PROBS(p) \\r
 476  { for (unsigned i = 0; i < sizeof(p) / sizeof(p[0]); i++) p[i] = PROB_INIT_VAL; }\r
 477 \r
 478 \r
 479 The DecodeBit() function decodes one bit.\r
 480 The LZMA decoder provides the pointer to CProb variable that contains \r
 481 information about estimated probability for symbol 0 and the Range Decoder \r
 482 updates that CProb variable after decoding. The Range Decoder increases \r
 483 estimated probability of the symbol that was decoded:\r
 484 \r
 485 #define kNumMoveBits 5\r
 486 \r
 487 unsigned CRangeDecoder::DecodeBit(CProb *prob)\r
 488 {\r
 489   unsigned v = *prob;\r
 490   UInt32 bound = (Range >> kNumBitModelTotalBits) * v;\r
 491   unsigned symbol;\r
 492   if (Code < bound)\r
 493   {\r
 494     v += ((1 << kNumBitModelTotalBits) - v) >> kNumMoveBits;\r
 495     Range = bound;\r
 496     symbol = 0;\r
 497   }\r
 498   else\r
 499   {\r
 500     v -= v >> kNumMoveBits;\r
 501     Code -= bound;\r
 502     Range -= bound;\r
 503     symbol = 1;\r
 504   }\r
 505   *prob = (CProb)v;\r
 506   Normalize();\r
 507   return symbol;\r
 508 }\r
 509 \r
 510 \r
 511 The Binary Tree of bit model counters\r
 512 -------------------------------------\r
 513 \r
 514 LZMA uses a tree of Bit model variables to decode symbol that needs\r
 515 several bits for storing. There are two versions of such trees in LZMA:\r
 516   1) the tree that decodes bits from high bit to low bit (the normal scheme).\r
 517   2) the tree that decodes bits from low bit to high bit (the reverse scheme).\r
 518 \r
 519 Each binary tree structure supports different size of decoded symbol\r
 520 (the size of binary sequence that contains value of symbol).\r
 521 If that size of decoded symbol is "NumBits" bits, the tree structure \r
 522 uses the array of (2 << NumBits) counters of CProb type. \r
 523 But only ((2 << NumBits) - 1) items are used by encoder and decoder.\r
 524 The first item (the item with index equal to 0) in array is unused.\r
 525 That scheme with unused array's item allows to simplify the code.\r
 526 \r
 527 unsigned BitTreeReverseDecode(CProb *probs, unsigned numBits, CRangeDecoder *rc)\r
 528 {\r
 529   unsigned m = 1;\r
 530   unsigned symbol = 0;\r
 531   for (unsigned i = 0; i < numBits; i++)\r
 532   {\r
 533     unsigned bit = rc->DecodeBit(&probs[m]);\r
 534     m <<= 1;\r
 535     m += bit;\r
 536     symbol |= (bit << i);\r
 537   }\r
 538   return symbol;\r
 539 }\r
 540 \r
 541 template <unsigned NumBits>\r
 542 class CBitTreeDecoder\r
 543 {\r
 544   CProb Probs[(unsigned)1 << NumBits];\r
 545 \r
 546 public:\r
 547 \r
 548   void Init()\r
 549   {\r
 550     INIT_PROBS(Probs);\r
 551   }\r
 552 \r
 553   unsigned Decode(CRangeDecoder *rc)\r
 554   {\r
 555     unsigned m = 1;\r
 556     for (unsigned i = 0; i < NumBits; i++)\r
 557       m = (m << 1) + rc->DecodeBit(&Probs[m]);\r
 558     return m - ((unsigned)1 << NumBits);\r
 559   }\r
 560 \r
 561   unsigned ReverseDecode(CRangeDecoder *rc)\r
 562   {\r
 563     return BitTreeReverseDecode(Probs, NumBits, rc);\r
 564   }\r
 565 };\r
 566 \r
 567 \r
 568 LZ part of LZMA \r
 569 ---------------\r
 570 \r
 571 LZ part of LZMA describes details about the decoding of MATCHES and LITERALS.\r
 572 \r
 573 \r
 574 The Literal Decoding\r
 575 --------------------\r
 576 \r
 577 The LZMA Decoder uses (1 << (lc + lp)) tables with CProb values, where \r
 578 each table contains 0x300 CProb values:\r
 579 \r
 580   CProb *LitProbs;\r
 581 \r
 582   void CreateLiterals()\r
 583   {\r
 584     LitProbs = new CProb[(UInt32)0x300 << (lc + lp)];\r
 585   }\r
 586   \r
 587   void InitLiterals()\r
 588   {\r
 589     UInt32 num = (UInt32)0x300 << (lc + lp);\r
 590     for (UInt32 i = 0; i < num; i++)\r
 591       LitProbs[i] = PROB_INIT_VAL;\r
 592   }\r
 593 \r
 594 To select the table for decoding it uses the context that consists of\r
 595 (lc) high bits from previous literal and (lp) low bits from value that\r
 596 represents current position in outputStream.\r
 597 \r
 598 If (State > 7), the Literal Decoder also uses "matchByte" that represents \r
 599 the byte in OutputStream at position the is the DISTANCE bytes before \r
 600 current position, where the DISTANCE is the distance in DISTANCE-LENGTH pair\r
 601 of latest decoded match.\r
 602 \r
 603 The following code decodes one literal and puts it to Sliding Window buffer:\r
 604 \r
 605   void DecodeLiteral(unsigned state, UInt32 rep0)\r
 606   {\r
 607     unsigned prevByte = 0;\r
 608     if (!OutWindow.IsEmpty())\r
 609       prevByte = OutWindow.GetByte(1);\r
 610     \r
 611     unsigned symbol = 1;\r
 612     unsigned litState = ((OutWindow.TotalPos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc));\r
 613     CProb *probs = &LitProbs[(UInt32)0x300 * litState];\r
 614     \r
 615     if (state >= 7)\r
 616     {\r
 617       unsigned matchByte = OutWindow.GetByte(rep0 + 1);\r
 618       do\r
 619       {\r
 620         unsigned matchBit = (matchByte >> 7) & 1;\r
 621         matchByte <<= 1;\r
 622         unsigned bit = RangeDec.DecodeBit(&probs[((1 + matchBit) << 8) + symbol]);\r
 623         symbol = (symbol << 1) | bit;\r
 624         if (matchBit != bit)\r
 625           break;\r
 626       }\r
 627       while (symbol < 0x100);\r
 628     }\r
 629     while (symbol < 0x100)\r
 630       symbol = (symbol << 1) | RangeDec.DecodeBit(&probs[symbol]);\r
 631     OutWindow.PutByte((Byte)(symbol - 0x100));\r
 632   }\r
 633 \r
 634 \r
 635 The match length decoding\r
 636 -------------------------\r
 637 \r
 638 The match length decoder returns normalized (zero-based value) \r
 639 length of match. That value can be converted to real length of the match \r
 640 with the following code:\r
 641 \r
 642 #define kMatchMinLen 2\r
 643 \r
 644     matchLen = len + kMatchMinLen;\r
 645 \r
 646 The match length decoder can return the values from 0 to 271.\r
 647 And the corresponded real match length values can be in the range \r
 648 from 2 to 273.\r
 649 \r
 650 The following scheme is used for the match length encoding:\r
 651 \r
 652   Binary encoding    Binary Tree structure    Zero-based match length \r
 653   sequence                                    (binary + decimal):\r
 654 \r
 655   0 xxx              LowCoder[posState]       xxx\r
 656   1 0 yyy            MidCoder[posState]       yyy + 8\r
 657   1 1 zzzzzzzz       HighCoder                zzzzzzzz + 16\r
 658 \r
 659 LZMA uses bit model variable "Choice" to decode the first selection bit.\r
 660 \r
 661 If the first selection bit is equal to 0, the decoder uses binary tree \r
 662   LowCoder[posState] to decode 3-bit zero-based match length (xxx).\r
 663 \r
 664 If the first selection bit is equal to 1, the decoder uses bit model \r
 665   variable "Choice2" to decode the second selection bit.\r
 666 \r
 667   If the second selection bit is equal to 0, the decoder uses binary tree \r
 668     MidCoder[posState] to decode 3-bit "yyy" value, and zero-based match\r
 669     length is equal to (yyy + 8).\r
 670 \r
 671   If the second selection bit is equal to 1, the decoder uses binary tree \r
 672     HighCoder to decode 8-bit "zzzzzzzz" value, and zero-based \r
 673     match length is equal to (zzzzzzzz + 16).\r
 674 \r
 675 LZMA uses "posState" value as context to select the binary tree \r
 676 from LowCoder and MidCoder binary tree arrays:\r
 677 \r
 678     unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);\r
 679 \r
 680 The full code of the length decoder:\r
 681 \r
 682 class CLenDecoder\r
 683 {\r
 684   CProb Choice;\r
 685   CProb Choice2;\r
 686   CBitTreeDecoder<3> LowCoder[1 << kNumPosBitsMax];\r
 687   CBitTreeDecoder<3> MidCoder[1 << kNumPosBitsMax];\r
 688   CBitTreeDecoder<8> HighCoder;\r
 689 \r
 690 public:\r
 691 \r
 692   void Init()\r
 693   {\r
 694     Choice = PROB_INIT_VAL;\r
 695     Choice2 = PROB_INIT_VAL;\r
 696     HighCoder.Init();\r
 697     for (unsigned i = 0; i < (1 << kNumPosBitsMax); i++)\r
 698     {\r
 699       LowCoder[i].Init();\r
 700       MidCoder[i].Init();\r
 701     }\r
 702   }\r
 703 \r
 704   unsigned Decode(CRangeDecoder *rc, unsigned posState)\r
 705   {\r
 706     if (rc->DecodeBit(&Choice) == 0)\r
 707       return LowCoder[posState].Decode(rc);\r
 708     if (rc->DecodeBit(&Choice2) == 0)\r
 709       return 8 + MidCoder[posState].Decode(rc);\r
 710     return 16 + HighCoder.Decode(rc);\r
 711   }\r
 712 };\r
 713 \r
 714 The LZMA decoder uses two instances of CLenDecoder class.\r
 715 The first instance is for the matches of "Simple Match" type,\r
 716 and the second instance is for the matches of "Rep Match" type:\r
 717 \r
 718   CLenDecoder LenDecoder;\r
 719   CLenDecoder RepLenDecoder;\r
 720 \r
 721 \r
 722 The match distance decoding\r
 723 ---------------------------\r
 724 \r
 725 LZMA supports dictionary sizes up to 4 GiB minus 1.\r
 726 The value of match distance (decoded by distance decoder) can be \r
 727 from 1 to 2^32. But the distance value that is equal to 2^32 is used to\r
 728 indicate the "End of stream" marker. So real largest match distance \r
 729 that is used for LZ-window match is (2^32 - 1).\r
 730 \r
 731 LZMA uses normalized match length (zero-based length) \r
 732 to calculate the context state "lenState" do decode the distance value:\r
 733 \r
 734 #define kNumLenToPosStates 4\r
 735 \r
 736     unsigned lenState = len;\r
 737     if (lenState > kNumLenToPosStates - 1)\r
 738       lenState = kNumLenToPosStates - 1;\r
 739 \r
 740 The distance decoder returns the "dist" value that is zero-based value \r
 741 of match distance. The real match distance can be calculated with the\r
 742 following code:\r
 743   \r
 744   matchDistance = dist + 1; \r
 745 \r
 746 The state of the distance decoder and the initialization code: \r
 747 \r
 748   #define kEndPosModelIndex 14\r
 749   #define kNumFullDistances (1 << (kEndPosModelIndex >> 1))\r
 750   #define kNumAlignBits 4\r
 751 \r
 752   CBitTreeDecoder<6> PosSlotDecoder[kNumLenToPosStates];\r
 753   CProb PosDecoders[1 + kNumFullDistances - kEndPosModelIndex];\r
 754   CBitTreeDecoder<kNumAlignBits> AlignDecoder;\r
 755 \r
 756   void InitDist()\r
 757   {\r
 758     for (unsigned i = 0; i < kNumLenToPosStates; i++)\r
 759       PosSlotDecoder[i].Init();\r
 760     AlignDecoder.Init();\r
 761     INIT_PROBS(PosDecoders);\r
 762   }\r
 763 \r
 764 At first stage the distance decoder decodes 6-bit "posSlot" value with bit\r
 765 tree decoder from PosSlotDecoder array. It's possible to get 2^6=64 different \r
 766 "posSlot" values.\r
 767 \r
 768     unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);\r
 769 \r
 770 The encoding scheme for distance value is shown in the following table:\r
 771 \r
 772 posSlot (decimal) /\r
 773       zero-based distance (binary)\r
 774  0    0\r
 775  1    1\r
 776  2    10\r
 777  3    11\r
 778 \r
 779  4    10 x\r
 780  5    11 x\r
 781  6    10 xx\r
 782  7    11 xx\r
 783  8    10 xxx\r
 784  9    11 xxx\r
 785 10    10 xxxx\r
 786 11    11 xxxx\r
 787 12    10 xxxxx\r
 788 13    11 xxxxx\r
 789 \r
 790 14    10 yy zzzz\r
 791 15    11 yy zzzz\r
 792 16    10 yyy zzzz\r
 793 17    11 yyy zzzz\r
 794 ...\r
 795 62    10 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz\r
 796 63    11 yyyyyyyyyyyyyyyyyyyyyyyyyy zzzz\r
 797 \r
 798 where \r
 799   "x ... x" means the sequence of binary symbols encoded with binary tree and \r
 800       "Reverse" scheme. It uses separated binary tree for each posSlot from 4 to 13.\r
 801   "y" means direct bit encoded with range coder.\r
 802   "zzzz" means the sequence of four binary symbols encoded with binary\r
 803       tree with "Reverse" scheme, where one common binary tree "AlignDecoder"\r
 804       is used for all posSlot values.\r
 805 \r
 806 If (posSlot < 4), the "dist" value is equal to posSlot value.\r
 807 \r
 808 If (posSlot >= 4), the decoder uses "posSlot" value to calculate the value of\r
 809   the high bits of "dist" value and the number of the low bits.\r
 810 \r
 811   If (4 <= posSlot < kEndPosModelIndex), the decoder uses bit tree decoders.\r
 812     (one separated bit tree decoder per one posSlot value) and "Reverse" scheme.\r
 813     In this implementation we use one CProb array "PosDecoders" that contains \r
 814     all CProb variables for all these bit decoders.\r
 815   \r
 816   if (posSlot >= kEndPosModelIndex), the middle bits are decoded as direct \r
 817     bits from RangeDecoder and the low 4 bits are decoded with a bit tree \r
 818     decoder "AlignDecoder" with "Reverse" scheme.\r
 819 \r
 820 The code to decode zero-based match distance:\r
 821   \r
 822   unsigned DecodeDistance(unsigned len)\r
 823   {\r
 824     unsigned lenState = len;\r
 825     if (lenState > kNumLenToPosStates - 1)\r
 826       lenState = kNumLenToPosStates - 1;\r
 827     \r
 828     unsigned posSlot = PosSlotDecoder[lenState].Decode(&RangeDec);\r
 829     if (posSlot < 4)\r
 830       return posSlot;\r
 831     \r
 832     unsigned numDirectBits = (unsigned)((posSlot >> 1) - 1);\r
 833     UInt32 dist = ((2 | (posSlot & 1)) << numDirectBits);\r
 834     if (posSlot < kEndPosModelIndex)\r
 835       dist += BitTreeReverseDecode(PosDecoders + dist - posSlot, numDirectBits, &RangeDec);\r
 836     else\r
 837     {\r
 838       dist += RangeDec.DecodeDirectBits(numDirectBits - kNumAlignBits) << kNumAlignBits;\r
 839       dist += AlignDecoder.ReverseDecode(&RangeDec);\r
 840     }\r
 841     return dist;\r
 842   }\r
 843 \r
 844 \r
 845 \r
 846 LZMA Decoding modes\r
 847 -------------------\r
 848 \r
 849 There are 2 types of LZMA streams:\r
 850 \r
 851 1) The stream with "End of stream" marker.\r
 852 2) The stream without "End of stream" marker.\r
 853 \r
 854 And the LZMA Decoder supports 3 modes of decoding:\r
 855 \r
 856 1) The unpack size is undefined. The LZMA decoder stops decoding after \r
 857    getting "End of stream" marker. \r
 858    The input variables for that case:\r
 859     \r
 860       markerIsMandatory = true\r
 861       unpackSizeDefined = false\r
 862       unpackSize contains any value\r
 863 \r
 864 2) The unpack size is defined and LZMA decoder supports both variants, \r
 865    where the stream can contain "End of stream" marker or the stream is\r
 866    finished without "End of stream" marker. The LZMA decoder must detect \r
 867    any of these situations.\r
 868    The input variables for that case:\r
 869     \r
 870       markerIsMandatory = false\r
 871       unpackSizeDefined = true\r
 872       unpackSize contains unpack size\r
 873 \r
 874 3) The unpack size is defined and the LZMA stream must contain \r
 875    "End of stream" marker\r
 876    The input variables for that case:\r
 877     \r
 878       markerIsMandatory = true\r
 879       unpackSizeDefined = true\r
 880       unpackSize contains unpack size\r
 881 \r
 882 \r
 883 The main loop of decoder\r
 884 ------------------------\r
 885 \r
 886 The main loop of LZMA decoder:\r
 887 \r
 888 Initialize the LZMA state.\r
 889 loop\r
 890 {\r
 891   // begin of loop\r
 892   Check "end of stream" conditions.\r
 893   Decode Type of MATCH / LITERAL. \r
 894     If it's LITERAL, decode LITERAL value and put the LITERAL to Window.\r
 895     If it's MATCH, decode the length of match and the match distance. \r
 896         Check error conditions, check end of stream conditions and copy\r
 897         the sequence of match bytes from sliding window to current position\r
 898         in window.\r
 899   Go to begin of loop\r
 900 }\r
 901 \r
 902 The reference implementation of LZMA decoder uses "unpackSize" variable\r
 903 to keep the number of remaining bytes in output stream. So it reduces \r
 904 "unpackSize" value after each decoded LITERAL or MATCH.\r
 905 \r
 906 The following code contains the "end of stream" condition check at the start\r
 907 of the loop:\r
 908 \r
 909     if (unpackSizeDefined && unpackSize == 0 && !markerIsMandatory)\r
 910       if (RangeDec.IsFinishedOK())\r
 911         return LZMA_RES_FINISHED_WITHOUT_MARKER;\r
 912 \r
 913 LZMA uses three types of matches:\r
 914 \r
 915 1) "Simple Match" -     the match with distance value encoded with bit models.\r
 916 \r
 917 2) "Rep Match" -        the match that uses the distance from distance\r
 918                         history table.\r
 919 \r
 920 3) "Short Rep Match" -  the match of single byte length, that uses the latest \r
 921                         distance from distance history table.\r
 922 \r
 923 The LZMA decoder keeps the history of latest 4 match distances that were used \r
 924 by decoder. That set of 4 variables contains zero-based match distances and \r
 925 these variables are initialized with zero values:\r
 926 \r
 927   UInt32 rep0 = 0, rep1 = 0, rep2 = 0, rep3 = 0;\r
 928 \r
 929 The LZMA decoder uses binary model variables to select type of MATCH or LITERAL:\r
 930 \r
 931 #define kNumStates 12\r
 932 #define kNumPosBitsMax 4\r
 933 \r
 934   CProb IsMatch[kNumStates << kNumPosBitsMax];\r
 935   CProb IsRep[kNumStates];\r
 936   CProb IsRepG0[kNumStates];\r
 937   CProb IsRepG1[kNumStates];\r
 938   CProb IsRepG2[kNumStates];\r
 939   CProb IsRep0Long[kNumStates << kNumPosBitsMax];\r
 940 \r
 941 The decoder uses "state" variable value to select exact variable \r
 942 from "IsRep", "IsRepG0", "IsRepG1" and "IsRepG2" arrays.\r
 943 The "state" variable can get the value from 0 to 11.\r
 944 Initial value for "state" variable is zero:\r
 945 \r
 946   unsigned state = 0;\r
 947 \r
 948 The "state" variable is updated after each LITERAL or MATCH with one of the\r
 949 following functions:\r
 950 \r
 951 unsigned UpdateState_Literal(unsigned state)\r
 952 {\r
 953   if (state < 4) return 0;\r
 954   else if (state < 10) return state - 3;\r
 955   else return state - 6;\r
 956 }\r
 957 unsigned UpdateState_Match   (unsigned state) { return state < 7 ? 7 : 10; }\r
 958 unsigned UpdateState_Rep     (unsigned state) { return state < 7 ? 8 : 11; }\r
 959 unsigned UpdateState_ShortRep(unsigned state) { return state < 7 ? 9 : 11; }\r
 960 \r
 961 The decoder calculates "state2" variable value to select exact variable from \r
 962 "IsMatch" and "IsRep0Long" arrays:\r
 963 \r
 964 unsigned posState = OutWindow.TotalPos & ((1 << pb) - 1);\r
 965 unsigned state2 = (state << kNumPosBitsMax) + posState;\r
 966 \r
 967 The decoder uses the following code flow scheme to select exact \r
 968 type of LITERAL or MATCH:\r
 969 \r
 970 IsMatch[state2] decode\r
 971   0 - the Literal\r
 972   1 - the Match\r
 973     IsRep[state] decode\r
 974       0 - Simple Match\r
 975       1 - Rep Match\r
 976         IsRepG0[state] decode\r
 977           0 - the distance is rep0\r
 978             IsRep0Long[state2] decode\r
 979               0 - Short Rep Match\r
 980               1 - Rep Match 0\r
 981           1 - \r
 982             IsRepG1[state] decode\r
 983               0 - Rep Match 1\r
 984               1 - \r
 985                 IsRepG2[state] decode\r
 986                   0 - Rep Match 2\r
 987                   1 - Rep Match 3\r
 988 \r
 989 \r
 990 LITERAL symbol\r
 991 --------------\r
 992 If the value "0" was decoded with IsMatch[state2] decoding, we have "LITERAL" type.\r
 993 \r
 994 At first the LZMA decoder must check that it doesn't exceed \r
 995 specified uncompressed size:\r
 996 \r
 997       if (unpackSizeDefined && unpackSize == 0)\r
 998         return LZMA_RES_ERROR;\r
 999 \r
1000 Then it decodes literal value and puts it to sliding window:\r
1001 \r
1002       DecodeLiteral(state, rep0);\r
1003 \r
1004 Then the decoder must update the "state" value and "unpackSize" value;\r
1005 \r
1006       state = UpdateState_Literal(state);\r
1007       unpackSize--;\r
1008 \r
1009 Then the decoder must go to the begin of main loop to decode next Match or Literal.\r
1010 \r
1011 \r
1012 Simple Match\r
1013 ------------\r
1014 \r
1015 If the value "1" was decoded with IsMatch[state2] decoding,\r
1016 we have the "Simple Match" type.\r
1017 \r
1018 The distance history table is updated with the following scheme:\r
1019     \r
1020       rep3 = rep2;\r
1021       rep2 = rep1;\r
1022       rep1 = rep0;\r
1023 \r
1024 The zero-based length is decoded with "LenDecoder":\r
1025 \r
1026       len = LenDecoder.Decode(&RangeDec, posState);\r
1027 \r
1028 The state is update with UpdateState_Match function:\r
1029 \r
1030       state = UpdateState_Match(state);\r
1031 \r
1032 and the new "rep0" value is decoded with DecodeDistance:\r
1033 \r
1034       rep0 = DecodeDistance(len);\r
1035 \r
1036 That "rep0" will be used as zero-based distance for current match.\r
1037 \r
1038 If the value of "rep0" is equal to 0xFFFFFFFF, it means that we have \r
1039 "End of stream" marker, so we can stop decoding and check finishing \r
1040 condition in Range Decoder:\r
1041 \r
1042       if (rep0 == 0xFFFFFFFF)\r
1043         return RangeDec.IsFinishedOK() ?\r
1044             LZMA_RES_FINISHED_WITH_MARKER :\r
1045             LZMA_RES_ERROR;\r
1046 \r
1047 If uncompressed size is defined, LZMA decoder must check that it doesn't \r
1048 exceed that specified uncompressed size:\r
1049 \r
1050       if (unpackSizeDefined && unpackSize == 0)\r
1051         return LZMA_RES_ERROR;\r
1052 \r
1053 Also the decoder must check that "rep0" value is not larger than dictionary size\r
1054 and is not larger than the number of already decoded bytes:\r
1055 \r
1056       if (rep0 >= dictSize || !OutWindow.CheckDistance(rep0))\r
1057         return LZMA_RES_ERROR;\r
1058 \r
1059 Then the decoder must copy match bytes as described in \r
1060 "The match symbols copying" section.\r
1061 \r
1062 \r
1063 Rep Match\r
1064 ---------\r
1065 \r
1066 If the LZMA decoder has decoded the value "1" with IsRep[state] variable,\r
1067 we have "Rep Match" type.\r
1068 \r
1069 At first the LZMA decoder must check that it doesn't exceed \r
1070 specified uncompressed size:\r
1071 \r
1072       if (unpackSizeDefined && unpackSize == 0)\r
1073         return LZMA_RES_ERROR;\r
1074 \r
1075 Also the decoder must return error, if the LZ window is empty:\r
1076 \r
1077       if (OutWindow.IsEmpty())\r
1078         return LZMA_RES_ERROR;\r
1079 \r
1080 If the match type is "Rep Match", the decoder uses one of the 4 variables of\r
1081 distance history table to get the value of distance for current match.\r
1082 And there are 4 corresponding ways of decoding flow. \r
1083 \r
1084 The decoder updates the distance history with the following scheme \r
1085 depending from type of match:\r
1086 \r
1087 - "Rep Match 0" or "Short Rep Match":\r
1088       ; LZMA doesn't update the distance history    \r
1089 \r
1090 - "Rep Match 1":\r
1091       UInt32 dist = rep1;\r
1092       rep1 = rep0;\r
1093       rep0 = dist;\r
1094 \r
1095 - "Rep Match 2":\r
1096       UInt32 dist = rep2;\r
1097       rep2 = rep1;\r
1098       rep1 = rep0;\r
1099       rep0 = dist;\r
1100 \r
1101 - "Rep Match 3":\r
1102       UInt32 dist = rep3;\r
1103       rep3 = rep2;\r
1104       rep2 = rep1;\r
1105       rep1 = rep0;\r
1106       rep0 = dist;\r
1107 \r
1108 Then the decoder decodes exact subtype of "Rep Match" using "IsRepG0", "IsRep0Long",\r
1109 "IsRepG1", "IsRepG2".\r
1110 \r
1111 If the subtype is "Short Rep Match", the decoder updates the state, puts \r
1112 the one byte from window to current position in window and goes to next \r
1113 MATCH/LITERAL symbol (the begin of main loop):\r
1114 \r
1115           state = UpdateState_ShortRep(state);\r
1116           OutWindow.PutByte(OutWindow.GetByte(rep0 + 1));\r
1117           unpackSize--;\r
1118           continue;\r
1119 \r
1120 In other cases (Rep Match 0/1/2/3), it decodes the zero-based \r
1121 length of match with "RepLenDecoder" decoder:\r
1122 \r
1123       len = RepLenDecoder.Decode(&RangeDec, posState);\r
1124 \r
1125 Then it updates the state:\r
1126 \r
1127       state = UpdateState_Rep(state);\r
1128 \r
1129 Then the decoder must copy match bytes as described in \r
1130 "The Match symbols copying" section.\r
1131 \r
1132 \r
1133 The match symbols copying\r
1134 -------------------------\r
1135 \r
1136 If we have the match (Simple Match or Rep Match 0/1/2/3), the decoder must\r
1137 copy the sequence of bytes with calculated match distance and match length.\r
1138 If uncompressed size is defined, LZMA decoder must check that it doesn't \r
1139 exceed that specified uncompressed size:\r
1140 \r
1141     len += kMatchMinLen;\r
1142     bool isError = false;\r
1143     if (unpackSizeDefined && unpackSize < len)\r
1144     {\r
1145       len = (unsigned)unpackSize;\r
1146       isError = true;\r
1147     }\r
1148     OutWindow.CopyMatch(rep0 + 1, len);\r
1149     unpackSize -= len;\r
1150     if (isError)\r
1151       return LZMA_RES_ERROR;\r
1152 \r
1153 Then the decoder must go to the begin of main loop to decode next MATCH or LITERAL.\r
1154 \r
1155 \r
1156 \r
1157 NOTES\r
1158 -----\r
1159 \r
1160 This specification doesn't describe the variant of decoder implementation \r
1161 that supports partial decoding. Such partial decoding case can require some \r
1162 changes in "end of stream" condition checks code. Also such code \r
1163 can use additional status codes, returned by decoder.\r
1164 \r
1165 This specification uses C++ code with templates to simplify describing.\r
1166 The optimized version of LZMA decoder doesn't need templates.\r
1167 Such optimized version can use just two arrays of CProb variables:\r
1168   1) The dynamic array of CProb variables allocated for the Literal Decoder.\r
1169   2) The one common array that contains all other CProb variables.\r
1170 \r
1171 \r
1172 References:      \r
1173 \r
1174 1. G. N. N. Martin, Range encoding: an algorithm for removing redundancy \r
1175    from a digitized message, Video & Data Recording Conference, \r
1176    Southampton, UK, July 24-27, 1979.\r