## Edit Distance Match Finder

```
/* This match finder leverages techniques used in file comparison algorithms
 * to find matches between a dictionary and a source file.
 * 
 * The original motivation for studying this approach was to try and optimize 
 * Zstandard for the use case of patching: the most common scenario being 
 * updating an existing software package with the next version. When patching,
 * the difference between the old version of the package and the new version 
 * is generally tiny (most of the new file will be identical to 
 * the old one). In more technical terms, the edit distance (the minimal number 
 * of changes required to take one sequence of bytes to another) between the 
 * files would be small relative to the size of the file. 
 * 
 * Various 'diffing' algorithms utilize this notion of edit distance and 
 * the corresponding concept of a minimal edit script between two 
 * sequences to identify the regions within two files where they differ. 
 * The core algorithm used in this match finder is described in: 
 * 
 * "An O(ND) Difference Algorithm and its Variations", Eugene W. Myers,
 *    Algorithmica Vol. 1, 1986, pp. 251-266,
 *    <https://doi.org/10.1007/BF01840446>.
 * 
 * Additional algorithmic heuristics for speed improvement have also been included.
 * These were inspired by implementations of various regular and binary diffing
 * algorithms such as GNU diff, bsdiff, and Xdelta.
 * 
 * Note: after some experimentation, this approach did not provide enough
 * utility to justify the additional CPU time spent finding matches. The one
 * area where this approach consistently outperforms Zstandard, even at level 19,
 * is when compressing small files (<10 KB) using an equally small dictionary
 * that is very similar to the source file. For the use case this was intended
 * for (large, similar files), this approach by itself took 5-10X longer than
 * zstd-19 and generally resulted in 2-3X larger files. The core advantage that
 * zstd-19 has over this approach for match finding is its ability to find
 * overlapping matches. This approach cannot find any.
 * 
 * I'm leaving this in the contrib section in case this ever becomes interesting 
 * to explore again.
 * */
```
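
To make the notion of edit distance concrete, here is a minimal sketch of the basic greedy algorithm from the Myers paper cited above. It computes only the length D of the shortest edit script, counting single-byte insertions and deletions. The function name `myers_edit_distance` and its signature are assumptions chosen for illustration; this is not the code used by the match finder, and it omits the speed heuristics and the recovery of the matching regions themselves.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative helper, not part of zstd or this match finder.
 * Returns the length D of the shortest edit script (single-byte insertions
 * and deletions only) that turns a[0..n) into b[0..m), or -1 if the
 * working array cannot be allocated. */
long myers_edit_distance(const unsigned char *a, long n,
                         const unsigned char *b, long m)
{
    long max, offset, d, k, result = -1;
    long *v;

    assert(n >= 0 && m >= 0);
    max = n + m;          /* worst case: delete all of a, insert all of b */
    offset = max + 1;     /* shifts diagonal k in [-max-1, max+1] to >= 0 */

    /* v[offset + k] is the furthest x reached so far on diagonal k = x - y.
     * calloc zero-fills, which provides the v[1] = 0 seed from the paper. */
    v = calloc((size_t)(2 * max + 3), sizeof *v);
    if (v == NULL) return -1;

    for (d = 0; d <= max && result < 0; d++) {
        for (k = -d; k <= d; k += 2) {
            long x, y;
            if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1]))
                x = v[offset + k + 1];       /* move down: insert b[y-1] */
            else
                x = v[offset + k - 1] + 1;   /* move right: delete a[x-1] */
            y = x - k;
            /* Follow the "snake": matching bytes cost nothing. */
            while (x < n && y < m && a[x] == b[y]) { x++; y++; }
            v[offset + k] = x;
            if (x >= n && y >= m) { result = d; break; }
        }
    }
    free(v);
    return result;
}
```

For the example strings used in the paper, `ABCABBA` and `CBABAC`, this returns 5. The running time is O((N+M)D), which is why the approach is attractive when the two inputs are nearly identical and D is small.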