Memory eXpansion Technologies - Parallel Compression Demo


Parallel Compression using a Cooperative Dictionary

  • The general idea of LZ1 type compression algorithms is as follows: given an input string, replace phrases (consecutive sequences of characters in the input) with encodings of {pointer,length} pairs referring to earlier occurrences of the same phrases.
  • For example, the string 'ABCABCX' could be encoded (in symbolic form) as 'ABC{0,3}X', where "{0,3}" refers to the phrase beginning at position 0 of length 3.
  • Phrases can overlap; for example the string 'ABABABAB' would be encoded (symbolically) as 'AB{0,6}'; note that this means that not only does LZ1 type compression give good results for runs of characters, but for any sequence consisting of repeated patterns of characters (unlike most run-length encoding schemes).
  • LZ1 type compression lends itself to efficient hardware implementations using associative (also called content addressable) memories, since in each successive cycle the current character can be simultaneously compared with all previous characters (for example see references [2], [4] below for more details).
  • In order to achieve even faster compression, the input string may be divided into n blocks, and each block compressed in parallel.
  • Unfortunately dividing the input string into n blocks reduces the average "dictionary size" (that is the size of the "already seen" data for which phrase matches can potentially be found) by a factor of n, with a resulting loss of compression.
  • A solution to this problem is to allow phrases to match "already seen" data not only in the block in which the phrase resides, but in the "already seen" data in any of the other blocks being compressed in parallel.
  • The result is parallel compression using a cooperative dictionary; using this class of methods the average dictionary size is not reduced, and therefore the amount of compression is roughly the same as that of sequential methods.
  • The demo below illustrates 4-way parallel compression using a cooperative dictionary.
Demo

Enter four strings of characters, each of length up to 32, in each input block (strings less than 32 characters will be padded with blanks at the end):





Then click on the "compress" button to display the symbolic-form results of 4-way parallel compression.



Compressed strings (in symbolic form):





Note 1: "{B,P,L}" represents a pointer to data in input block B (note blocks are numbered 1,2,3,4), starting at position P (where the index is based at 0), of length L.

Note 2: this emulates "lock-step" parallel compression; that is, a pointer for data beginning at position P1 in some block can only point to a starting position P2<P1 not only in the same block but in any of the other three blocks.

Note 3: the simplest form of encoding the compressed output is to use a flag bit which indicates whether the following bits are a literal or a {B,P,L} triple; then to use either 8 bits for a literal, or 2+5+5=12 bits for a {B,P,L} triple (this can be improved on in various ways, for example by the use of Huffman coding); using this simplest encoding scheme, the input is of size 4x32x8=1024 bits, and the is bits.

Note 4: typically, better compression is obtained with longer input strings; for example, in compressed memory systems, compression ratios ranging from 2:1 to 5:1 are obtained with a unit of compression of 1024 bytes.

 

References
  1. Bell, Cleary, and Witten. Text Compression, Prentice Hall, 1990.
  2. Craft, "A fast hardware data compression algorithm and some algorithmic extensions," IBM Jour. Research & Devlopment 42, 6 (Nov. 1998), 733-746.
  3. Franaszek, Robinson, and Thomas, "Parallel compression with cooperative dictionary construction," In Proc. DCC'96 Data Compression Conf., pp. 200-209, IEEE, 1996.
  4. Franaszek, Robinson, and Thomas, "Parallel Compression and Decompression using a Cooperative Dictionary," U.S. Patent No. 5,729,228, 1998.