Recently, my advisor ( Yoram Bresler) and I have been working on parallel algorithms for lossless data compression. This page describes what lossless compression algorithms are good for, why one might want to parallelize them, and what the design goals for a parallal compression algorithm are. We then describe our research in this context.
In lossless data compression, we want to represent an input x, which we assume to be N bits long, with a code C(x), which is also comprised of bits. We also require that
Many modern data compression methods are lossy, i.e., the input cannot be reconstructed perfectly from the code. For some types of data, such as images and video, some distortion between the input and its reconstruction may be acceptable. Lossy compression methods have the additional benefit that they often compress very well.
In some applications, reconstructing the input perfectly is essential:
As processors, storage systems, and communications devices become faster, data compression systems must also support much higher data rates. At a fixed data rate, there is a tradeoff between the amount of resources, e.g., memory, that we allow the compression algorithm to use, and how well it compresses.
If we want to increase the speed by a small factor, it may suffice to choose a (faster) algorithm that may use more resources and/or compress poorly. Alternatively, we could run the original algorithm on faster hardware, but that (again) uses more resources. If we need substantially higher data rates, the only feasible option is using parallel lossless compression algorithms.
There are two trivial approaches to speed up compression by a factor of B:
The goal of our recent research has been to develop parallel lossless compression algorithms that avoid both the problems described before. We want to have B computational units processing a single length-N input, such that the compression ratio is almost as good as that obtained with a regular (serial) compression algorithm.