Parallel data compression algorithms



Overview

Recently, my advisor ( Yoram Bresler) and I have been working on parallel algorithms for lossless data compression. This page describes what lossless compression algorithms are good for, why one might want to parallelize them, and what the design goals for a parallal compression algorithm are. We then describe our research in this context.



Lossless data compression

In lossless data compression, we want to represent an input x, which we assume to be N bits long, with a code C(x), which is also comprised of bits. We also require that

A lossless code is useful if for most ``typical'' length-N inputs, the length of the code is significantly smaller than N. A low compression ratio is the main ambition of most lossless data compression algorithms.

Motivation

Many modern data compression methods are lossy, i.e., the input cannot be reconstructed perfectly from the code. For some types of data, such as images and video, some distortion between the input and its reconstruction may be acceptable. Lossy compression methods have the additional benefit that they often compress very well.

In some applications, reconstructing the input perfectly is essential:

In these applications, lossless compression can be useful for reducing the storage requirements for the information, or reducing the amount of time required for communicating the information to another location.

Parallel compression

As processors, storage systems, and communications devices become faster, data compression systems must also support much higher data rates. At a fixed data rate, there is a tradeoff between the amount of resources, e.g., memory, that we allow the compression algorithm to use, and how well it compresses.

If we want to increase the speed by a small factor, it may suffice to choose a (faster) algorithm that may use more resources and/or compress poorly. Alternatively, we could run the original algorithm on faster hardware, but that (again) uses more resources. If we need substantially higher data rates, the only feasible option is using parallel lossless compression algorithms.



Parallel design space

There are two trivial approaches to speed up compression by a factor of B:

Therefore, these trivial approaches have severe shortcomings.

Our research

The goal of our recent research has been to develop parallel lossless compression algorithms that avoid both the problems described before. We want to have B computational units processing a single length-N input, such that the compression ratio is almost as good as that obtained with a regular (serial) compression algorithm.


Back to my homepage.