Data Compression

Data compression is a standard feature of most bridges and routers, as well as modems, especially those used for transferring bulky files over wireless links. Compression improves throughput by capitalizing on the redundancies found in the data to reduce frame size and thereby allow more data to be transmitted over a link.

A compression algorithm detects repeating characters or strings of characters and represents them as a symbol or token. At the receiving end, the process works in reverse to restore the original data. There are many different algorithms available to compress data; each is designed for a specific type of data source and the redundancies found in it, and tends to do a poor job when applied to other types of data.
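As a minimal illustration of the token idea, here is run-length encoding, which replaces each run of repeated characters with a (count, character) token; production compressors use far more sophisticated dictionary schemes:

```python
def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse runs of repeated characters into (count, char) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        tokens.append((j - i, data[i]))
        i = j
    return tokens

def rle_decode(tokens: list[tuple[int, str]]) -> str:
    """Reverse the process at the receiving end to restore the data."""
    return "".join(char * count for count, char in tokens)

original = "AAAABBBCCCCCCCCD"
encoded = rle_encode(original)
assert rle_decode(encoded) == original
print(encoded)  # [(4, 'A'), (3, 'B'), (8, 'C'), (1, 'D')]
```

Note how the scheme only pays off when the data actually contain runs; on text with no repeats it would expand the data, which is the general point about matching the algorithm to the source.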

For example, the Moving Pictures Experts Group (MPEG) compression standards were designed to take advantage of the relatively small difference from one frame to another in a video stream and so do an excellent job of compressing motion pictures. On the other hand, MPEG would not be effective if applied to still images. For this data source, the Joint Photographic Experts Group (JPEG) compression standards would be applied.

JPEG is “lossy,” meaning that the decompressed image is not quite the same as the original image—there is some degradation. JPEG is designed to exploit known limitations of the human eye, notably that small color details are not perceived as well as small details of light and dark.

JPEG eliminates the unnecessary details to greatly reduce the size of image files, allowing them to be transmitted faster and take up less space in a storage server. On wide area network (WAN) links, the compression ratio tends to differ by application. The compression ratio can be as high as 6 to 1 when the traffic consists of heavy-duty file transfers. The compression ratio is less than 4 to 1 when the traffic is mostly database queries.

When there are only “keep alive” signals or sporadic query traffic on a T1 line, the compression ratio can dip below 2 to 1. Encrypted data exhibit little or no compression because encryption removes the redundancy that compression algorithms depend on; attempting to compress encrypted data actually expands it, consuming more bandwidth. However, if data expansion is detected and compression is withheld until the encrypted data are completely transmitted, the need for extra bandwidth can be avoided.
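The behavior on encrypted traffic can be sketched with Python's zlib: redundant application data compresses heavily, while pseudorandom bytes (standing in for ciphertext here) come out slightly larger than they went in.

```python
import os
import zlib

# Highly redundant traffic compresses well...
redundant = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 200
compressed = zlib.compress(redundant)
print(len(redundant) / len(compressed))  # ratio well above 10:1

# ...but encrypted data are statistically random, so the compressor
# finds no redundancy and its framing overhead expands the output
# (os.urandom stands in for ciphertext in this sketch).
random_like = os.urandom(len(redundant))
expanded = zlib.compress(random_like)
print(len(expanded) >= len(random_like))  # True: compression expands it
```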

Types of Data Compression

There are several different data-compression methods in use today over WANs—among them are Transmission Control Protocol/Internet Protocol (TCP/IP) header compression, link compression, and multichannel payload compression. Depending on the method used, there can be a significant tradeoff between lower bandwidth consumption and increased packet delay.

TCP/IP Header Compression With TCP/IP header compression, the packet headers are compressed, but the data payload remains unchanged. Because each router needs the full TCP/IP header to make its routing decision, this compression method requires hop-by-hop compression and decompression processing. This adds delay to every compressed/decompressed packet and puts an added burden on the router’s CPU at each network node.

TCP/IP header compression was designed for use on slow serial links of 32 kbps or less, where it can have a significant performance impact. It works best with highly interactive traffic that uses small packet sizes. In such traffic, the ratio of Layer 3 and Layer 4 headers to payload is relatively high, so shrinking the headers alone can result in a substantial performance improvement.
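The arithmetic behind this is simple. Assuming a Van Jacobson-style compressor that shrinks the 40-byte combined IPv4/TCP header to roughly 4 bytes (representative figures, not those of any specific implementation):

```python
# Why header compression only pays off on small, interactive packets.
# The header sizes below are representative assumptions.
IP_TCP_HEADER = 40   # bytes, uncompressed IPv4 + TCP headers
COMPRESSED_HDR = 4   # bytes, typical compressed header size

def frame_savings(payload: int) -> float:
    """Fraction of the frame saved by compressing only the header."""
    before = IP_TCP_HEADER + payload
    after = COMPRESSED_HDR + payload
    return 1 - after / before

# Telnet-style keystroke: 1-byte payload
print(f"{frame_savings(1):.0%}")     # 88% of the frame saved
# Bulk transfer: 1460-byte payload
print(f"{frame_savings(1460):.0%}")  # 2% -- hardly worth the CPU cost
```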

Payload Compression Payload compression entails the compression of the payload of a Layer 2 WAN protocol, such as the Point-to-Point Protocol (PPP), Frame Relay, High-Level Data Link Control (HDLC), X.25, and Link Access Procedure–Balanced (LAPB). The Layer 2 packet header is not compressed, but the entire contents of the payload, including higher-layer protocol headers (i.e., TCP/IP), are compressed.

The payload is compressed using the industry-standard Lempel-Ziv algorithm or some variation of it, applied to the entire frame payload, including the TCP/IP headers. This method of compression is used on links operating at speeds from 56 kbps to 1.544 Mbps and is useful on all traffic types, as long as the traffic has not already been compressed by a higher-layer application.
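As a self-contained illustration of the Lempel-Ziv family, here is a minimal LZW encoder (a later Lempel-Ziv variant), which builds a dictionary of repeated strings on the fly; this is a teaching sketch, not the exact algorithm any particular router implements.

```python
def lzw_encode(data: bytes) -> list[int]:
    """LZW: emit the dictionary code for the longest known prefix,
    extending the dictionary with each new string encountered."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    result = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            result.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(dictionary[current])
    return result

payload = b"the quick fox, the quick fox, the quick fox"
codes = lzw_encode(payload)
print(len(payload), len(codes))  # far fewer codes than input bytes
```

Each repetition of the phrase is covered by progressively longer dictionary entries, which is exactly the redundancy a Layer 2 payload compressor exploits—and why already-compressed traffic gains nothing.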

TCP/IP header compression and Layer 2 payload compression, however, should not be applied at the same time: doing so is redundant and wasteful and could result in the link not coming up or not passing IP traffic.

Link Compression With link compression, the entire frame, both protocol header and payload, is compressed. This form of compression is typically used in local area network (LAN)–only or legacy-only environments. However, this method requires error-correction and packet-sequencing software, which adds to the processing overhead of the compression itself and results in increased packet delay.

Also, like TCP/IP header compression, link compression requires hop-by-hop compression and decompression, so processor loading and packet delays occur at each router node the data traverses. With link compression, a single data compression vocabulary dictionary or history buffer is maintained for all virtual circuits compressed over the WAN link. This buffer holds a running history about what data have been transmitted to help make future transmissions more efficient.

To obtain optimal compression ratios, the history buffer must be large, which requires a significant amount of memory. Because the vocabulary dictionary resets at the end of each frame, this technique offers lower compression ratios than multichannel, multihistory-buffer (vocabulary) data-compression methods. This is particularly true when mixed LAN and serial protocol traffic is transmitted over the WAN link and frame sizes are 2 kilobytes or less. Adding memory to obtain better ratios, in turn, increases the upfront cost of the solution.
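The cost of the per-frame dictionary reset can be approximated with Python's zlib: compressing each frame with a fresh compressor mimics a history that resets every frame, while a single sync-flushed compressor mimics a history carried across frames. The query traffic is invented for illustration.

```python
import zlib

# Frames carrying similar database queries; later frames repeat much
# of what earlier frames contained (synthetic traffic for illustration).
frames = [b"SELECT name, qty FROM orders WHERE order_id = %05d;" % i
          for i in range(50)]

# Per-frame reset: a fresh compressor (empty history) for every frame.
reset_total = sum(len(zlib.compress(f)) for f in frames)

# Continuous history: one compressor for the whole stream, sync-flushed
# after each frame so every frame is still independently transmittable.
comp = zlib.compressobj()
stream_total = 0
for f in frames:
    stream_total += len(comp.compress(f))
    stream_total += len(comp.flush(zlib.Z_SYNC_FLUSH))
stream_total += len(comp.flush())

print(reset_total, stream_total)  # continuous history is far smaller
```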

Multichannel Payload Data Compression By using a separate history buffer or vocabulary for each virtual circuit, multichannel payload data compression can yield higher compression ratios while requiring much less memory than other data-compression methods. This is particularly true in cases where mixed LAN and serial protocol traffic traverses the network.

Higher compression ratios translate into lower WAN bandwidth requirements and greater cost savings. But performance varies because vendors define payload data compression differently. Some consider it to be compression of everything that follows the IP header. However, the IP header itself can account for a significant number of bytes, so for overall compression to be effective, header compression must be applied as well. This adds to the processing burden of the CPU and increases packet delay.

External Data Compression Solutions Bridges and routers can perform data compression with optional software or add-on hardware modules. While compression can be implemented in software, hardware-based compression off-loads the bridge/router’s main processor to deliver even higher levels of throughput.

With a data-compression module, the compression process can occur without as much processing delay as a software solution. The use of a separate digital signal processor (DSP) for data compression, instead of the software-only approach, enables the bridge/router to perform all its core functions without any performance penalty. This parallel-processing approach minimizes the packet delay that can occur when the router’s CPU is forced to handle all these tasks by itself.

If there is no vacant slot in the bridge/router for the addition of a data-compression module, there are two alternatives: the software-only approach or an external compression device. The software-only approach could bog down the overall performance of the router, since its processor would be used to implement compression in addition to core functions. Although an external data compression device would not bog down the router’s core functions, it means that one more device must be provisioned and managed at each remote site.

Data compression will become increasingly important to most organizations as the volume of data traffic at branch locations begins to exceed the capacity of the wide area links and as wireless services become available in the 2.4- and 5-GHz range. Multichannel payload solutions provide the highest compression ratios and reduce the number of packets transmitted across the network.

Reducing packet latency can be effectively achieved via a dedicated processor like a DSP and by employing end-to-end compression techniques rather than node-to-node compression/decompression. All these factors contribute to reducing bandwidth and equipment costs as well as improving the network response time for user applications.