Voice Compression

Voice compression entails the application of various algorithms to the voice stream to reduce bandwidth requirements while preserving the quality or audibility of the voice transmission. Numerous compression standards for voice have emerged over the years that allow businesses to achieve substantial savings on leased lines with only a modest cost for additional hardware.

Using these standards, the normal 64-kbps voice channel can be reduced to 32, 16, or 8 kbps, or even to as little as 6.3 and 5.3 kbps for sending voice over the Internet or cellular phone networks. As the compression ratio increases, however, voice quality generally diminishes. In 1972, the CCITT (now the ITU-T) standardized Pulse Code Modulation (PCM) as the internationally accepted coding standard (G.711) for toll-quality voice transmission.

Under this standard, a single voice channel requires 64 kbps when transmitted over the telephone network, which is based on Time Division Multiplexing (TDM). The 64-kbps PCM time slot, or payload bit rate, forms the basic building block for today’s public telephone services and equipment; a T1 line, for example, carries 24 such time slots, or channels, of 64 kbps each.

Pulse Code Modulation

A voice signal takes the shape of a wave: the height of the wave’s peaks and troughs is the signal’s amplitude, and the number of cycles per second is its frequency. The voice is converted into digital form by an encoding technique called Pulse Code Modulation (PCM). Under PCM, the voice signal is sampled at a rate of at least twice the highest frequency in the 4000-Hertz (Hz) voice channel, which equates to 8000 samples per second.

The amplitudes of the samples are encoded into binary form using enough bits per sample to maintain a high signal-to-noise ratio. For quality reproduction, the required digital transmission speed for 4-kHz voice signals works out to 8000 samples per second × 8 bits per sample = 64,000 bps (64 kbps). The conversion of analog voice signals to and from digital is performed by a coder-decoder, or codec, which is a key component of D4 channel banks and multiplexers.
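The sampling and bit-rate arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation; the function and constant names are illustrative, not part of any standard.

```python
def nyquist_rate_hz(highest_freq_hz: int) -> int:
    """Minimum sampling rate: twice the highest frequency in the channel."""
    return 2 * highest_freq_hz


def pcm_bit_rate_bps(highest_freq_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a PCM channel sampled at its Nyquist rate."""
    return nyquist_rate_hz(highest_freq_hz) * bits_per_sample


VOICE_CHANNEL_HZ = 4000  # nominal width of a telephone voice channel
BITS_PER_SAMPLE = 8      # PCM sample size

samples_per_second = nyquist_rate_hz(VOICE_CHANNEL_HZ)
levels = 2 ** BITS_PER_SAMPLE  # distinct amplitude values per sample
bit_rate = pcm_bit_rate_bps(VOICE_CHANNEL_HZ, BITS_PER_SAMPLE)
print(f"{samples_per_second} samples/s x {BITS_PER_SAMPLE} bits "
      f"({levels} levels) = {bit_rate} bps")
```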

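The codec translates amplitudes into binary values and performs mu-law quantizing. Mu-law (used in North America and Japan; most other countries use the similar A-law scheme) is a companding scheme: it compresses the signal’s dynamic range before encoding and expands it after decoding, devoting more of the available quantization levels to quiet signals than to loud ones. This improves the signal-to-noise ratio for low-level speech and is similar in concept to Dolby noise reduction in audio recording.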
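To show why companding helps quiet signals, the following Python sketch compresses a sample with the continuous mu-law curve (μ = 255), quantizes it to 8 bits, and expands it again, then compares the error with plain 8-bit linear quantization. It is a simplification: G.711 itself approximates this curve with piecewise-linear segments and uses its own 8-bit code layout, and the helper names here are hypothetical.

```python
import math

MU = 255  # mu value used for North American and Japanese PCM


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a normalized sample x in [-1, 1] onto the mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Invert mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_8bit(v: float) -> int:
    """Uniformly quantize v in [-1, 1] to one of 256 codes (0..255)."""
    return round((v + 1) / 2 * 255)


def dequantize_8bit(code: int) -> float:
    """Map an 8-bit code back to the [-1, 1] range."""
    return code / 255 * 2 - 1


def mu_law_roundtrip(x: float) -> float:
    """Compress, quantize to 8 bits, then decode and expand."""
    return mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(x))))


def linear_roundtrip(x: float) -> float:
    """Quantize to 8 bits with no companding, for comparison."""
    return dequantize_8bit(quantize_8bit(x))


quiet = 0.01  # a low-level sample, 1 percent of full scale
print("mu-law error:", abs(mu_law_roundtrip(quiet) - quiet))
print("linear error:", abs(linear_roundtrip(quiet) - quiet))
```

For this quiet sample, the mu-law path lands much closer to the original than linear quantization does, because the compressed curve spreads low amplitudes across more of the 256 codes.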

Other components in the channel bank or multiplexer interleave the digital signals representing as many as 24 channels to form a 1.544-Mbps bit stream (including 8 kbps of framing overhead) suitable for transmission over a T1 line. PCM exhibits high quality, is robust enough for switching through the public network without suffering noticeable degradation, and is simple to implement. But PCM allows for only 24 voice channels over a T1 line. Digital compression techniques can be applied to multiply the number of channels on a T1 line.
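The 1.544-Mbps figure follows from T1 framing: each frame carries one 8-bit sample from each of the 24 channels plus a single framing bit, and 8000 frames are sent per second, one per PCM sampling interval. A quick sketch of that arithmetic:

```python
CHANNELS = 24             # 64-kbps time slots per T1
BITS_PER_SLOT = 8         # one 8-bit PCM sample per channel per frame
FRAMING_BITS = 1          # one framing bit per frame
FRAMES_PER_SECOND = 8000  # one frame per sampling interval

frame_bits = CHANNELS * BITS_PER_SLOT + FRAMING_BITS          # 193 bits
line_rate = frame_bits * FRAMES_PER_SECOND                    # 1,544,000 bps
payload_rate = CHANNELS * BITS_PER_SLOT * FRAMES_PER_SECOND   # 1,536,000 bps
framing_rate = FRAMING_BITS * FRAMES_PER_SECOND               # 8,000 bps

print(frame_bits, line_rate, payload_rate, framing_rate)
```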

Compression Basics

Among the most popular compression methods is Adaptive Differential Pulse Code Modulation (ADPCM), which has been a worldwide standard since 1984. It is used primarily on private T-carrier networks to double the channel capacity of the available bandwidth from 24 to 48 channels, but it can be applied to microwave and satellite links as well.

ADPCM is also used on some cellular networks, such as those based on the Personal Handyphone System (PHS) and the Personal Access Communications System (PACS). Both employ 32-kbps ADPCM waveform encoding, which provides near-landline voice quality. ADPCM has demonstrated a high degree of tolerance to the cascading of voice coders, as happens when a mobile subscriber calls a voice-mail system and the mailbox owner retrieves the message from a mobile phone. With other mobile technologies, playback quality in this situation is noticeably diminished, but with PHS and PACS it remains very clear.

The ADPCM device accepts the 8000-sample-per-second rate of PCM and uses an adaptive quantization algorithm to reduce the 8-bit samples to 4-bit words. These 4-bit words, however, no longer represent sample amplitudes; they represent only the difference between successive samples. This is all that a like device at the other end of the line needs to reconstruct the original amplitudes.
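A minimal sketch of the difference idea, assuming integer sample values and a fixed step size of 4 (a hypothetical choice): the encoder sends a 4-bit signed code (-8 to 7) for each sample, and the decoder adds each code back onto a running total. This is plain differential PCM without adaptation, so it illustrates only the "send the difference" step of ADPCM.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Limit value to the range [lo, hi]."""
    return max(lo, min(hi, value))


def dpcm_encode(samples: list[int], step: int = 4) -> list[int]:
    """Encode each sample as a 4-bit signed difference code (-8..7)."""
    codes = []
    predicted = 0  # the decoder starts from the same value
    for s in samples:
        code = clamp(round((s - predicted) / step), -8, 7)
        codes.append(code)
        # Follow the decoder's reconstruction, not the true input,
        # so the two ends never drift apart.
        predicted += code * step
    return codes


def dpcm_decode(codes: list[int], step: int = 4) -> list[int]:
    """Rebuild samples by adding each difference to a running total."""
    value = 0
    samples = []
    for code in codes:
        value += code * step
        samples.append(value)
    return samples


original = [0, 3, 9, 14, 18, 20, 19, 15, 10, 4]
codes = dpcm_encode(original)
print(codes)               # 4-bit difference codes
print(dpcm_decode(codes))  # within step/2 of the original here
```

The encoder tracks the decoder’s reconstruction rather than the previous input sample; otherwise rounding errors would accumulate at the far end.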

Integral to the ADPCM device is circuitry called the “adaptive predictor,” which predicts the value of the next sample based only on the level of the previous sample. Since the human voice does not usually change significantly from one sampling interval to the next, prediction accuracy can be very high. A feedback loop used by the predictor ensures that voice variations are followed with minimal deviation.

Consequently, the difference between the predicted and actual signal is very small and can be encoded with only 4 bits rather than the 8 bits used in PCM. When successive samples vary widely, the algorithm adapts by increasing the range represented by the 4 bits. The coarser steps, however, lower the signal-to-noise ratio and reduce the accuracy of voice frequency reproduction.

At the other end of the digital facility, a matching ADPCM device uses an identical predictor to perform the process in reverse, adding each received difference to its prediction to restore the original 8-bit code. By halving the number of bits needed to encode a voice signal, ADPCM doubles T1 transmission capacity from the original 24 channels to 48 channels, giving the user twice as many voice channels for the same monthly charge on a leased T1 line.
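The predictor, step adaptation, and mirrored decoder described above can be illustrated with a simplified codec. This is not the ITU-T G.726 algorithm, which uses a more elaborate predictor and table-driven step-size adaptation; here the prediction is simply the previous reconstructed sample, and the step doubles on large codes and halves on small ones. The step limits and thresholds are hypothetical.

```python
from dataclasses import dataclass

MIN_STEP, MAX_STEP = 1, 512  # hypothetical step-size limits


@dataclass
class PredictorState:
    """State kept identically by the encoder and the decoder."""
    predicted: int = 0  # prediction = last reconstructed sample
    step: int = 4       # value represented by one unit of the 4-bit code

    def apply(self, code: int) -> int:
        """Add a 4-bit difference code to the prediction, then adapt the step."""
        self.predicted += code * self.step
        if abs(code) >= 6:    # samples varying widely: widen the range
            self.step = min(MAX_STEP, self.step * 2)
        elif abs(code) <= 1:  # signal steady: narrow the range again
            self.step = max(MIN_STEP, self.step // 2)
        return self.predicted


def adpcm_encode(samples: list[int]) -> tuple[list[int], list[int]]:
    """Return the 4-bit codes and the encoder's own reconstruction."""
    state = PredictorState()
    codes, reconstructed = [], []
    for s in samples:
        code = max(-8, min(7, round((s - state.predicted) / state.step)))
        codes.append(code)
        reconstructed.append(state.apply(code))
    return codes, reconstructed


def adpcm_decode(codes: list[int]) -> list[int]:
    """Rebuild samples using a predictor identical to the encoder's."""
    state = PredictorState()
    return [state.apply(c) for c in codes]


codes, local = adpcm_encode([1000] * 20)  # a sudden jump from silence
print(codes[:6])  # [7, 7, 7, 7, 7, 1]: codes saturate while the step widens
print(adpcm_decode(codes) == local)  # True: the far end stays in step
```

Because both ends run the same state machine on the same codes, only the codes need to cross the line.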

ADPCM can also compress voice to 16 kbps by encoding each sample with only 2 bits instead of the 4 bits discussed above. This 4-to-1 compression provides 96 channels on a T1 line without significantly reducing signal quality. Although other compression techniques are available for use on wired and wireless networks, ADPCM offers several advantages.
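The channel counts follow directly from the bits per sample: at 8000 samples per second, each bit per sample costs 8 kbps, and the T1’s 1.536 Mbps of voice payload is divided by the per-channel rate. A sketch, ignoring any bandwidth a particular multiplexer may reserve for signaling:

```python
SAMPLES_PER_SECOND = 8000
T1_PAYLOAD_BPS = 1_536_000  # 24 x 64 kbps; framing bits excluded


def channel_rate(bits_per_sample: int) -> int:
    """Bit rate of one voice channel at the PCM sampling rate."""
    return SAMPLES_PER_SECOND * bits_per_sample


def channels_per_t1(bits_per_sample: int) -> int:
    """How many such channels fit in the T1 payload."""
    return T1_PAYLOAD_BPS // channel_rate(bits_per_sample)


for bits in (8, 4, 2):  # PCM, 32-kbps ADPCM, 16-kbps ADPCM
    print(f"{bits} bits/sample -> {channel_rate(bits) // 1000} kbps, "
          f"{channels_per_t1(bits)} channels")
```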

ADPCM holds up well in the multinode environment, where it may undergo compression and decompression several times before arriving at its final destination. And unlike many other compression methods, ADPCM does not distort the distinguishing characteristics of a person’s voice during transmission.

Variable-Rate ADPCM

Some vendors have designed ADPCM processors that not only compress voice but also accommodate 64-kbps pass-through. The use of very compact code allows the same ADPCM processor to handle several different algorithms. The algorithm is selected in software by the network manager. Variable-rate ADPCM offers several advantages.

Compressed voice is more susceptible to distortion than uncompressed voice, and 16-kbps voice more so than 32-kbps voice. When line conditions deteriorate to the point where voice cannot be compressed without seriously disrupting communications, a lower compression ratio (32 kbps rather than 16 kbps, for example) may be invoked to compensate for the distortion. If line conditions do not permit compression even at 32 kbps, 64-kbps pass-through may be invoked to maintain quality voice communication.
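The fallback rule described above can be sketched as a small selection function that a network manager might apply. The source does not say how line conditions are measured, so the bit-error-rate metric and the threshold values below are hypothetical and illustrative only.

```python
# Hypothetical thresholds: (rate in bps, highest tolerable bit error rate),
# ordered from most to least compressed.
RATE_LIMITS = [
    (16_000, 1e-6),  # 16 kbps only on very clean lines
    (32_000, 1e-5),  # 32 kbps tolerates somewhat noisier lines
]


def select_rate(measured_ber: float) -> int:
    """Pick the most compressed rate the line can tolerate."""
    for rate, max_ber in RATE_LIMITS:
        if measured_ber <= max_ber:
            return rate
    return 64_000  # fall back to uncompressed 64-kbps pass-through


for ber in (1e-7, 5e-6, 1e-3):
    print(f"BER {ber:g}: run at {select_rate(ber) // 1000} kbps")
```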

Of course, channel availability is greatly reduced, but at this point the ability to communicate with the outside world matters more than the number of channels. Variable-rate ADPCM provides opportunities to allocate channel quality according to the needs of different classes of users. For example, all intracompany voice links may operate at 16 kbps, while those used to communicate externally may be configured to operate at 32 kbps.

The number of channels may be increased temporarily by compressing voice to 16 kbps instead of 32 kbps until new facilities can be ordered, installed, and put into service. As new links are added to keep up with the demand for more channels, the other links may be returned to operation at 32 kbps. Variable-rate ADPCM, then, offers much more channel configuration flexibility than products that offer voice compression at only 32 kbps.
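A rough capacity check for the mixed-rate plans described above: given how many channels each user class needs and at what rate, it reports how much of the 1.536-Mbps T1 payload is left. The class names and channel counts are hypothetical.

```python
T1_PAYLOAD_BPS = 1_536_000  # 24 x 64 kbps of voice payload


def remaining_bps(plan: dict[str, tuple[int, int]]) -> int:
    """plan maps a user class to (channels, kbps per channel)."""
    used = sum(channels * kbps * 1000 for channels, kbps in plan.values())
    if used > T1_PAYLOAD_BPS:
        raise ValueError(f"plan needs {used} bps; a T1 carries {T1_PAYLOAD_BPS}")
    return T1_PAYLOAD_BPS - used


# Hypothetical plan: internal calls at 16 kbps, external calls at 32 kbps.
plan = {"intracompany": (40, 16), "external": (20, 32)}
spare = remaining_bps(plan)
print(spare // 16_000, "more 16-kbps channels or",
      spare // 32_000, "more 32-kbps channels")
```

Dropping a class from 32 to 16 kbps in the plan shows immediately how many channels a temporary switch would free up until new facilities arrive.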

Other Compression Techniques

Other compression schemes can be used over T-carrier facilities, such as Continuously Variable Slope Delta (CVSD) modulation and Time Assignment Speech Interpolation (TASI).

CVSD The higher the sampling rate, the smaller the average difference between successive amplitudes. At a high enough sampling rate (32,000 times a second in the case of 32-kbps voice), the average difference is small enough to be represented by only 1 bit. This is the concept behind CVSD modulation, where each bit tells the decoder whether to step up or down to follow the slope of the analog curve.

Successive 1s or 0s indicate that the slope should get steeper and steeper. This technique can result in very good voice quality if the sampling rate is fast enough. Like ADPCM, CVSD yields 48 voice channels at 32 kbps on a T1 line. But CVSD is more flexible than ADPCM in that it can provide 64 voice channels at 24 kbps or 96 voice channels at 16 kbps. This is because, with only 1 bit per sample, the sampling rate equals the channel bit rate.
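The following Python sketch shows the mechanism: each bit steps the reconstructed signal up or down, and a run of identical bits (three here) makes the step grow, which is how successive 1s or 0s steepen the slope. Real CVSD codecs specify their own step-size filters; the step limits, run length, growth, and decay factors below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CvsdState:
    """Slope-tracking state shared by the CVSD encoder and decoder."""
    estimate: float = 0.0
    step: float = 10.0
    recent: list[int] = field(default_factory=list)
    min_step: float = 10.0    # hypothetical parameters from here down
    max_step: float = 1280.0
    run_length: int = 3
    grow: float = 1.5
    decay: float = 0.9

    def apply(self, bit: int) -> float:
        """Adapt the step from recent bits, then move the estimate."""
        self.recent = (self.recent + [bit])[-self.run_length:]
        if len(self.recent) == self.run_length and len(set(self.recent)) == 1:
            # A run of identical bits: the curve is steep, so take bigger steps.
            self.step = min(self.max_step, self.step * self.grow)
        else:
            # Mixed bits: the curve is flattening, so let the step shrink.
            self.step = max(self.min_step, self.step * self.decay)
        self.estimate += self.step if bit else -self.step
        return self.estimate


def cvsd_encode(samples: list[float]) -> tuple[list[int], list[float]]:
    """One bit per sample: 1 if the input is at or above the estimate."""
    state = CvsdState()
    bits, reconstructed = [], []
    for s in samples:
        bit = 1 if s >= state.estimate else 0
        bits.append(bit)
        reconstructed.append(state.apply(bit))
    return bits, reconstructed


def cvsd_decode(bits: list[int]) -> list[float]:
    """Rebuild the waveform with the same state machine as the encoder."""
    state = CvsdState()
    return [state.apply(b) for b in bits]


bits, local = cvsd_encode([1000.0] * 40)  # a sudden step in the input
print(bits[:12])  # eleven 1s with ever larger steps, then the first 0
```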

Thus, to achieve 64 voice channels, the sampling rate is 24,000 times a second, while 96 voice channels require only 16,000 samples per second. It is even possible for CVSD to provide 192 voice channels at 8 kbps. In reducing the sampling rate to obtain more channels, however, the average difference between amplitudes becomes greater. Since the greater difference is still represented by only 1 bit, there is a noticeable drop in voice quality. The flexibility of CVSD therefore comes at the expense of quality.
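To see both sides of the trade-off, this sketch computes, for several CVSD rates, the number of T1 channels and the average sample-to-sample change of a full-scale 1-kHz test tone (the tone is an arbitrary choice). The average change that the single bit must represent roughly doubles each time the sampling rate is halved.

```python
import math

T1_PAYLOAD_BPS = 1_536_000
TONE_HZ = 1000  # hypothetical test tone


def mean_step(sample_rate: int) -> float:
    """Average |x[n] - x[n-1]| over 10 ms of a full-scale TONE_HZ sine."""
    n = sample_rate // 100  # samples in 10 ms
    x = [math.sin(2 * math.pi * TONE_HZ * i / sample_rate) for i in range(n)]
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (n - 1)


for rate in (32_000, 24_000, 16_000, 8_000):  # 1 bit per sample: bps == Hz
    print(f"{rate // 1000} kbps: {T1_PAYLOAD_BPS // rate} channels, "
          f"mean step {mean_step(rate):.3f}")
```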

TASI Since people do not normally talk and listen at the same time, each direction of a conversation is idle about half the time, so network efficiency is at best only 50 percent. And since all human speech contains pauses that constitute wasted time, efficiency is further reduced by as much as 10 percentage points, putting maximum network efficiency at only 40 percent.

Statistical voice compression techniques, such as Time Assignment Speech Interpolation (TASI), take advantage of this quiet time by interleaving segments of other conversations over the same channel. TASI-based systems detect active speech on each line and assign only active talkers to the T1 facility. By making better use of otherwise idle time, TASI can double T1 capacity.
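The assignment step can be sketched as follows, assuming a speech detector has already reported which lines are active in the current interval. Practical TASI systems hold a talker on a trunk for a whole burst of speech and signal assignment changes to the far end; this hypothetical version simply reassigns trunks each interval and returns the map the far end would need.

```python
def assign_trunks(active_lines: list[int],
                  num_trunks: int) -> tuple[dict[int, int], list[int]]:
    """Give each active line a free trunk for this interval.

    Returns a trunk -> line map (sent to the far end) and the lines that
    found no free trunk (frozen out).
    """
    assignment: dict[int, int] = {}
    frozen_out: list[int] = []
    for line in active_lines:
        if len(assignment) < num_trunks:
            assignment[len(assignment)] = line
        else:
            frozen_out.append(line)
    return assignment, frozen_out


def reassemble(trunk_payloads: dict[int, bytes],
               assignment: dict[int, int]) -> dict[int, bytes]:
    """Far end: deliver each trunk's payload to the line it came from."""
    return {assignment[trunk]: data for trunk, data in trunk_payloads.items()}


# Six lines share three trunks; only lines 0, 2, and 5 are talking.
assignment, frozen = assign_trunks([0, 2, 5], num_trunks=3)
print(assignment, frozen)  # {0: 0, 1: 2, 2: 5} []
```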

At the distant end, the TASI system sorts out the interleaved conversations and delivers each one to the line for which it was intended. The drawback of statistical compression methods is that they have trouble maintaining consistent quality, because they need a large number of channels, at least 100, to yield a good statistical probability of usable quiet periods. Even so, a channel gain ratio of 1.5 to 1 may be achieved with as few as 72 channels.

If there are too few input channels, a condition known as “clipping” may occur, in which speech signals are deformed by the cutting off of initial or final syllables. A related problem with statistical compression techniques is freeze-out, which usually occurs when all trunks are in use during periods of heavy traffic.

In such cases, a sudden burst of speech can completely overwhelm the total available bandwidth, resulting in the loss of entire strings of syllables. Another liability of statistical compression techniques, even for large T1 users, is that they are not suitable for transmissions with too few quiet periods, such as facsimile and music-on-hold. Statistical compression techniques, then, work better in large configurations than in small ones.
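Why large configurations fare better can be sketched with a simple binomial model: if each line independently carries speech a fixed share of the time, the chance that more lines are talking than there are trunks (the freeze-out condition) falls as the pool grows, even at the same 2-to-1 gain ratio. The independence assumption and the 40 percent activity figure (borrowed from the efficiency estimate above) are simplifications, not measured traffic data.

```python
from math import comb


def overload_probability(lines: int, trunks: int, activity: float) -> float:
    """Probability that more lines are talking at once than there are trunks."""
    return sum(
        comb(lines, k) * activity**k * (1 - activity) ** (lines - k)
        for k in range(trunks + 1, lines + 1)
    )


ACTIVITY = 0.4  # assumed share of time each line carries speech

# Same 2:1 gain ratio, small and large pools.
for lines, trunks in ((24, 12), (96, 48)):
    p = overload_probability(lines, trunks, ACTIVITY)
    print(f"{lines} lines on {trunks} trunks: overloaded {p:.1%} of the time")
```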

Adding lines and equipment is one way that organizations can keep pace with increases in traffic. But even when funds are immediately available for such network upgrades, communications managers must contend with the delays inherent in ordering, installing, and putting new facilities into service.

To accommodate the demand for bandwidth in a timely manner, communications managers can apply an appropriate level of voice compression to obtain more channels out of the available bandwidth. Depending on the compression technique selected, there need not be a noticeable decrease in voice quality.