How the Channel Vocoder Works: From Bell Labs to Kraftwerk

Photo by Adi Goldstein on Unsplash

In 1943, the United States Army installed a room-sized machine in the Pentagon basement. Code-named SIGSALY, it encoded speech using a vocoder: a device that analyzed voice into a compact stream of amplitude measurements, transmitted those measurements securely, then reconstructed speech from them at a distant terminal. The system was good enough to encrypt Roosevelt and Churchill's conversations across the Atlantic. Three decades later, Kraftwerk plugged a Sennheiser VSM 201 into their studio and used the same underlying principle to sing about highways.

Understanding why the channel vocoder works, and why it produces that distinctive robotic timbre, requires following the signal from microphone to output.

Analysis: Turning Speech Into Envelopes

A channel vocoder separates its job into two parallel processes: analysis and synthesis.

The analysis stage begins with a bank of bandpass filters covering the audible frequency range. Homer Dudley's original 1938 Bell Labs design used 10 bands with 300 Hz spacing from 0 to roughly 3,000 Hz, enough to cover the formants that distinguish vowel sounds in human speech. Modern implementations extend this considerably. The EMS Vocoder 2000 splits the spectrum across 16 bands from 20 Hz to 18 kHz, with 30 dB/octave filter slopes. More bands preserve more spectral detail.
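As a rough illustration of how band layouts like these can be computed, the sketch below builds two sets of band edges. The equal-width layout follows the 10 x 300 Hz figures given above. The constant-ratio (logarithmic) layout for a 16-band, 20 Hz to 18 kHz design is an assumption made for illustration; the function names are hypothetical, and the actual band edges of any particular hardware unit may differ.

```python
def linear_bands(low_hz, width_hz, n_bands):
    """Equal-width bands, as in the 10 x 300 Hz layout described above."""
    return [(low_hz + i * width_hz, low_hz + (i + 1) * width_hz)
            for i in range(n_bands)]


def log_bands(low_hz, high_hz, n_bands):
    """Bands with a constant frequency ratio between edges (assumed layout)."""
    ratio = (high_hz / low_hz) ** (1.0 / n_bands)
    edges = [low_hz * ratio ** i for i in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


# Ten 300 Hz bands from 0 to 3,000 Hz.
speech_bands = linear_bands(0, 300, 10)

# Sixteen bands from 20 Hz to 18 kHz, each a fixed ratio wider than the last.
wide_bands = log_bands(20.0, 18000.0, 16)
```

Constant-ratio spacing gives every band the same width in musical terms (a fixed fraction of an octave), which is one common reason to prefer it once the range extends well beyond speech formants.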

Each filter isolates a frequency slice of the incoming signal. What comes out of each bandpass filter is an amplitude-modulated signal at that band's center frequency: loud when the input has energy there, quiet when it does not. An envelope follower at each output converts this AC signal into a slowly varying DC control voltage. The process is straightforward: rectify the signal, then smooth it with a lowpass filter. The result is a time-varying measurement of how much energy the input has in that frequency band at each moment.
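The rectify-then-smooth step can be sketched digitally in a few lines. This is a minimal version assuming full-wave rectification and a one-pole lowpass; the function name and the 30 Hz default cutoff are illustrative choices, not values taken from any particular vocoder.

```python
import math


def envelope_follower(samples, sample_rate, cutoff_hz=30.0):
    """Full-wave rectify, then smooth with a one-pole lowpass filter."""
    # Smoothing coefficient for a one-pole lowpass at cutoff_hz (assumed value).
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env = 0.0
    out = []
    for x in samples:
        rectified = abs(x)            # rectify: fold negative half-cycles up
        env += k * (rectified - env)  # smooth: move a fraction toward the target
        out.append(env)
    return out


# A steady 440 Hz tone settles to a nearly constant envelope value.
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * i / rate) for i in range(rate)]
envelope = envelope_follower(tone, rate)
```

For a full-scale sine the output settles near 2/pi (about 0.64), the average of a rectified sine, with a small ripple that shrinks as the cutoff is lowered.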

The time constants of the smoothing filter matter considerably. Attack too slow and transients blur; release too long and adjacent phonemes bleed together. The EMS Vocoder 2000 includes a "slew" control that deliberately extends these time constants. The smearing becomes an effect rather than a deficiency, softening hard consonants into something more legato and synthetic.
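To make the attack and release trade-off concrete, here is a sketch of an envelope follower with separate time constants for rising and falling levels. It is an illustrative digital model, not a description of the EMS circuit; the function name and the millisecond defaults are assumptions.

```python
import math


def ar_follower(samples, sample_rate, attack_ms=5.0, release_ms=50.0):
    """Envelope follower with separate attack and release time constants."""
    attack = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Use the attack constant while the level is rising, release otherwise.
        coef = attack if level > env else release
        env = coef * env + (1.0 - coef) * level
        out.append(env)
    return out


# A 100 ms burst followed by 100 ms of silence, at a 1 kHz control rate.
burst = [1.0] * 100 + [0.0] * 100
tight = ar_follower(burst, 1000, attack_ms=5.0, release_ms=20.0)
smeared = ar_follower(burst, 1000, attack_ms=5.0, release_ms=200.0)
```

With the longer release the envelope is still well above zero when the next sound would begin, which is the bleed between phonemes described above, and also the kind of smearing a slew-style control exploits on purpose.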

What exits the analysis stage is a set of control voltages, one per band, each tracking the smoothed amplitude of the input in that frequency band. The speech has been reduced to an envelope fingerprint.

Synthesis: Imposing the Fingerprint on a Carrier

The synthesis stage takes these control voltages and applies them to a second signal: the carrier.

The carrier passes through its own bank of bandpass filters with the same center frequencies as the analysis bank. Each filtered carrier output feeds into a voltage-controlled amplifier. The control voltage from the corresponding analysis band drives the VCA gain. When the analysis band detects a formant, its control voltage rises and the VCA lets more carrier through at that frequency. When the band goes quiet, the VCA closes.

Sum all the VCA outputs together and you have a signal with the carrier's tonal character but the spectral envelope of the original speech. Use a harmonically rich synthesizer patch as the carrier and the result speaks in that synth's voice. Use white noise, a cello, or a choir and the character shifts entirely. The VSM 201 was commonly driven with a sawtooth oscillator: dense, buzzy, unmistakably synthetic.
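Putting analysis and synthesis together, the whole signal path can be sketched as a short program. This is a simplified model under stated assumptions: each band uses a single second-order bandpass (coefficients from the widely used RBJ Audio EQ Cookbook formulas), which is far gentler than the slopes of hardware filter banks, and the VCA is modeled as a plain multiplication. All function names, the Q value, and the band centers are illustrative.

```python
import math


def bandpass_coeffs(center_hz, q, sample_rate):
    """Second-order bandpass (RBJ cookbook form, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Run one biquad section over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def follow(samples, sample_rate, cutoff_hz=30.0):
    """Rectify and smooth: the envelope follower for one band."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, out = 0.0, []
    for x in samples:
        env += k * (abs(x) - env)
        out.append(env)
    return out


def channel_vocoder(modulator, carrier, sample_rate, centers, q=4.0):
    """Impose the modulator's per-band envelopes on the carrier."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for fc in centers:
        b, a = bandpass_coeffs(fc, q, sample_rate)
        envelope = follow(biquad(modulator[:n], b, a), sample_rate)  # analysis
        band = biquad(carrier[:n], b, a)                             # synthesis
        for i in range(n):
            out[i] += envelope[i] * band[i]  # VCA, then sum across bands
    return out


rate = 8000
saw = [2.0 * ((100.0 * i / rate) % 1.0) - 1.0 for i in range(2000)]
voice = [math.sin(2.0 * math.pi * 500.0 * i / rate) for i in range(2000)]
result = channel_vocoder(voice, saw, rate, centers=[250, 500, 1000, 2000])
```

Here a 500 Hz test tone stands in for speech. Only the band around 500 Hz develops a large envelope, so the output is mostly the sawtooth's content near that frequency; a silent modulator yields a silent output regardless of the carrier.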

Channel Vocoder vs. Phase Vocoder

The shared word creates confusion. The underlying signal processing is quite different.

A channel vocoder works in the time domain. It uses a parallel bank of bandpass filters, discards phase information entirely, and represents each band as a time-varying amplitude. The process can be built entirely from analog components. Dudley's prototype was electromechanical.

A phase vocoder works in the frequency domain. It applies a Short-Time Fourier Transform to the input, producing complex-valued spectral bins that encode both amplitude and phase at each frequency. The phase derivative over time gives an estimate of each bin's instantaneous frequency, which is what allows the phase vocoder to stretch time without changing pitch, or shift pitch without changing duration. Phase vocoders are inherently digital and computationally more demanding.
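The instantaneous-frequency estimate can be illustrated for a single bin. The sketch below computes one DFT bin directly for two overlapping Hann-windowed frames, compares the measured phase advance with the advance expected at the bin's center frequency, and converts the difference into a frequency correction. The function names and the frame and hop sizes are assumptions chosen for the example; a practical phase vocoder would use an FFT and process every bin.

```python
import cmath
import math


def bin_phase(frame, k):
    """Phase of DFT bin k for one Hann-windowed frame (direct DFT of one bin)."""
    n = len(frame)
    acc = 0j
    for i, x in enumerate(frame):
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)
        acc += window * x * cmath.exp(-2j * math.pi * k * i / n)
    return cmath.phase(acc)


def instantaneous_freq(signal, k, n_fft, hop, sample_rate):
    """Estimate the frequency in bin k from the phase change between two frames."""
    p0 = bin_phase(signal[0:n_fft], k)
    p1 = bin_phase(signal[hop:hop + n_fft], k)
    expected = 2.0 * math.pi * k * hop / n_fft  # advance at the bin center
    deviation = (p1 - p0) - expected
    # Wrap the deviation into [-pi, pi) so it reflects the nearest frequency.
    deviation = (deviation + math.pi) % (2.0 * math.pi) - math.pi
    true_bin = k + deviation * n_fft / (2.0 * math.pi * hop)
    return true_bin * sample_rate / n_fft


rate = 8000
tone = [math.sin(2.0 * math.pi * 1010.0 * i / rate) for i in range(512)]
# Bin 32 of a 256-point transform is centered on 1000 Hz at this sample rate.
estimate = instantaneous_freq(tone, k=32, n_fft=256, hop=64, sample_rate=rate)
```

The bin spacing here is 31.25 Hz, yet the phase difference lets the estimate land close to the tone's actual 1010 Hz. That sub-bin precision is what time-stretching and pitch-shifting depend on, and it is exactly the information a channel vocoder throws away.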

Concretely: a channel vocoder collapses each band to a single amplitude value per time frame. A phase vocoder retains amplitude and phase per FFT bin per frame. The phase vocoder carries far more information and enables operations the channel vocoder cannot, but the channel vocoder's simplicity is what made it practical in 1938 and what gives it its distinctive character today.

Voiced and Unvoiced Detection

One problem the basic vocoder design struggles with is consonant intelligibility. Consonants like /s/, /f/, and /t/ are unvoiced: produced by turbulent airflow through constrictions in the vocal tract, not by periodic vocal cord vibration. They are noise-like in character, with high zero-crossing rates and energy spread broadly across high frequencies. A voiced carrier, a periodic oscillator, does a poor job representing them.

More sophisticated vocoders include a voiced/unvoiced detector that monitors the zero-crossing rate of the incoming signal. When the detector flags an unvoiced region, the synthesis stage mixes in white noise alongside or instead of the oscillator. The EMS Vocoder 2000 implements this as an explicit module. Without it, sibilants dissolve and words become ambiguous. With it, the vocoder can reproduce speech with enough clarity to function as communications gear, which was the original requirement.
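A zero-crossing detector of this kind is simple to sketch. The version below counts sign changes per frame and switches the excitation to white noise when the rate exceeds a threshold. The 0.3 threshold, the frame handling, and the function names are illustrative assumptions; real detectors typically combine zero-crossing rate with other cues such as band energy.

```python
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def choose_excitation(frame, oscillator_frame, rng, threshold=0.3):
    """Return noise for frames judged unvoiced, the oscillator otherwise."""
    if zero_crossing_rate(frame) > threshold:   # threshold is an assumed value
        return [rng.uniform(-1.0, 1.0) for _ in frame]
    return list(oscillator_frame)


rng = random.Random(0)
# A slow square wave stands in for a voiced frame; uniform noise for a sibilant.
voiced_frame = ([1.0] * 20 + [-1.0] * 20) * 10
sibilant_frame = [rng.uniform(-1.0, 1.0) for _ in range(400)]
oscillator = [0.5] * 400

carrier_for_vowel = choose_excitation(voiced_frame, oscillator, rng)
carrier_for_s = choose_excitation(sibilant_frame, oscillator, rng)
```

The hard switch shown here is the "instead of" case; mixing noise alongside the oscillator would replace the either-or return with a weighted sum of the two sources.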

For musical use, the intelligibility loss from a simple oscillator carrier is often exactly what is wanted. For speech synthesis applications, the detector is necessary.

Band Count and Practical Choices

Eight bands produce a coarse approximation. Sixteen is where articulation becomes usable for musical purposes. The Sennheiser VSM 201 ran 20 bands, and only about 50 units were ever manufactured. Kraftwerk, Jean-Michel Jarre, Herbie Hancock, and Daft Punk all worked with it. Its 20 VCA outputs also provided control voltages for each band, making it possible to route vocoder analysis data into a modular synthesizer and use the envelope information to control parameters other than amplitude.

The Roland VP-330 used only 10 bands, trading spectral fidelity for a warmer, more diffuse character. That softer quality suited the string pad sounds it was also designed to produce, but it meant less articulation clarity compared to higher-band-count designs.

The carrier signal matters as much as the band count. A sawtooth wave's harmonic content extends across all synthesis filter bands; a sine wave has energy only at the fundamental and produces very little output from upper bands regardless of what the analysis stage requests. For maximum intelligibility, choose a carrier with energy distributed across the full frequency range the filters cover. For deliberate tonal coloring, the carrier choice is half the instrument.
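The difference in band coverage can be checked with simple arithmetic. The sketch below lists which bands contain at least one harmonic of a carrier, treating a sine as a single harmonic and a sawtooth as a series of integer multiples of the fundamental. It uses idealized, non-overlapping band edges (ten 300 Hz bands) and a hypothetical function name; real filters overlap and have finite slopes.

```python
def bands_excited(fundamental_hz, n_harmonics, band_edges):
    """For each band, report whether any carrier harmonic falls inside it."""
    harmonics = [fundamental_hz * h for h in range(1, n_harmonics + 1)]
    return [any(lo <= f < hi for f in harmonics) for lo, hi in band_edges]


# Ten idealized 300 Hz bands covering 0 to 3,000 Hz.
edges = [(i * 300, (i + 1) * 300) for i in range(10)]

# A 110 Hz sine has one component; a sawtooth has one at every multiple.
sine_coverage = bands_excited(110.0, 1, edges)
saw_coverage = bands_excited(110.0, 27, edges)   # harmonics up to 2,970 Hz
```

With these numbers the sine reaches only the lowest band, while the sawtooth places at least one harmonic in every band, so every envelope the analysis stage produces has carrier energy to act on.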

From SIGSALY to the Studio

SIGSALY was decommissioned in 1946. The vocoder technology it used migrated into telecommunications research, speech coding, and eventually into music. Dudley's 1938 patents described the bandpass-filter-bank architecture that still defines channel vocoders today.

What changed over those decades was the carrier. In Dudley's design, the carrier was a telephone-quality buzzer source meant to reconstruct speech efficiently over limited bandwidth. When Wendy Carlos and Robert Moog built a vocoder for the 1971 A Clockwork Orange soundtrack, the carrier was a Moog synthesizer, used to vocalize the fourth movement of Beethoven's Ninth Symphony. The analysis-synthesis architecture was identical to Dudley's. The intention was different. So, consequently, was the sound.

The channel vocoder's core insight remains unchanged after nearly 90 years: you can separate the spectral envelope of a sound from its excitation source, discard the source, and impose those envelopes on something else entirely. SIGSALY used that separation to compress and encrypt speech for secure transmission. Kraftwerk used it to make a synthesizer sing. Both applications rest on the same machinery: the same filter bank, the same envelope followers, the same bank of VCAs.