John Chowning didn't set out to invent FM synthesis. He was studying vibrato.
In 1967, working at Stanford's Artificial Intelligence Lab, Chowning was experimenting with the extreme ends of vibrato — pushing modulation rates far beyond anything a cellist or singer could produce. When the modulation rate crossed out of the perceptually meaningful range, something unexpected happened: the vibrato effect disappeared, and a completely new, richer tone materialized in its place. The "vibrato" had become sidebands.
That accident became the most commercially successful synthesis algorithm in history. Stanford licensed it to Yamaha in 1973. The DX7 shipped in 1983. It's on Prince's "When Doves Cry," A-ha's "Take On Me," George Michael's "Careless Whisper," and roughly half the records made that decade.
The math behind Chowning's discovery is more specific — and more useful — than most introductions to FM suggest.
What FM Actually Does to a Signal
The FM synthesis equation is:
y(t) = A · sin(2π · fc · t + I · sin(2π · fm · t))
Two oscillators: a carrier (frequency fc) and a modulator (frequency fm). The modulator isn't heard directly — it modulates the carrier's instantaneous phase. The parameter I is the modulation index, defined as:
I = Δf / fm
where Δf is the peak frequency deviation the modulator causes. At I=0, the modulator has no effect and you hear a pure sine at fc. As I increases, the carrier's phase gets pushed further and further, creating a waveform whose spectrum is increasingly complex.
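The equation translates almost directly into code. The following is a minimal sketch, assuming a 48 kHz sample rate and a short buffer; the function names `fm_samples` and `peak_deviation` are illustrative and not taken from any library.

```python
import math

def fm_samples(fc, fm, index, duration=0.01, sample_rate=48000, amplitude=1.0):
    """Sample y(t) = A * sin(2*pi*fc*t + I * sin(2*pi*fm*t))."""
    count = round(duration * sample_rate)
    samples = []
    for k in range(count):
        t = k / sample_rate
        # The modulator is never output directly; it only offsets the carrier phase.
        modulator = index * math.sin(2 * math.pi * fm * t)
        samples.append(amplitude * math.sin(2 * math.pi * fc * t + modulator))
    return samples

def peak_deviation(index, fm):
    """Peak frequency deviation in Hz, rearranged from I = delta_f / fm."""
    return index * fm

pure = fm_samples(fc=440.0, fm=440.0, index=0.0)    # I = 0: plain sine at fc
bright = fm_samples(fc=440.0, fm=440.0, index=5.0)  # same carrier, many sidebands
```

With I = 0 the modulator term vanishes and the output is sample-for-sample identical to a sine at fc.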
The key question is: which frequencies appear, and how loud is each one?
Bessel Functions Are the Answer
Expand the FM equation (the standard tool is the Jacobi-Anger expansion, an identity for the sine of a sine), and the resulting spectrum consists of a carrier at fc plus sidebands at:
fc ± n·fm for n = 1, 2, 3, ...
Every integer multiple of the modulator frequency appears as a pair of sidebands above and below the carrier. This is what Chowning saw: his "vibrato" modulator, once it ran fast enough, was scattering energy into sidebands that combined into a new timbre.
The amplitude of each sideband pair is governed by Bessel functions of the first kind, denoted Jₙ(I):
- Carrier (n=0): J₀(I)
- 1st sideband pair (n=1): J₁(I)
- 2nd sideband pair (n=2): J₂(I)
- nth sideband pair: Jₙ(I)
This is not an approximation. It is the exact analytical solution.
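The identity behind that claim is sin(θ + I·sin β) = Σ Jₙ(I)·sin(θ + n·β), summed over all integers n. One way to check it numerically is to synthesize an FM signal, measure individual spectral components, and compare them with Jₙ(I). The sketch below assumes frequencies expressed in cycles per buffer, so that each component falls exactly on a DFT bin; `bessel_j` and `component_amplitude` are illustrative helpers, and the power series is only one of several ways to evaluate Jₙ.

```python
import math

def bessel_j(order, x, terms=40):
    """J_n(x) for integer n >= 0, from the standard power series."""
    total = 0.0
    for m in range(terms):
        total += ((-1) ** m / (math.factorial(m) * math.factorial(m + order))
                  * (x / 2) ** (2 * m + order))
    return total

def component_amplitude(samples, cycles):
    """Amplitude of the sinusoid that completes `cycles` periods in the buffer."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

N = 1024
FC, FM, INDEX = 100, 10, 2.0  # carrier and modulator in cycles per buffer
signal = [math.sin(2 * math.pi * FC * k / N + INDEX * math.sin(2 * math.pi * FM * k / N))
          for k in range(N)]

for order in range(4):
    measured = component_amplitude(signal, FC + order * FM)
    predicted = abs(bessel_j(order, INDEX))
    print(order, round(measured, 4), round(predicted, 4))
```

The measured and predicted columns should agree to within floating-point error, because the parameters are chosen so that reflected and aliased sidebands are negligible.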
Some consequences of Bessel function behavior that have direct implications for sound design:
At I=0: J₀(0) = 1, and Jₙ(0) = 0 for every n ≥ 1. Pure sine. No sidebands.
At I=1: J₀(1) ≈ 0.77, J₁(1) ≈ 0.44, J₂(1) ≈ 0.11. One audible sideband pair, mostly carrier.
At I=2: J₀(2) ≈ 0.22, J₁(2) ≈ 0.58, J₂(2) ≈ 0.35, J₃(2) ≈ 0.13. The carrier is now quieter than its own first sideband.
At I≈2.4: J₀(2.4) ≈ 0. The carrier frequency disappears from the spectrum entirely. This is the first zero of J₀ — the sidebands have absorbed all the energy that was in the carrier component. The sound is present, rich with harmonics, but contains no energy at fc itself.
At I=3: J₀(3) ≈ -0.26. The carrier comes back, but now with inverted phase.
A practical shortcut: the number of sideband pairs with significant amplitude is approximately I + 1. At I=5, you get roughly six audible sideband pairs: six components on each side of the carrier, twelve in total. This is why increasing modulation index makes FM sounds progressively brighter and more complex.
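The I + 1 rule can be tested against the Bessel values. "Significant" needs a definition, and the sketch below assumes one: the smallest number of sideband pairs that, together with the carrier, hold 98% of the signal power (the same criterion that underlies Carson's bandwidth rule in radio engineering). The helper names and the 98% threshold are assumptions for illustration.

```python
import math

def bessel_j(order, x, terms=40):
    """J_n(x) for integer n >= 0, from the standard power series."""
    total = 0.0
    for m in range(terms):
        total += ((-1) ** m / (math.factorial(m) * math.factorial(m + order))
                  * (x / 2) ** (2 * m + order))
    return total

def pairs_for_power(index, fraction=0.98, max_order=100):
    """Smallest N such that carrier + first N sideband pairs hold `fraction` of the power."""
    power = bessel_j(0, index) ** 2          # carrier power
    pairs = 0
    while power < fraction and pairs < max_order:
        pairs += 1
        power += 2 * bessel_j(pairs, index) ** 2  # upper and lower sideband
    return pairs

for index in (1.0, 2.0, 5.0):
    print(index, pairs_for_power(index))
```

Under this criterion, I = 1, 2 and 5 give 2, 3 and 6 pairs, matching I + 1. A different threshold would shift the counts, so treat the rule as a guide rather than a law.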
C:M Ratio: Harmonic vs. Inharmonic
Whether a given FM patch sounds tonal or metallic comes down to the ratio between carrier and modulator frequencies.
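If the C:M ratio is a ratio of small integers (1:1, 1:2, 2:1, 3:2, etc.), the sidebands land on frequencies that are all integer multiples of some common fundamental. Lower sidebands that would fall below 0 Hz reflect back into the positive range and land on the same harmonic grid. The result is a harmonic spectrum, and the ear hears a definite pitch.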
If the C:M ratio is irrational or involves large integers, sidebands land at inharmonic frequencies. The ear hears a metallic, bell-like, or noise-adjacent quality with no clear fundamental.
Some useful ratios and what they produce:
| C:M Ratio | Spectrum type | Character |
|---|---|---|
| 1:1 | Harmonic | Trumpet-like brightness, dense harmonics |
| 1:2 | Harmonic (odd partials only) | Clarinet-like quality |
| 2:1 | Harmonic | All harmonics of the modulator frequency; perceived pitch is an octave below the carrier |
| 1:1.4 | Inharmonic | Bell-like, attack-heavy, decays inharmonically |
| 1:3.5 | Inharmonic | Gong-like, dense metallic shimmer |
The DX7 presets that sound like acoustic instruments typically use integer or near-integer C:M ratios tuned so the carrier sits on the fundamental of the desired note. Patches with non-integer ratios are the metallic clangors, tuned percussion, and atmospheres.
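To see where the sidebands land for a given ratio, it helps to reduce C:M to lowest integer terms c:m and list the sideband positions as multiples of fc/c. The sketch below does that, folding negative frequencies back to positive ones. The function names are illustrative, and the ratio values come from the table above.

```python
from fractions import Fraction

def integer_ratio(carrier, modulator):
    """Reduce a C:M ratio such as 1:1.4 to lowest integer terms (c, m)."""
    ratio = Fraction(str(modulator)) / Fraction(str(carrier))
    return ratio.denominator, ratio.numerator

def partial_numbers(c, m, max_order):
    """Sideband positions |c + n*m| for n = -max_order..max_order, in units of fc/c.

    Negative frequencies are reflected to positive ones; a 0 Hz component is dropped.
    """
    positions = {abs(c + n * m) for n in range(-max_order, max_order + 1)}
    return sorted(positions - {0})

for carrier, modulator in [(1, 1), (1, 2), (2, 1), (1, 1.4), (1, 3.5)]:
    c, m = integer_ratio(carrier, modulator)
    print(f"{carrier}:{modulator} -> {c}:{m}", partial_numbers(c, m, 3))
```

For 1:1 the list is a contiguous run of harmonics, and for 1:2 only odd numbers appear. For 1:1.4 (which reduces to 5:7) the partials are scattered multiples of fc/5, too sparse and too high for the ear to fuse into one pitch.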
Operator Topology: The DX7's 32 Algorithms
A DX7 has six oscillators, which Yamaha called operators. Each operator can be either a carrier (its output goes to the audio output) or a modulator (its output modulates another operator's phase). Yamaha provided 32 fixed routing configurations — "algorithms" — that determine which operators modulate which.
The simplest topology: one operator modulates another, and the second outputs audio. This is "2-operator FM," producing the spectrum described above.
Stack three operators in series (A → B → C, where → means "modulates"): operator A modulates B, which modulates C, which outputs audio. This is FM of FM. B's output is already a full FM spectrum, and every component of it acts as a modulator on C, so C's sidebands fall at combinations of multiples of both modulator frequencies. The result is far more complex than a single pair. With six operators in a deep chain, the spectral complexity is enormous, and each operator's modulation index becomes a critical variable.
Parallel topologies work differently: multiple carriers each modulated independently, their outputs mixed. This effectively layers multiple distinct FM timbres into one voice, which is how the DX7 achieves lush, multi-layered pads.
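The difference between series and parallel routing is easy to express in code. This is a sketch of the two ideas only, not a model of any specific DX7 algorithm; the function names, the tuple layout, and the example parameter values are all assumptions.

```python
import math

TWO_PI = 2 * math.pi

def two_op(t, fc, fm, index):
    """Basic pair: one modulator driving one carrier."""
    return math.sin(TWO_PI * fc * t + index * math.sin(TWO_PI * fm * t))

def series_stack(t, f_a, f_b, f_c, index_ab, index_bc):
    """A -> B -> C: A modulates B's phase, B modulates C's phase, C is heard."""
    a = math.sin(TWO_PI * f_a * t)
    b = math.sin(TWO_PI * f_b * t + index_ab * a)
    return math.sin(TWO_PI * f_c * t + index_bc * b)

def parallel_mix(t, voices):
    """Independent modulator/carrier pairs summed at the output.

    `voices` is a list of (fc, fm, index, level) tuples.
    """
    return sum(level * two_op(t, fc, fm, index) for fc, fm, index, level in voices)

t = 0.00123
stacked = series_stack(t, 110.0, 220.0, 440.0, index_ab=1.0, index_bc=2.0)
layered = parallel_mix(t, [(440.0, 440.0, 2.0, 0.5), (440.0, 1540.0, 1.0, 0.5)])
```

Setting `index_ab` to zero collapses the stack to the plain two-operator case, which is a handy sanity check when experimenting.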
A practical consequence: the DX7's envelope generators were applied per-operator, not per-voice. Animating a modulator's amplitude over time, for example by bringing its envelope in slowly, corresponds to increasing I from 0 to some peak value. The result is a timbre that evolves from pure (I≈0, carrier only) to complex (I grows, sidebands emerge) to hollow (I≈2.4, carrier null, sidebands only), and back again as the envelope falls. That evolution is not possible with subtractive synthesis in the same way, which is why FM attacks don't sound like filtered subtractive attacks.
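A time-varying index can be sketched by making I a function of t. The example below assumes a simple linear attack, which is not how the DX7 maps operator level to index, and locates the first carrier null by bisecting J₀ so the moment the ramp passes through it can be predicted. All names and parameter values are illustrative.

```python
import math

def bessel_j0(x, terms=40):
    """J_0(x) from its power series."""
    return sum((-1) ** m / math.factorial(m) ** 2 * (x / 2) ** (2 * m) for m in range(terms))

def index_at(t, attack, peak_index):
    """Linear attack: I rises from 0 to peak_index over `attack` seconds, then holds."""
    return peak_index * min(t / attack, 1.0)

def first_carrier_null(lo=2.0, hi=3.0, steps=50):
    """Bisection for the first zero of J_0, which lies between 2 and 3."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if bessel_j0(lo) * bessel_j0(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def render(fc, fm, attack, peak_index, duration, sample_rate=48000):
    """FM signal whose modulation index follows the attack ramp."""
    out = []
    for k in range(round(duration * sample_rate)):
        t = k / sample_rate
        index = index_at(t, attack, peak_index)
        out.append(math.sin(2 * math.pi * fc * t + index * math.sin(2 * math.pi * fm * t)))
    return out

ATTACK, PEAK = 0.5, 4.0
null_index = first_carrier_null()
null_time = ATTACK * null_index / PEAK  # when the ramp crosses the carrier null
samples = render(220.0, 220.0, ATTACK, PEAK, duration=0.1)
```

With these assumed values the ramp reaches the null a little past the halfway point of the attack, after which the carrier component returns with inverted phase.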
The Feedback Operator
Every DX7 algorithm includes one feedback path, in most cases a single operator that feeds back into itself: its own output modulates its own input. Self-modulation produces a waveform whose spectrum no longer follows the simple Jₙ(I) sideband formula above. It becomes progressively more complex and eventually sawtooth-like at high feedback levels. This is how the DX7 patches that sound like brass or raw oscillator tones work: feedback takes a sine and transforms it into something with a rich, continuous harmonic series.
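Self-modulation is straightforward to sketch: each output sample is fed back into the phase of the next. The version below assumes the simplest possible form, a one-sample delay with a feedback amount `beta`; real hardware implementations differ in detail (for instance by smoothing the fed-back signal), so this illustrates the principle rather than the DX7's circuit.

```python
import math

def feedback_operator(cycles, count, beta):
    """Self-modulating sine: y[k] = sin(phase[k] + beta * y[k-1])."""
    out = []
    previous = 0.0
    for k in range(count):
        phase = 2 * math.pi * cycles * k / count
        previous = math.sin(phase + beta * previous)
        out.append(previous)
    return out

def harmonic_amplitude(samples, cycles):
    """Amplitude of the component completing `cycles` periods in the buffer."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * cycles * k / n) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

plain = feedback_operator(cycles=8, count=4096, beta=0.0)
fed_back = feedback_operator(cycles=8, count=4096, beta=0.8)

for harmonic in range(1, 6):
    print(harmonic,
          round(harmonic_amplitude(plain, 8 * harmonic), 4),
          round(harmonic_amplitude(fed_back, 8 * harmonic), 4))
```

With `beta` at zero only the first harmonic is present. Raising it adds energy at every integer multiple of the fundamental, which is the continuous harmonic series described above.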
What This Means for Programming FM
Understanding the math gives you predictive control that knob-twisting alone doesn't:
To brighten a sound without changing pitch: increase modulation index (raise operator output level or modulation depth). More sidebands, higher in frequency.
To create a bell: use an inharmonic C:M ratio (try 1:1.4 or 1:3.5), apply a fast-decaying envelope to the modulator only. The modulation index drops from high to low — the complex inharmonic spectrum collapses toward a pure carrier tone as it decays.
To null the carrier: set I near 2.4. This produces a hollow, formant-shifted quality in which the component at fc is absent, provided no reflected sideband lands back on fc (as happens at a 1:1 ratio). Useful for certain horn emulations.
To get bright transients that decay to a simpler sound: start with high modulation index and let it fall with the envelope. The attack has many sidebands; the sustain is simpler. This is exactly what most DX7 piano and bass patches do.
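As a worked example of the bell recipe, the sketch below combines an inharmonic 1:1.4 ratio with a modulation index that decays faster than the overall amplitude. The decay times, peak index, and function names are assumptions chosen for illustration, not values taken from any DX7 patch.

```python
import math

def decaying_index(t, peak_index, time_constant):
    """Exponentially decaying modulation index."""
    return peak_index * math.exp(-t / time_constant)

def bell(fc=440.0, ratio=1.4, peak_index=6.0, index_decay=0.3,
         amp_decay=1.0, duration=1.0, sample_rate=48000):
    """Inharmonic FM tone whose spectrum simplifies as it rings out."""
    fm = fc * ratio
    out = []
    for k in range(round(duration * sample_rate)):
        t = k / sample_rate
        index = decaying_index(t, peak_index, index_decay)  # modulator envelope
        amplitude = math.exp(-t / amp_decay)                # carrier envelope
        out.append(amplitude * math.sin(2 * math.pi * fc * t
                                        + index * math.sin(2 * math.pi * fm * t)))
    return out

samples = bell()
```

Because the index falls faster than the amplitude, the strike is dense and metallic while the tail settles toward a nearly pure tone at fc.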
Chowning published his original paper in 1973: "The Synthesis of Complex Audio Spectra by Means of Frequency Modulation" in the Journal of the Audio Engineering Society. It remains readable, specific, and full of worked examples. If you've ever built an FM patch and wondered why you got what you got, the answer is in there — in the form of tables of Bessel function values, one for each modulation index and sideband order.
The DX7 didn't succeed because FM synthesis was mysterious. It succeeded because once you know what the Bessel functions predict, FM synthesis is one of the most precisely controllable algorithms in audio.