"Partitioned Convolution: The Algorithm That Makes Real-Time Reverb Possible"

A two-second impulse response at 48 kHz contains 96,000 samples. To convolve a live audio stream with that IR in the time domain, each output sample requires 96,000 multiply-accumulate operations — roughly 4.6 billion operations per second for a 48 kHz stream, per plugin instance. Direct convolution is not a viable option for real-time work.
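The arithmetic behind that figure is easy to reproduce. A minimal sketch, where the function name `direct_macs_per_second` is made up for illustration and one multiply-accumulate is assumed per IR tap per output sample:

```python
def direct_macs_per_second(ir_seconds, sample_rate):
    """Multiply-accumulates per second for direct time-domain convolution."""
    ir_samples = int(round(ir_seconds * sample_rate))  # taps in the IR
    # Every output sample touches every tap, and there are
    # sample_rate output samples per second.
    return ir_samples * sample_rate


if __name__ == "__main__":
    print(direct_macs_per_second(2, 48_000))  # 2 s IR at 48 kHz
```

Doubling either the IR length or the sample rate pushes the cost up further, which is why the direct form does not scale to long reverbs.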

The algorithm that makes convolution reverb practical is partitioned convolution. It was formalized by William G. Gardner of MIT in 1995, and nearly every real-time convolution reverb plugin is built on some variant of it.

Why the FFT Alone Does Not Solve the Problem

FFT-based convolution reduces the computational complexity from O(N²) to O(N log N). For 1024-sample blocks, that is roughly a 34x reduction in operations compared to direct convolution — the direct approach needs 1024² = 1,048,576 operations, the FFT-based approach needs approximately 3 × 1024 × log₂(1024) = 30,720.
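The two operation counts can be compared directly. This sketch only restates the rough estimate used above (the factor of 3 stands for two forward transforms plus one inverse); the function names are illustrative, and real FFT libraries have different constants:

```python
import math


def direct_ops(n):
    """Rough operation count for direct convolution of two n-sample blocks."""
    return n * n


def fft_ops(n):
    """Rough operation count for FFT-based convolution: 3 * n * log2(n)."""
    return 3 * n * math.log2(n)


if __name__ == "__main__":
    n = 1024
    print(direct_ops(n), fft_ops(n), direct_ops(n) / fft_ops(n))
```

The ratio grows with block size, so the advantage of the FFT route is larger for longer blocks.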

But FFT convolution introduces a different problem: latency. Processing works on complete blocks. A single-block FFT convolution of a 2-second IR requires filling a 2-second input buffer before producing any output. The minimum latency equals the block size: two seconds. That is not reverb; that is rendering to disk.

Uniform Partitioned Convolution

Gardner's solution was to divide the IR into equal-sized blocks and process each independently. This is uniform partitioned convolution, published in the Journal of the Audio Engineering Society, Volume 43, Issue 3, pages 127-136, March 1995.

A 2-second IR at 48 kHz (96,000 samples) divided into 512-sample partitions yields 188 partitions (96,000 / 512 = 187.5, so the last partition is half full and zero-padded). Each partition acts as its own short filter. The overlap-add method handles boundary conditions between blocks: each input block is zero-padded to 2× the partition size (1024 samples), FFT-convolved with the corresponding stored IR partition, and the outputs are accumulated with the correct time offsets.

Maximum added latency is now determined by the partition size, not the IR length: 512 samples at 48 kHz = 10.7 milliseconds.
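Both numbers fall out of two one-line formulas. A small sketch with illustrative function names; the partition count is rounded up so that a final, partly filled partition is still counted:

```python
import math


def partition_count(ir_samples, partition_size):
    """Number of partitions needed to cover the IR (last one may be partial)."""
    return math.ceil(ir_samples / partition_size)


def block_latency_ms(partition_size, sample_rate):
    """Time needed to fill one partition-sized input block, in milliseconds."""
    return 1000.0 * partition_size / sample_rate


if __name__ == "__main__":
    print(partition_count(96_000, 512))    # 2 s IR at 48 kHz
    print(block_latency_ms(512, 48_000))   # latency of one 512-sample block
```

Note that the latency depends only on the partition size and the sample rate; the IR length does not appear in it at all.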

The forward FFTs of the IR partitions are computed once during IR loading and stored in memory. At playback, the plugin computes one forward FFT per audio block (for the incoming audio) and keeps the spectra of recent input blocks in a delay line. It multiplies the input spectrum from k blocks ago by the k-th pre-computed partition spectrum, accumulates the products, and runs one inverse FFT. This is why loading a long IR takes a brief moment: the plugin is partitioning and pre-computing. A 2-second IR at 48 kHz with 512-sample partitions stores 188 frequency-domain buffers, each 1024 complex floats. At 8 bytes per single-precision complex value, that is roughly 1.5 MB just for the stored IR spectra.
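The whole scheme fits in a short sketch. The version below is a teaching example under stated assumptions, not how any particular plugin does it: it uses a slow pure-Python radix-2 FFT so that it runs without dependencies, the class and function names are invented for illustration, and the block size must be a power of two. A `direct_convolve` reference is included so the result can be checked.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/n scaling applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


class UniformPartitionedConvolver:
    """Overlap-add convolution with equal-sized IR partitions."""

    def __init__(self, ir, block):
        self.block = block
        self.n_fft = 2 * block
        # Pre-compute one spectrum per IR partition (done once, at load time).
        self.ir_spectra = []
        for start in range(0, len(ir), block):
            part = list(ir[start:start + block])
            part += [0.0] * (self.n_fft - len(part))  # zero-pad to 2x block
            self.ir_spectra.append(fft(part))
        # Frequency-domain delay line: index k holds the input from k blocks ago.
        self.input_spectra = [[0j] * self.n_fft for _ in self.ir_spectra]
        self.overlap = [0.0] * block

    def process(self, x_block):
        """Consume one block of input samples, return one block of output."""
        assert len(x_block) == self.block
        # One forward FFT per block, pushed onto the front of the delay line.
        self.input_spectra.insert(0, fft(list(x_block) + [0.0] * self.block))
        self.input_spectra.pop()
        # Multiply each delayed input spectrum by its partition and accumulate.
        acc = [0j] * self.n_fft
        for x_k, h_k in zip(self.input_spectra, self.ir_spectra):
            for i in range(self.n_fft):
                acc[i] += x_k[i] * h_k[i]
        y = ifft(acc)  # one inverse FFT per block
        out = [y[i].real + self.overlap[i] for i in range(self.block)]
        # The second half spills into the next block (overlap-add).
        self.overlap = [y[i + self.block].real for i in range(self.block)]
        return out


def direct_convolve(x, h):
    """Reference time-domain convolution for checking results."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y
```

Feeding the convolver consecutive blocks and concatenating the outputs should match the first samples of `direct_convolve` to within floating-point error. A production implementation would use an optimized real-input FFT and vectorized multiply-accumulates; the structure (pre-computed spectra, delay line, one forward and one inverse transform per block) is the part that carries over.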

Non-Uniform Partitioning: The Practical Standard

Uniform partitioning is clean but not optimal. Small partitions mean low latency but many FFT calls, each with overhead. Large partitions are cheaper per sample but add more latency. The optimal strategy is to use different partition sizes at different points in the IR.

Non-uniform partitioned convolution was described in detail by Guillermo García at AES Convention 113 (October 2002, Paper #5660), building on the partitioned approach. A typical scheme:

  • Direct time-domain convolution for the first 50-100 samples: zero added latency, manageable cost for a short segment
  • Short FFT partitions (64-256 samples) for early reflections (first ~200 ms): low latency where the ear needs it
  • Medium partitions (512-1024 samples) for the reverb body
  • Large partitions (2048-8192 samples) for the diffuse tail

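This works because each partition only has to deliver its output as late as its own position in the IR. A partition that starts T samples into the IR contributes nothing until T samples after the input arrives, so its block can take up to roughly T samples to fill and process without delaying the output. Only the very start of the IR has no such slack. It is also where timing matters most to the ear: added pre-delay in the initial attack changes the perceived distance and character of a space immediately. The algorithm exploits this asymmetry.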
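One way to turn a scheme like this into a concrete partition layout is a greedy rule: let the block size double whenever the doubled block would still be no longer than its start offset in the IR. The sketch below assumes that rule, ignores the time needed to actually compute each block (real designs leave extra slack), and uses a hypothetical `build_schedule` function with illustrative default sizes:

```python
def build_schedule(ir_len, first_size=64, max_size=8192):
    """Return a list of (offset, size) partitions covering ir_len samples.

    Assumed rule: a partition may be no longer than its start offset,
    so buffering it never delays the output. The first partition is the
    exception; it sets the overall latency (or is replaced by direct
    time-domain convolution).
    """
    schedule = []
    offset = 0
    size = first_size
    while offset < ir_len:
        # Grow the block while the doubled size still fits the rule and the cap.
        while size * 2 <= max_size and size * 2 <= offset:
            size *= 2
        schedule.append((offset, size))
        offset += size
    return schedule


if __name__ == "__main__":
    for offset, size in build_schedule(96_000):
        print(offset, size)
```

For a 96,000-sample IR this yields 19 partitions (64, 64, 128, 256, and so on up to repeated 8192-sample blocks), compared with 1,500 if the whole IR were cut into uniform 64-sample partitions.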

A practical open-source implementation is Fons Adriaensen's jconvolver (Linux/JACK), which uses five partition sizes distributed across separate threads. The early-reflection stages process every audio buffer. Large tail partitions process every 8th or 16th buffer, producing results that are accumulated into an output queue. Adriaensen published the threading approach at the Linux Audio Conference in 2006.

What Explains Those Plugin Settings

Non-uniform partitioned convolution explains several plugin behaviors that might otherwise seem arbitrary.

Waves IR-1 lists a latency of 10.6 ms at 48 kHz in standard mode. That is approximately one 512-sample FFT block (512 / 48,000 ≈ 10.7 ms), the minimum partition size for early reflections in their implementation. Their "Low CPU" mode uses fewer, larger partitions, trading some early-reflection precision for roughly 45% less CPU load.

Logic Pro's Space Designer has a quality slider that adjusts the partition scheme. Coarser quality settings use fewer partitions with larger sizes. The reverb body and tail remain intact, but the early reflections become slightly less precise.

Zero-latency convolution modes, offered by some plugins, use direct time-domain convolution for the first N samples of the IR before the FFT stages take over. This eliminates added latency for the initial transient, but the direct stage costs N multiply-accumulates for every output sample (O(N²) per N-sample block), so N has to stay small. It is practical only for a short head segment, and it buys little when the host buffer already introduces a fixed delay comparable to the FFT latency.
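The direct stage is an ordinary FIR filter run sample by sample. A minimal sketch, assuming the head of the IR is passed in as a plain list; the class name `DirectHead` is illustrative and not taken from any plugin:

```python
from collections import deque


class DirectHead:
    """Direct-form FIR over the first N taps of an IR (zero added latency)."""

    def __init__(self, head_ir):
        self.h = list(head_ir)
        # Newest input sample sits at index 0; the oldest falls off the end.
        self.history = deque([0.0] * len(self.h), maxlen=len(self.h))

    def process_sample(self, x):
        """Return the output for this input sample immediately."""
        self.history.appendleft(x)
        # N multiply-accumulates per output sample: y[n] = sum h[k] * x[n-k].
        return sum(hv * xv for hv, xv in zip(self.h, self.history))


if __name__ == "__main__":
    head = DirectHead([1.0, 0.5, 0.25])
    print([head.process_sample(x) for x in [1.0, 0.0, 0.0, 0.0]])
```

Feeding in a unit impulse returns the head taps in order, starting on the very first sample, which is what "zero latency" means here. In a full design the output of this stage would be summed with the output of the FFT stages that cover the rest of the IR.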

Memory Implications of Partition Pre-computation

One implication worth understanding: the stored IR spectra grow with sample rate, not just with IR length. At 96 kHz, the same 2-second IR is 192,000 samples. At 512-sample partitions, that is 375 buffers, each 1024 complex floats — about 3 MB for the spectra alone. With multiple instances or long IRs (church halls, cathedrals regularly run 5-8 seconds), RAM consumption from stored FFT data adds up quickly.

This also explains why convolution reverbs offer a "true stereo" mode at premium CPU and RAM cost: true stereo requires four IR measurements (LL, LR, RL, RR) to capture cross-channel reflections. That is four complete sets of partitioned spectra.
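The memory figures can be estimated with a short helper. This sketch follows the counting used above (a full complex spectrum of twice the partition size, 8 bytes per single-precision complex value); the function name and the `ir_paths` parameter are assumptions for illustration, and an implementation that exploits the symmetry of real-input FFTs could store about half as much:

```python
import math


def stored_spectra_bytes(ir_seconds, sample_rate, partition_size,
                         ir_paths=1, bytes_per_complex=8):
    """Estimate RAM used by pre-computed IR partition spectra."""
    ir_samples = int(round(ir_seconds * sample_rate))
    partitions = math.ceil(ir_samples / partition_size)
    bins_per_partition = 2 * partition_size  # zero-padded FFT length
    return ir_paths * partitions * bins_per_partition * bytes_per_complex


if __name__ == "__main__":
    print(stored_spectra_bytes(2, 48_000, 512))              # mono, 48 kHz
    print(stored_spectra_bytes(2, 96_000, 512))              # mono, 96 kHz
    print(stored_spectra_bytes(2, 48_000, 512, ir_paths=4))  # true stereo
```

Setting `ir_paths=4` models true stereo, where the LL, LR, RL, and RR responses each need their own set of spectra.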

The Modulation Problem

Partitioned convolution does not modulate. A static IR produces a static response. Real rooms have temperature variation, air movement, and subtle acoustic shifts that cause the late reverb tail to vary slightly over time. Algorithmic reverbs imitate this with modulated delay lines, which is a large part of their characteristic vitality.

Some plugins address this with hybrid architectures. LiquidSonics' Lustrous Plates combines partitioned convolution for accuracy with an algorithmic synthesis layer that introduces time-varying modulation into the tail. The partitioned convolution handles early reflections and initial character; the algorithmic layer continuously regenerates the diffuse tail with variation. The result sounds closer to hardware plate reverb units, which have the same structural limitation — a static metallic sheet — but the electronic drive circuit introduces subtle, nonlinear variation.

This modulation problem is not solvable within the convolution framework itself. It requires additional synthesis running alongside the IR playback.

A Second Implementation Reference

For developers, HiFi-LoFi's FFTConvolver library (GitHub: HiFi-LoFi/FFTConvolver) is a compact C++ partitioned convolution library; its TwoStageFFTConvolver class splits the IR into a short-block head and a long-block tail. It is practical for embedding in plugin projects and demonstrates the same structural principles: latency is set by the smallest partition, CPU cost depends on how much of the IR can be handed to larger partitions, and the design space is about choosing the right crossover points.

Both jconvolver and FFTConvolver show the same tradeoff expressed in code: you are always choosing between latency and computational efficiency, and non-uniform partitioning is how you buy low latency in the perceptually critical early portion of the IR without paying that cost across the entire reverb.

The core algorithm is now 30 years old. What has changed is how implementations layer additional processing on top of it to address what straightforward convolution cannot do.