Every time you load a 44.1 kHz sample into a 48 kHz session, convert a project for broadcast, or enable oversampling on a saturation plugin, your DAW runs a polyphase filter. Most engineers never think about this. The algorithm it uses determines whether the output has 97 dB or 144 dB of dynamic range, and whether your saturator introduces audible aliasing or clean harmonic content.
Why 44,100 Hz Exists
Before getting into the algorithm, there is a historical oddity that explains why sample rate conversion is so common in the first place. The 44.1 kHz standard was not chosen for acoustic reasons. It was chosen because it fits on a standard NTSC videotape cassette.
Sony's PCM-1 recorder from 1977 stored digital audio on Betamax tape by encoding samples as black-and-white dots in video frames. NTSC runs at 30 frames per second with 490 active scan lines per frame. Pack 3 samples per scan line and you get exactly 44,100 samples per second. The PAL version gives the same number through different arithmetic: 3 samples times 588 active lines times 25 frames equals 44,100.
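The arithmetic for both standards is easy to verify:

```python
# Samples per scan line x active lines x frames per second, for each standard.
ntsc = 3 * 490 * 30
pal = 3 * 588 * 25
print(ntsc, pal)  # 44100 44100
```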
The 48 kHz standard came later, designed for professional digital audio without the videotape constraint. That gap between the two rates is why almost every project crosses a sample rate boundary at least once, and why every DAW has an SRC engine running somewhere in the signal chain.
The Ideal Converter and Why It Cannot Exist
The theoretically perfect sample rate converter is a windowed sinc filter. The sinc function (sin(πx)/πx) is the ideal interpolation kernel: applied to a sampled signal, it reconstructs the original continuous waveform exactly up to the Nyquist frequency. Convolve the upsampled signal with an ideal sinc filter and you get perfect reconstruction with no aliasing and no distortion.
The problem is that the sinc function extends infinitely in both directions in time. To make it usable, you multiply it by a window function that tapers it to zero at some point. The Kaiser window is the standard choice because its shape parameter gives precise control over two competing requirements:
Transition bandwidth describes how steeply the filter cuts off above the anti-aliasing frequency. A steep cutoff requires more filter taps.
Stopband attenuation describes how much out-of-band energy the filter suppresses. Higher attenuation also requires more taps.
A low Kaiser parameter gives moderate stopband rejection with a short filter. A high parameter can achieve attenuation above 130 dB but requires a filter several thousand taps long. This is the fundamental cost of quality: longer filters mean more computation and more latency.
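The trade-off is easy to see with scipy's Kaiser design helpers. The sketch below picks a transition width of 5% of Nyquist and a 90%-of-Nyquist cutoff purely for illustration; these are not any particular library's settings.

```python
from scipy.signal import firwin, kaiserord

# How stopband attenuation drives filter length for a Kaiser-windowed sinc
# lowpass, with the transition width held fixed at 5% of Nyquist (an
# illustrative choice, not a production setting).
for atten_db in (96.0, 130.0):
    ntaps, beta = kaiserord(ripple=atten_db, width=0.05)
    taps = firwin(ntaps, cutoff=0.9, window=("kaiser", beta))  # cutoff: 90% of Nyquist
    print(f"{atten_db:.0f} dB stopband -> {ntaps} taps (beta = {beta:.2f})")
```

Pushing the attenuation target from 96 dB to 130 dB roughly lengthens the filter in proportion, which is the cost the preset quality levels below are trading against.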
libsamplerate, the open-source SRC library used by many DAWs and audio applications, offers three preset quality levels, each reflecting a different Kaiser parameter and filter length:
| Quality Level | SNR | Bandwidth |
|---|---|---|
| Best quality | 144 dB | 96% of Nyquist |
| Medium quality | 121 dB | 90% of Nyquist |
| Fastest | 97 dB | 80% of Nyquist |
The fast mode achieves 97 dB, which may sound like plenty. Consider what that means in context: 16-bit audio has a theoretical dynamic range of about 96.3 dB (6.02 dB per bit). The cheapest quality setting in a major open-source SRC library is roughly equivalent to injecting 16-bit quantization noise into your signal during conversion. For most listening purposes this is inaudible, but for mastering work where you are processing 32-bit float signals, it is a real degradation.
Polyphase Decomposition: How It Becomes Real-Time
Here is the computational problem. To convert 44,100 Hz to 48,000 Hz, the simplest approach is to upsample by 160 (reaching 7,056,000 Hz), apply the anti-aliasing filter, then downsample by 147. Upsampling by 160 means inserting 159 zeros between every input sample. Running a 2,048-tap filter over this zero-padded signal wastes enormous amounts of computation, because most of the multiplications involve multiplying filter coefficients by zero.
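The 160/147 ratio comes straight from reducing the two rates to lowest terms:

```python
from math import gcd

# Reduce 48000/44100 to lowest terms to get the rational resampling ratio.
fs_in, fs_out = 44_100, 48_000
g = gcd(fs_in, fs_out)                  # 300
up, down = fs_out // g, fs_in // g      # upsample by 160, downsample by 147
print(up, down, fs_in * up)             # 160 147 7056000
```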
Polyphase decomposition avoids this waste entirely. The insight is that you can decompose the full filter kernel into M subfilters, called polyphase branches, where M is the upsampling factor. Each subfilter handles one phase of the output stream. When you need output sample n, you apply the appropriate subfilter (for pure upsampling, subfilter n mod M) directly to the input samples, with no zero-padding anywhere.
For the 44.1 to 48 kHz conversion with M=160, you precompute 160 subfilters from the Kaiser-windowed sinc kernel. Each subfilter contains 1/160th of the total taps. Computing any one output sample costs the same as running a much shorter filter on the original input signal, not the inflated zero-padded version. The mathematics is equivalent to the naive approach; the computation is a fraction of the cost.
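A toy case makes the equivalence concrete. The sketch below uses M = 4 upsampling with no downsampling step (kernel length and signal are arbitrary illustrative choices) and checks that the branch computation reproduces naive zero-stuffing exactly:

```python
import numpy as np
from scipy.signal import firwin

# Polyphase upsampling by M = 4: branch outputs interleave into the same
# result as zero-stuffing followed by the full-length filter.
M = 4
h = firwin(M * 8, 1.0 / M)                 # 32-tap kernel: 8 taps per branch
x = np.random.default_rng(0).standard_normal(64)

# Naive: insert M-1 zeros between samples, then run the full filter.
stuffed = np.zeros(len(x) * M)
stuffed[::M] = x
naive = np.convolve(stuffed, h)

# Polyphase: branch p holds coefficients h[p], h[p+M], h[p+2M], ...
branches = [h[p::M] for p in range(M)]
poly = np.zeros(len(naive))
for p in range(M):
    y = np.convolve(x, branches[p])        # short filter on the original input
    poly[p:p + len(y) * M:M] = y           # interleave the branch outputs

print(np.max(np.abs(naive - poly)))        # effectively zero
```

Each branch runs over the original 64 samples rather than the 256-sample zero-stuffed signal, which is where the factor-of-M savings comes from.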
r8brain, a high-quality SRC library used in several professional applications, extends this further. Rather than a single polyphase stage, it converts through a series of intermediate rates that can be expressed as whole-number ratios, applying a polyphase filter at each stage. The multi-step approach allows each individual filter to be designed with simpler specifications while the full conversion chain achieves the target quality. The implementation is open-source and worth examining if you want to see a production-quality polyphase SRC in C++.
Plugin Oversampling and Nonlinear Processing
For linear processing, sample rate conversion is a transparent technical step. Nonlinear processing is different.
When a saturation plugin clips or distorts a signal, it creates harmonics. At 44.1 kHz, any harmonic above 22,050 Hz wraps back into the audible spectrum through aliasing. A 5 kHz sine under symmetric hard clipping generates odd harmonics at 15 kHz, 25 kHz, 35 kHz, and beyond (asymmetric distortion adds the even ones at 10 kHz, 20 kHz, and so on). The 25 kHz harmonic aliases back to 44,100 minus 25,000 = 19,100 Hz. That alias is not harmonically related to 5 kHz, and it appears as an inharmonic artifact in the output.
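The folding arithmetic is simple enough to sketch directly: frequencies wrap around the sample rate and reflect around Nyquist until they land below fs/2.

```python
# Where a harmonic lands after aliasing at fs = 44,100 Hz.
fs = 44_100

def alias(f):
    """Fold frequency f into [0, fs/2]: wrap around fs, reflect around Nyquist."""
    f = f % fs
    return fs - f if f > fs / 2 else f

for h in (15_000, 25_000, 35_000):
    print(f"{h} Hz -> {alias(h):.0f} Hz")   # 25,000 Hz folds to 19,100 Hz
```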
Running the same plugin at 4x oversampling (176,400 Hz) shifts the Nyquist boundary to 88,200 Hz. Harmonics would need to reach 88.2 kHz before aliasing occurs at all. The oversampled signal is then downsampled back to 44.1 kHz through a polyphase filter, which removes everything above 22,050 Hz before decimation, so harmonics between 22 kHz and 88 kHz are filtered out rather than folded back into the audible band. The result is saturation with only harmonically correct overtones.
For soft saturation and gentle drive, 2x oversampling is usually adequate. For hard clipping or aggressive distortion processing high-frequency content, the higher-order harmonics generated can still produce audible aliases at 2x, particularly when the input signal has significant energy above 8 kHz. 4x is a practical minimum for alias-free hard distortion in typical use cases.
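The effect is measurable with a few lines of scipy. This is a rough sketch: the tone frequency, FFT size, and clip gain are illustrative choices, and `resample_poly`'s default Kaiser filter gives only moderate rejection, so it demonstrates the mechanism rather than mastering-grade performance.

```python
import numpy as np
from scipy.signal import resample_poly

# Compare hard clipping at the base rate against 4x-oversampled clipping,
# measured at the FFT bin where the 5th harmonic's alias lands.
fs, n = 44_100, 16_384
k = 1858                                        # tone sits exactly on bin k (~5 kHz)
t = np.arange(n) / fs
x = 0.9 * np.sin(2 * np.pi * (k * fs / n) * t)

hard_clip = lambda s: np.clip(4.0 * s, -1.0, 1.0)

base = hard_clip(x)                             # clip at 44.1 kHz: aliases freely
over = resample_poly(hard_clip(resample_poly(x, 4, 1)), 1, 4)

def bin_level(sig, b):
    return np.abs(np.fft.rfft(sig * np.hanning(n)))[b]

alias_bin = n - 5 * k                           # 5th harmonic folds to this bin
print(bin_level(base, alias_bin) / bin_level(over, alias_bin))
```

The printed ratio shows how much alias energy the oversampled path suppresses at that bin; a production plugin would use a much longer downsampling filter than scipy's default.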
The computational cost scales linearly with the oversampling ratio. 4x costs four times as much as 1x, which can be prohibitive for complex nonlinear models in real-time contexts.
One alternative is Antiderivative Anti-Aliasing (ADAA), introduced by Julian Parker and colleagues and popularized in open-source form by researcher Jatin Chowdhury. Rather than brute-forcing alias rejection through oversampling, ADAA reduces aliasing at the mathematical level by integrating the nonlinear transfer function analytically: each output sample is the exact average of the nonlinearity over the segment between consecutive input samples. For applicable nonlinear functions, it can achieve quality comparable to 4x oversampling at a computational cost closer to 1x. The technique has been implemented in open-source form and is beginning to appear in commercial plugins.
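A first-order ADAA hard clipper fits in a few lines. This is a minimal sketch: the function names and the fallback tolerance are illustrative, and real implementations handle higher orders and more elaborate nonlinearities.

```python
import numpy as np

# First-order ADAA: y[n] = (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]),
# where F is the antiderivative of the nonlinearity f.
def f(x):                                   # the nonlinearity: hard clip to [-1, 1]
    return np.clip(x, -1.0, 1.0)

def F(x):                                   # antiderivative of f, continuous at +/-1
    return np.where(np.abs(x) <= 1.0, 0.5 * x * x, np.abs(x) - 0.5)

def adaa_clip(x, eps=1e-9):
    x1 = np.concatenate(([x[0]], x[:-1]))   # x[n-1]
    dx = x - x1
    small = np.abs(dx) < eps                # avoid 0/0 when samples barely change
    y = (F(x) - F(x1)) / np.where(small, 1.0, dx)
    return np.where(small, f(0.5 * (x + x1)), y)
```

Inside the linear region this reduces to the two-sample average (x[n] + x[n-1]) / 2, i.e. a half-sample delay with a mild lowpass, which is the known trade-off of first-order ADAA.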
What This Means in Practice
If your DAW exposes an export SRC quality setting, use the highest option for any final bounce. The compute cost at render time is negligible, and the quality difference between the fast and best presets is measurable with any decent analyzer.
For plugin oversampling, treat the setting as meaningful rather than a marketing feature. 2x is not the same as 4x for distortion, especially on modern high-frequency synthesis content. If latency permits, 4x is the practical standard for clean nonlinear processing.
If you are writing audio processing code, the polyphase approach is already implemented in most FIR interpolator classes in JUCE and similar frameworks. The less visible choice is the filter quality: the kernel length and Kaiser parameter passed to the FIR designer. Default values are usually set for medium quality at real-time cost. For offline rendering modes, using the highest-quality preset costs only time.
The algorithm running quietly behind every project export and every plugin oversample has decades of mathematical refinement behind it. Knowing where it can fail, and why some converters sound better than others, is useful knowledge any time a sample rate boundary appears in your signal chain.