STM32 Polyphonic Synth: Bare-Metal DSP
8-Voice Polyphonic Wavetable Synthesizer on Cortex-M4
Overview
I built a fully polyphonic synthesizer on bare hardware (no Linux, no RTOS, no audio framework) to understand the full stack from clock configuration to DSP algorithm. The STM32F407G Discovery was the target: an on-board CS43L22 codec, a Cortex-M4 FPU, and enough SRAM to be genuinely interesting without being trivially easy.
The result is a self-contained USB MIDI instrument built solo over 4 weeks: 8-voice polyphony, real-time wavetable morphing across 8 waveforms (sine, saw, square, Rhodes, clav, choir, acid, glass), a zero-delay feedback filter, hardware knobs, an OLED display, and per-voice LED metering.

Constraints
The system runs under hard real-time constraints with no safety net: no OS scheduler, no dynamic memory allocation on the audio path, and no tolerance for dropouts.
| Constraint | Requirement |
|---|---|
| Sample rate | 48 kHz, interrupt-driven; no dropout tolerance |
| Audio bit depth | 16-bit signed int via I2S to CS43L22 DAC |
| CPU | STM32F407VG @ 168 MHz, Cortex-M4 with FPV4-SP-D16 FPU |
| Memory | No dynamic allocation on the audio path; VoiceManager placed in CCMRAM |
| Polyphony | 8 simultaneous voices, no audible artifacts on voice steal |
| Buffer / Latency | 128-sample stereo circular buffer; 64-sample halves filled inside one 1.333 ms interrupt window |
| Scheduling | All timing is interrupt-driven; no OS |
Architecture
The system strictly separates hardware control from the synthesis engine. The audio path runs exclusively inside the DMA half-transfer interrupt and never touches I2C, UART, or any shared peripheral; this is the core guarantee that makes real-time stability possible.

To keep the audio callback deterministic, it takes no mutexes. Parameter updates rely on the Cortex-M4 guarantee that naturally aligned 32-bit stores are single-copy atomic, so the audio ISR always reads either the old or the new float value, never a torn one. This keeps the control loop and the audio ISR consistent without introducing priority inversion or jitter. MIDI input is decoupled through a lock-free circular buffer, so the synthesis engine operates in isolation from the main loop.
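
A lock-free circular buffer of this kind can be sketched as a single-producer/single-consumer ring. This is an illustration under assumptions, not the project's code: `SpscRing`, `MidiEvent`, and the power-of-two capacity are hypothetical choices made for the example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring; names are illustrative.
// One context calls push(), exactly one other context calls pop().
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side. Returns false when full: the event is dropped, never blocked on.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;
        buf_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns false when empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

// Assumed event layout for the example: one 3-byte channel message.
struct MidiEvent {
    std::uint8_t status;
    std::uint8_t data1;
    std::uint8_t data2;
};
```

Because each index has exactly one writer, neither side ever waits on the other; a full queue simply rejects the new event.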
User hardware interaction is managed via a dedicated TIM4 interrupt, which drives ADC DMA scanning across eight potentiometers via an analog multiplexer. A GPIO interrupt handles button debouncing for waveform selection. To prevent bus contention, the OLED and LED drivers update over I2C at a lower priority, fully decoupled from the time-critical audio path.
CCMRAM placement of VoiceManager eliminates bus contention with the DMA controller during the critical audio interrupt window.
Engineering Deep-Dive
Problem: Filter Selection Within the CPU Budget
The original target was a Moog Ladder filter. Profiling with DWT->CYCCNT showed that it costs 275 cycles/sample, and that is already the aggressively reduced version: per-stage tanh saturation was replaced with a single tanh on the combined input+feedback signal. A faithful Huovilainen implementation (5 tanh calls per oversample iteration, ×4 for oversampling) would be roughly four times as expensive. Even the stripped-down version costs 838 µs per 8-voice block, before any oscillator or ADSR runs.
Solution:
I replaced it with a Zero-Delay Feedback State Variable Filter (ZDF SVF). Solving the algebraic loop analytically per sample eliminates the unit-delay approximation of naive implementations and gives analog-accurate phase response at 82 cycles/sample, a 3.35× reduction. At 8 voices, filter cost drops to 251 µs (18.8% of budget), making full polyphony viable.
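
For reference, here is a minimal sketch of one common way to write a ZDF SVF, using trapezoidal integrators with the feedback loop solved in closed form. It is not necessarily the code that shipped: `ZdfSvf`, `lowpassGain`, and the parameter names are assumptions, and `k` is treated as a damping term where smaller values mean more resonance.

```cpp
#include <cmath>

// Sketch of a trapezoidal (zero-delay feedback) state variable filter.
// Struct and member names are illustrative, not taken from the project.
struct ZdfSvf {
    float k = 1.0f;                        // damping: smaller k = more resonance
    float a1 = 0.0f, a2 = 0.0f, a3 = 0.0f; // precomputed loop coefficients
    float ic1eq = 0.0f, ic2eq = 0.0f;      // integrator states

    void set(float cutoffHz, float sampleRate, float damping) {
        const float g = std::tan(3.14159265f * cutoffHz / sampleRate);  // prewarped gain
        k  = damping;
        a1 = 1.0f / (1.0f + g * (g + k));  // closed-form solution of the feedback loop
        a2 = g * a1;
        a3 = g * a2;
    }

    // Processes one input sample and returns the lowpass output.
    float lowpass(float v0) {
        const float v3 = v0 - ic2eq;
        const float v1 = a1 * ic1eq + a2 * v3;          // bandpass
        const float v2 = ic2eq + a2 * ic1eq + a3 * v3;  // lowpass
        ic1eq = 2.0f * v1 - ic1eq;
        ic2eq = 2.0f * v2 - ic2eq;
        return v2;
    }
};

// Steady-state lowpass gain for a sine at freqHz, measured as an RMS ratio.
// `measure` should cover a whole number of periods of the test tone.
float lowpassGain(ZdfSvf& f, float freqHz, float sampleRate, int settle, int measure) {
    double inSq = 0.0, outSq = 0.0;
    for (int n = 0; n < settle + measure; ++n) {
        const float x = static_cast<float>(
            std::sin(2.0 * 3.141592653589793 * freqHz * n / sampleRate));
        const float y = f.lowpass(x);
        if (n >= settle) {
            inSq  += static_cast<double>(x) * x;
            outSq += static_cast<double>(y) * y;
        }
    }
    return static_cast<float>(std::sqrt(outSq / inSq));
}
```

With `k` set to √2 (a Butterworth response), `lowpassGain` evaluated at the cutoff frequency should come out close to 0.7071, which is the same −3 dB property the host-side test suite checks.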
Problem: Voice Stealing Without Audible Clicks
When all 8 voices are active, a new note must steal an existing one. A hard oscillator reset produces an immediate discontinuity in the audio signal, an audible click.
Solution:
A two-stage soft-kill: adsr.kill() calculates a ramp that fades the stolen voice to silence over 240 samples (~5 ms at 48 kHz). The oscillator stores the incoming note in a pending struct and waits for the ramp to reach zero before executing noteOn. Voice allocation follows a deterministic LRU priority: idle → released → oldest active. Only the oldest-active path triggers the kill ramp.
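
The allocation order and the deferred note-on can be sketched as follows. This is a simplified model under assumptions: `Voice`, `Stage`, `pickVoice`, and `noteOn` are hypothetical names, the envelope is reduced to a single level, and a voice that is already mid-ramp is treated as unavailable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr int kKillSamples = 240;  // ~5 ms at 48 kHz, as stated in the text

enum class Stage { Idle, Active, Released, Killing };

// Simplified voice: a real one would own an oscillator, filter, and full ADSR.
struct Voice {
    Stage stage = Stage::Idle;
    float level = 0.0f;            // current envelope level
    float killStep = 0.0f;         // per-sample decrement during the kill ramp
    int killRemaining = 0;         // samples left in the kill ramp
    int note = -1;
    int pendingNote = -1;          // note waiting for the ramp to finish
    std::uint32_t startedAt = 0;   // timestamp used to find the oldest voice

    void start(int n, std::uint32_t now) {
        note = n;
        pendingNote = -1;
        startedAt = now;
        level = 0.0f;              // the attack would ramp up from here
        stage = Stage::Active;
    }

    // Stage 1: fade from the current level to zero over kKillSamples.
    void kill(int nextNote) {
        killStep = level / kKillSamples;
        killRemaining = kKillSamples;
        pendingNote = nextNote;
        stage = Stage::Killing;
    }

    // Called once per sample. Stage 2: start the pending note at silence.
    void tick(std::uint32_t now) {
        if (stage != Stage::Killing) return;
        level -= killStep;
        if (--killRemaining <= 0) start(pendingNote, now);
    }
};

// Priority: idle, then released, then oldest active. Returns -1 if none qualifies.
template <std::size_t N>
int pickVoice(const std::array<Voice, N>& voices) {
    int released = -1, oldest = -1;
    for (std::size_t i = 0; i < N; ++i) {
        const Voice& v = voices[i];
        if (v.stage == Stage::Idle) return static_cast<int>(i);
        if (v.stage == Stage::Released && released < 0) released = static_cast<int>(i);
        if (v.stage == Stage::Active &&
            (oldest < 0 || v.startedAt < voices[oldest].startedAt)) {
            oldest = static_cast<int>(i);
        }
    }
    return released >= 0 ? released : oldest;
}

template <std::size_t N>
bool noteOn(std::array<Voice, N>& voices, int note, std::uint32_t now) {
    const int i = pickVoice(voices);
    if (i < 0) return false;
    Voice& v = voices[i];
    if (v.stage == Stage::Active) v.kill(note);  // only this path uses the ramp
    else v.start(note, now);
    return true;
}
```

Counting the ramp in samples rather than comparing the level against zero keeps the duration exact even when floating-point rounding leaves a small residue.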
Problem: Custom USB MIDI Class Compliance
The STM32 HAL USB library provides generic device templates but no MIDI class, so the board would appear as an unknown device to any DAW.
Solution:
I modified the STM32 USB Device middleware and wrote the MIDI descriptors from scratch. The board now enumerates as a standard class-compliant MIDI device and is recognized plug-and-play by any DAW without drivers.
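
As an illustration of what the host looks for, the fragment below shows only the standard interface descriptor that marks an interface as MIDIStreaming. A complete configuration also needs the AudioControl interface, the class-specific headers, jack descriptors, and endpoint descriptors, none of which are shown. The array name, interface number, and endpoint count are assumptions for the example; the class and subclass codes are the ones defined by the USB MIDI class.

```cpp
#include <cstdint>

// Standard interface descriptor for a MIDIStreaming interface (9 bytes).
// Name, interface number, and endpoint count are assumptions for this sketch.
constexpr std::uint8_t kMidiStreamingInterface[9] = {
    0x09,  // bLength
    0x04,  // bDescriptorType: INTERFACE
    0x01,  // bInterfaceNumber (assumes AudioControl is interface 0)
    0x00,  // bAlternateSetting
    0x02,  // bNumEndpoints (assumes one bulk OUT and one bulk IN)
    0x01,  // bInterfaceClass: AUDIO
    0x03,  // bInterfaceSubClass: MIDISTREAMING
    0x00,  // bInterfaceProtocol: unused
    0x00,  // iInterface: no string descriptor
};

static_assert(kMidiStreamingInterface[0] == sizeof(kMidiStreamingInterface),
              "bLength must match the descriptor size");
```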
Validation & Performance
CPU Profiling
8-voice polyphony runs at 26% CPU utilization, leaving 74% headroom in the 1.333 ms audio block window. Measured with DWT->CYCCNT reads before and after VoiceManager::process().
| Condition | Time | % of 1.333 ms window |
|---|---|---|
| Baseline overhead (0 voices) | 20 µs | 1.5% |
| 8 voices, full polyphony | 347 µs | 26.0% |
| Per-voice cost | ~41 µs | ~3.1% |
| Headroom remaining | 986 µs | 74.0% |
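
The conversion from raw cycle counts to the figures in this table can be sketched with a few helpers. The function names are hypothetical; on the target, the two readings would come from DWT->CYCCNT before and after VoiceManager::process().

```cpp
#include <cstdint>

constexpr double kCpuHz    = 168000000.0;             // core clock
constexpr double kWindowUs = 64.0 / 48000.0 * 1.0e6;  // 64-sample half-buffer, ~1333 us

// Unsigned subtraction yields the correct delta even if the 32-bit
// cycle counter wrapped once between the two readings.
constexpr std::uint32_t elapsedCycles(std::uint32_t start, std::uint32_t end) {
    return static_cast<std::uint32_t>(end - start);
}

constexpr double cyclesToUs(std::uint32_t cycles) {
    return static_cast<double>(cycles) / kCpuHz * 1.0e6;
}

constexpr double percentOfWindow(double us) {
    return 100.0 * us / kWindowUs;
}
```

At 168 MHz the 32-bit counter wraps roughly every 25.6 seconds, so a single wrap between two readings that are about a millisecond apart is the only case that needs handling.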
Filter Comparison
The Moog Ladder was profiled against the ZDF SVF under identical conditions. At 8 voices, the Moog consumes 62.9% of the entire audio block budget on filtering alone before a single oscillator or envelope runs. The SVF's 3.35× efficiency advantage is what makes 8-voice polyphony viable on this hardware.
| Filter | Cycles/sample | Cost at 8 voices | % of budget |
|---|---|---|---|
| ZDF SVF (shipped) | 82 | ~251 µs | 18.8% |
| Moog Ladder (rejected) | 275 | ~838 µs | 62.9% |
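
The costs in this table can be reproduced from the cycle counts, assuming the 64-sample block, 8 voices, and 168 MHz clock listed under Constraints, with $c$ the per-sample cycle count:

$$
t_{\text{filter}} = \frac{c \cdot 64 \cdot 8}{168 \times 10^{6}\ \text{Hz}}, \qquad
t_{\text{Moog}} = \frac{275 \cdot 512}{168 \times 10^{6}\ \text{Hz}} \approx 838\ \mu\text{s}, \qquad
t_{\text{SVF}} = \frac{82 \cdot 512}{168 \times 10^{6}\ \text{Hz}} \approx 250\ \mu\text{s}
$$

Against the 1333 µs window (64 samples at 48 kHz), that is about 62.9% and 18.7%. The small gap to the table's ~251 µs and 18.8% is presumably rounding in the 82-cycle figure.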
Algorithm Correctness
A host-side CTest suite in tests/ runs on any desktop compiler, no hardware required. It separates algorithm correctness from hardware bring-up and runs on every push via GitHub Actions CI.
| Test | Result |
|---|---|
| SVF −3 dB at cutoff | Measured ratio 0.7071 vs. theoretical 0.70711, < 0.001% error |
| SVF self-oscillation stability | Bounded over 10,000 samples at max resonance (k = 0.01) |
| Moog Ladder DC stability | Converges without drift at resonance = 0 |
| ADSR attack accuracy | Reaches 1.0 within ±10% of configured attack time |
| ADSR kill ramp duration | Reaches silence within 300 samples (contract: 240 = 5 ms @ 48 kHz) |
| ADSR release convergence | Forced to exact 0.0 → IDLE, no asymptotic fade |
Memory Utilization
| Region | Total | Used | Utilization |
|---|---|---|---|
| Flash | 1 MB | 212 KB | 20.7% |
| SRAM | 128 KB | 69 KB | 54.0% |
| CCMRAM | 64 KB | 34 KB | 52.6% |
Flash usage is dominated by wavetable data: 8 waveforms × 4096 samples × 4 bytes = 128 KB. CCMRAM is almost entirely the VoiceManager, a deliberate placement rather than a forced one.
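
The 128 KB figure follows directly from the table layout. The sketch below shows one plausible layout and a morphing read over it. It assumes a 32-bit phase accumulator and a linear crossfade between adjacent tables, neither of which is confirmed by the project, and the names `WavetableBank` and `readMorphed` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumWaves = 8;
constexpr std::size_t kTableLen = 4096;  // 2^12 samples per waveform

// Hypothetical layout: one single-cycle float table per waveform.
struct WavetableBank {
    float data[kNumWaves][kTableLen];
};
static_assert(sizeof(WavetableBank) == 128 * 1024, "8 x 4096 x 4 bytes = 128 KB");

// Reads one sample from the bank.
// phase: 32-bit accumulator whose top 12 bits select the table index.
// morph: position across the bank, 0.0 (first table) to 7.0 (last table).
// No interpolation between neighbouring samples, to keep the sketch short.
inline float readMorphed(const WavetableBank& bank, std::uint32_t phase, float morph) {
    if (morph < 0.0f) morph = 0.0f;
    const float maxMorph = static_cast<float>(kNumWaves - 1);
    if (morph > maxMorph) morph = maxMorph;

    const std::size_t idx = phase >> 20;             // 32 - 12 = 20 fractional bits
    std::size_t lo = static_cast<std::size_t>(morph);
    if (lo > kNumWaves - 2) lo = kNumWaves - 2;      // keep lo + 1 in range
    const float frac = morph - static_cast<float>(lo);

    const float a = bank.data[lo][idx];
    const float b = bank.data[lo + 1][idx];
    return a + frac * (b - a);                       // crossfade between tables
}
```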
Reflection
Real-time audio programming has a single governing rule: the audio callback is sacred. No allocations, no blocking calls, no peripheral access. Every architectural decision in this project (CCMRAM placement, atomic float writes, the kill ramp) exists to protect that invariant.
The filter swap was the most instructive moment. Had I committed to the Moog Ladder for its "analog character" before measuring its CPU cost, 8-voice polyphony would have been impossible. Working within the limited speed of the MCU made the real cost of the Huovilainen model concrete, and showed how much of a privilege it is to run it in real time.
If I rebuilt this today, I'd replace the implicit Cortex-M4 atomicity of float writes with an explicit lock-free SPSC ring buffer for parameter updates. That approach is more portable, easier to audit, and does not depend on knowing the architecture's memory guarantees.
Future Work
Lock-Free Parameter Queue
Replace implicit Cortex-M4 atomic float writes with an explicit SPSC ring buffer, which is more portable and easier to reason about under code review.
Wavetable Anti-Aliasing
Implement BLIT or MinBLEP to suppress harmonic aliasing at high pitches, a known limitation of the current phase accumulator approach.
LFO Modulation Matrix
Add LFO-to-filter and LFO-to-amplitude routing with a configurable mod matrix, enabling filter sweeps and tremolo without hardware changes.
CMSIS-DSP SIMD Optimization
Port the ZDF SVF inner loop to CMSIS-DSP SIMD intrinsics for further CPU reduction, freeing headroom for additional voices or effects.
Custom PCB Shield
Consolidate the breadboarded multiplexer and potentiometer arrays into a unified PCB shield for the Discovery board.
SysEx Patch Storage
Add SysEx support for saving and recalling patches over MIDI, turning the synth into a fully programmable instrument.