I have been building a set of audio engineering skills for coding agents, and the deeper I get into it, the more obvious the goal becomes: an agent should not merely know how to write C++, generate a patch, or call a compiler. It should understand the shape of audio work.
Audio software has a different nervous system than most code. A web app can wait a few milliseconds. A database job can retry. A command-line tool can allocate a string whenever it feels like it. Audio cannot be that casual. The process callback is a deadline. Every sample has to arrive on time. Every filter coefficient has to respect sample rate. Every patch format has its own small laws. Every plugin framework has a lifecycle, and violating it usually produces clicks, silence, crashes, or the kind of bug that only appears after five minutes of feedback delay at a 64-sample buffer size.
That is why I started the Shortwav Labs agent skills collection. It is a set of focused, local knowledge packs for agents working on audio projects. The first group covers DSP, Faust, plugdata and Pure Data patches, JUCE plugins, VCV Rack modules, and Vult.
The point is not to make an agent sound more confident. The point is to make it useful in the places where audio engineering is unforgiving.
Why Skills Instead of Just Prompts
A prompt can say, "write a filter." A skill can say, "when writing a filter, remember coefficient derivation, sample-rate independence, denormal prevention, smoothing, state reset, and how this target framework expects audio to move."
That distinction matters. Most agent mistakes in audio code are not exotic. They are mundane and destructive:
- Allocating in processBlock()
- Recalculating expensive coefficients every sample
- Forgetting to clear delay buffers on reset
- Treating control-rate and audio-rate signals as interchangeable
- Hardcoding 48 kHz into an algorithm
- Writing a Pure Data patch with broken object indices
- Generating a Rack module without respecting +/-5V audio conventions
- Building a JUCE parameter layout that the UI cannot safely attach to
The skills are designed to make those mistakes harder. They turn domain assumptions into reusable working memory.
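Several of those mistakes can be avoided by structure alone. The following is a minimal sketch of a delay line, not code from the skills collection: the class name `DelayLine` and its method names are hypothetical. It allocates only in `prepare()`, clears its buffer in `reset()`, and converts seconds to samples using whatever sample rate the host reports instead of assuming 48 kHz.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; prepare() must be called before use.
class DelayLine {
public:
    // Called off the audio thread. This is the only place that allocates.
    void prepare(double sampleRate, double maxDelaySeconds) {
        sampleRate_ = sampleRate;
        const auto size = static_cast<std::size_t>(sampleRate * maxDelaySeconds) + 1;
        buffer_.assign(size, 0.0f);
        writeIndex_ = 0;
        delaySamples_ = 0;
    }

    // Clears stale audio so a restart does not replay an old tail.
    void reset() {
        std::fill(buffer_.begin(), buffer_.end(), 0.0f);
        writeIndex_ = 0;
    }

    // Delay time is given in seconds and converted with the real sample rate.
    void setDelaySeconds(double seconds) {
        const std::size_t maxDelay = buffer_.size() - 1;
        const auto wanted = static_cast<std::size_t>(seconds * sampleRate_ + 0.5);
        delaySamples_ = std::min(wanted, maxDelay);
    }

    // Audio-thread path: no allocation and no locks.
    float process(float input) {
        const std::size_t size = buffer_.size();
        buffer_[writeIndex_] = input;
        const std::size_t readIndex = (writeIndex_ + size - delaySamples_) % size;
        const float output = buffer_[readIndex];
        writeIndex_ = (writeIndex_ + 1) % size;
        return output;
    }

private:
    std::vector<float> buffer_;
    double sampleRate_ = 0.0;
    std::size_t writeIndex_ = 0;
    std::size_t delaySamples_ = 0;
};
```

A real implementation would likely add fractional-delay interpolation and smoothing of the delay time; this sketch only shows where allocation, reset, and sample-rate conversion belong.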
The Core DSP Skill
The dsp skill is the foundation. It is a general-purpose real-time audio reference for C and C++ work, organized around the things that come up constantly: filters, effects, synthesis, spectral analysis, and utility code.
This one is intentionally close to the metal. It talks about RBJ biquads, Moog ladder filters, state variable filters, FIR design, delay interpolation, reverb topologies, compressors, waveshapers, oscillators, envelopes, FFT analysis, Goertzel detection, dithering, denormal prevention, fast math, and lock-free FIFO patterns.
More importantly, it teaches an agent the rules of the hot path:
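// Good audio instincts start here.
// Derive from sample rate. Smooth changes. Keep state explicit.
constexpr float kTwoPi = 6.28318530718f;    // M_PI is not part of standard C++
float w0 = kTwoPi * cutoffHz / sampleRate;  // angular frequency in radians per sample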
That sounds basic, but it is the difference between writing a code-shaped answer and writing something that might survive inside an actual plugin. The DSP skill keeps pulling the agent back to the boring truths that make audio code good: no hidden allocations, no blocking, no surprise discontinuities, no sample-rate fantasy.
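To make "derive from sample rate" concrete, here is a small sketch of a low-pass biquad using the widely published RBJ cookbook formulas. The type and function names (`BiquadCoeffs`, `makeLowPass`, `Biquad`) are my own placeholders and not necessarily what the dsp skill uses. Coefficients are computed once per parameter change, outside the per-sample loop, and the filter state is explicit and resettable.

```cpp
#include <cmath>

// Normalized coefficients (a0 has already been divided out).
struct BiquadCoeffs {
    float b0, b1, b2, a1, a2;
};

// RBJ cookbook low-pass. Call when cutoff, Q, or sample rate changes,
// not once per sample.
inline BiquadCoeffs makeLowPass(float cutoffHz, float q, float sampleRate) {
    constexpr float kTwoPi = 6.28318530718f;
    const float w0 = kTwoPi * cutoffHz / sampleRate;
    const float cosW0 = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * q);
    const float a0 = 1.0f + alpha;
    return {
        (1.0f - cosW0) * 0.5f / a0,  // b0
        (1.0f - cosW0) / a0,         // b1
        (1.0f - cosW0) * 0.5f / a0,  // b2
        -2.0f * cosW0 / a0,          // a1
        (1.0f - alpha) / a0          // a2
    };
}

// Transposed direct form II with explicit state.
struct Biquad {
    BiquadCoeffs c{};
    float z1 = 0.0f;
    float z2 = 0.0f;

    void reset() { z1 = 0.0f; z2 = 0.0f; }

    float process(float x) {
        const float y = c.b0 * x + z1;
        z1 = c.b1 * x - c.a1 * y + z2;
        z2 = c.b2 * x - c.a2 * y;
        return y;
    }
};
```

The same 1 kHz cutoff yields different coefficients at 44.1 kHz and 48 kHz, which is exactly why a hardcoded sample rate produces a filter that is quietly wrong on someone else's machine.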
I think of it as the shared grammar for everything else in the collection.
Faust as a Block Diagram Brain
Faust is interesting because it asks you to think in signal processors rather than imperative loops. The faust-dsp skill teaches the agent to work from that model: define process, import stdfaust.lib, sketch a block diagram, add UI primitives, and compile to the target.
Faust has its own logic. The five composition operators are the language:
- Parallel with ,
- Sequential with :
- Split with <:
- Merge with :>
- Recursive feedback with ~
If an agent misses that, it will write text that looks like Faust but does not think like Faust. The skill nudges it toward the real workflow: describe the graph, pick library functions, expose parameters with hslider, button, or checkbox, and then choose the compilation route.
That last part is where Faust becomes especially agent-friendly. A single .dsp file can target C++, WebAssembly, Pure Data, Max, SuperCollider, JACK apps, embedded boards, mobile, or plugin wrappers. The skill carries that map so the agent can help decide whether something should become a quick experiment, a web demo, an external, or a production DSP core.
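The recursive operator is the one that most often trips up imperative thinking, because it carries an implicit one-sample delay in the feedback path. As a rough mental model, the Faust program `process = + ~ *(0.5);` computes y[n] = x[n] + 0.5 * y[n-1]. The C++ below is a hand-written illustration of that behavior with hypothetical names (`FeedbackSum`, `render`); it is not what the Faust compiler actually emits.

```cpp
#include <cstddef>
#include <vector>

// Hand-written equivalent of the Faust expression `+ ~ *(0.5)`.
struct FeedbackSum {
    float previousOutput = 0.0f;  // the implicit one-sample delay behind `~`

    float process(float x) {
        const float y = x + 0.5f * previousOutput;
        previousOutput = y;
        return y;
    }
};

// Offline helper for experiments; it allocates, so it is not audio-thread code.
inline std::vector<float> render(const std::vector<float>& input) {
    FeedbackSum node;
    std::vector<float> output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        output.push_back(node.process(input[i]));
    return output;
}
```

Feeding an impulse through `render` gives a response that halves on every sample, which is the behavior the one-line Faust expression describes.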
plugdata and the Text Behind the Patch
The plugdata-patch skill is probably the most visibly satisfying one, because Pure Data patches are already text. A .pd file is a graph serialized as lines:
#N canvas 0 0 450 300 12;
#X obj 100 50 osc~ 440;
#X obj 100 120 *~ 0.5;
#X obj 100 200 dac~;
#X connect 0 0 1 0;
#X connect 1 0 2 0;
#X connect 1 0 2 1;
That means an agent can generate patches directly, but only if it respects the format. Object indices are implicit and sequential: each object is numbered from 0 in the order it appears in the file, and each connect line references those indices as source object, outlet, destination object, inlet. Every line ends in a semicolon. Signal objects with ~ live in the audio domain. Control objects do not. sig~ and snapshot~ are the bridges. Small mistakes break the patch.
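One way to keep those indices honest is to generate the text from a data structure and validate connections before writing them. The sketch below is an assumption about how such a generator could look; `PdObject`, `PdConnection`, and `writePatch` are hypothetical names, the canvas header is copied from the example above, and only plain `#X obj` lines are handled.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct PdObject {
    int x;
    int y;
    std::string text;  // e.g. "osc~ 440"
};

struct PdConnection {
    std::size_t source;  // index of the source object, counted from 0
    int outlet;
    std::size_t sink;    // index of the destination object
    int inlet;
};

// Serializes objects in order, so vector position equals the Pd object index.
// Throws if a connection points at an object that does not exist.
inline std::string writePatch(const std::vector<PdObject>& objects,
                              const std::vector<PdConnection>& connections) {
    std::ostringstream out;
    out << "#N canvas 0 0 450 300 12;\n";
    for (const auto& o : objects)
        out << "#X obj " << o.x << ' ' << o.y << ' ' << o.text << ";\n";
    for (const auto& c : connections) {
        if (c.source >= objects.size() || c.sink >= objects.size())
            throw std::out_of_range("connection references a missing object index");
        out << "#X connect " << c.source << ' ' << c.outlet << ' '
            << c.sink << ' ' << c.inlet << ";\n";
    }
    return out.str();
}
```

Passing the three objects and three connections from the example reproduces that patch line for line, and a connection to a nonexistent index fails loudly instead of producing a file that silently breaks on load.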
The skill includes the patch grammar, layout guidelines, object-selection strategy, and references for vanilla Pd, ELSE, Cyclone, hvcc, and pd-lua. That gives the agent a practical path through several different jobs:
- Make a quick synth or effect patch
- Prefer ELSE objects for modern plugdata work
- Use Cyclone when porting Max/MSP ideas
- Stay inside the hvcc-compatible subset when compiling to C/C++, DPF, Daisy, or Web Audio
- Drop into pd-lua when the patch needs custom logic
The hvcc part is especially important. A patch that opens in plugdata is not automatically a patch that compiles. The skill keeps track of constraints like supported objects, @hv_param annotations, signal-domain preferences, ignored block~, table rules, and target generators. It lets the agent answer not just the first question, "can we make this patch?", but the second one: "can this patch become a plugin or embedded target?"
JUCE and the Plugin Contract
The juce-plugin skill is about respecting the plugin lifecycle. JUCE is powerful, but it has strong opinions: AudioProcessor owns DSP, AudioProcessorEditor owns UI, APVTS (AudioProcessorValueTreeState) bridges parameters and state, prepareToPlay() is where buffers get sized, and processBlock() is where real-time discipline matters.
The skill covers project scaffolding with CMake, juce_add_plugin, APVTS parameter layout, attachments, state serialization, DSP module chains, custom editors, meters, FFT displays, WebView UIs in JUCE 8, and format-specific targets like VST3, AU, AAX, LV2, and Standalone.
The most important file in that skill might be the audio-thread safety reference. It states the rules plainly:
- Never allocate on the audio thread
- Never take locks on the audio thread
- Never call blocking functions on the audio thread
- Cache parameter pointers
- Use atomics and lock-free queues for cross-thread communication
- Pre-allocate in prepareToPlay()
- Use ScopedNoDenormals
This is the kind of guidance agents need because JUCE code can look correct while being musically unusable. A generated plugin that crackles under automation is not a success. A plugin that validates, saves state, restores parameters, behaves at tiny buffer sizes, and keeps UI work off the audio thread is much closer.
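The "atomics and lock-free queues" rule is easier to follow with a shape in mind. Below is a minimal single-producer, single-consumer ring buffer written against the C++ standard library only, so it stands alone. JUCE ships its own helpers for this job (for example juce::AbstractFifo), and the `SpscQueue` name and interface here are assumptions for illustration, not the skill's reference code.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Fixed-capacity queue for exactly one producer thread and one consumer thread.
// Storage is allocated up front; push and pop never allocate or lock.
// One slot is kept empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Producer thread only. Returns false when the queue is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool pop(T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;
        value = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

A typical use would be the audio thread pushing meter values and the UI thread popping them on a timer. Neither side ever waits on the other, and a full queue simply drops a value.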
VCV Rack as Modular Software
The vcv-rack-plugin skill is close to my day-to-day world. Rack has its own mental model: a plugin contains modules, each module has a Module engine and a ModuleWidget panel, and process() is called once per sample at the engine sample rate.
The skill teaches that structure along with the Rack-specific details that matter:
- plugin.json manifest rules
- plugin.hpp and plugin.cpp registration
- createModel<Module, ModuleWidget>()
- configParam, configInput, configOutput, and bypass routing
- SVG panel dimensions in millimeters
- Component placement for knobs, ports, screws, lights, and switches
- Voltage conventions for audio, CV, gates, triggers, and pitch
- Polyphony patterns
- JSON state persistence
- DSP testing outside the Rack runtime
- Cross-platform builds and release workflows
Rack is a perfect example of why agent skills need framework empathy. A generic C++ agent may know what a class is. That does not mean it knows that Rack audio is commonly +/-5V, that CV might be 0-10V or +/-5V, that pitch is 1V/oct, or that a module panel is an SVG with Eurorack dimensions.
The skill gives the agent those instincts before it starts writing code.
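Those voltage conventions reduce to a few small conversions. The sketch below uses only the standard library so it compiles outside Rack. The helper names are hypothetical, and the 0 V reference of C4 at 261.6256 Hz is my reading of Rack's voltage standard, so treat it as an assumption to check against the Rack SDK.

```cpp
#include <algorithm>
#include <cmath>

// Assumed reference: 0 V corresponds to C4 (261.6256 Hz) under 1V/oct.
constexpr float kFreqC4 = 261.6256f;

// 1V/oct pitch: every additional volt doubles the frequency.
inline float pitchVoltsToHz(float volts) {
    return kFreqC4 * std::exp2(volts);
}

// Maps a normalized sample in [-1, 1] to the common +/-5V audio range.
inline float sampleToAudioVolts(float normalized) {
    return 5.0f * normalized;
}

// Maps a unipolar 0-10V CV to [0, 1], clamping out-of-range input.
inline float unipolarCvToNormalized(float volts) {
    return std::min(1.0f, std::max(0.0f, volts / 10.0f));
}
```

Inside a real module these would sit around reads from inputs and writes to outputs in process(); keeping them as free functions also makes them testable outside the Rack runtime.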
Vult and Stateful DSP Without Boilerplate
The vult-dsp skill exists for a different kind of audio thinking. Vult is a transcompiled DSP language that can generate C/C++, JavaScript, Lua, Pure Data externals, Teensy objects, and code for VCV Rack workflows.
Its central idea is state. Filters remember previous samples. Oscillators remember phase. Envelopes remember their stage. In C++, you usually build structs and member variables around that. In Vult, mem variables and function context make that state explicit in the language:
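fun filter(x, fc) {
  mem y;                     // persists between calls
  val alpha = 0.1;           // fixed coefficient; fc is unused in this minimal example
  y = y + (x - y) * alpha;   // one-pole smoothing toward the input
  return y;
}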
That is a useful thing for an agent to know, because it changes how you design the answer. You think in small DSP functions, named contexts for stereo or oversampling, compile-time targets, and generated integration code. The skill also carries Vult-specific conventions for signal ranges, fixed-point output, lookup tables, WAV embedding, and platform integration.
It is not the right tool for every project. But when you want compact, stateful DSP that can travel across targets, Vult is a sharp one.
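For comparison, this is roughly the struct-and-member boilerplate that a mem variable replaces. It is a hand-written C++ sketch with a hypothetical `OnePole` type, not the output of the Vult compiler. Each instance carries its own y, which is the role separate contexts play when the same function is used for left and right channels.

```cpp
// Hand-written counterpart of the Vult one-pole example.
struct OnePole {
    float y = 0.0f;  // plays the role of `mem y`

    void reset() { y = 0.0f; }

    float process(float x, float alpha) {
        y = y + (x - y) * alpha;
        return y;
    }
};

// Two independent states, one per channel.
struct StereoOnePole {
    OnePole left;
    OnePole right;

    void process(float& l, float& r, float alpha) {
        l = left.process(l, alpha);
        r = right.process(r, alpha);
    }
};
```

The arithmetic is identical in both languages. What differs is who has to declare, own, and reset the state.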
The Pattern Across All of Them
The skills are different, but they share a common design principle: preserve the constraints of the medium.
Audio engineering is not just algorithms. It is where the algorithm runs, how often it runs, what format wraps it, what state survives a patch reload, what the host expects, what the UI thread is allowed to touch, and what happens when the user changes a parameter at exactly the wrong time.
That is why these skills include references instead of only examples. The references let the agent pull the right context for the job:
- A filter task should read filter design and smoothing patterns
- A VCV module should read Rack API and panel placement notes
- A JUCE plugin should read parameter management and thread-safety guidance
- A plugdata patch should read .pd format and object catalogs
- A Faust target question should read compilation options
- A Vult integration task should read code generation details
The result is less guesswork. The agent can enter a repo, recognize the audio environment, and respond with the right local rules.
What I Want From Audio Agents
I do not want audio agents that simply produce more code. We already have plenty of ways to make code appear.
I want agents that can help think through a signal path, choose a representation, protect the real-time thread, test stability, package a plugin, generate a patch that opens, and notice when a target has constraints that change the solution.
Sometimes that means writing a JUCE effect with APVTS done properly. Sometimes it means generating a plugdata prototype in five minutes. Sometimes it means turning a DSP idea into Faust so it can become WebAssembly. Sometimes it means building a Rack module and remembering that voltage is part of the API.
These skills are a step toward that kind of agent: not a generic assistant wearing an audio hat, but a collaborator with enough embedded engineering memory to be useful in the studio and in the source tree.
The collection is open source and installable with:
npx skills add shortwavlabs/agent-skills
The goal is simple: make agents better at the details that audio people already know are not details.
