Add real-time audio separation to any pipeline

The AudioShake SDK integrates streaming-capable, real-time sound separation directly into your app or self-deployed Enterprise service. It lets you separate vocals, isolate instruments, remove music, and clean speech in real time, with industry-leading quality.

The SDK is available on iOS/macOS, Android, Windows, and Linux, with local inference tuned per platform so each deployment target runs as fast as its hardware allows.

What is a stem separation SDK?

Real-time separation processes live audio streams as they happen, isolating dialogue, vocals, or instruments without sending files to the cloud or waiting for post-production.
AudioShake's stem separation SDK runs models locally on-device, making clean, separated audio available before it reaches the next stage of your pipeline. For dubbing and captioning, dialogue is isolated from crowd noise or background music the moment it's captured. In broadcast, music can be removed from streams to ensure rights compliance. For speech workflows, developers can turn messy, real-world audio into clean, structured inputs for ASR and LLM systems. And for music apps, stem-level control lets users interact with and mix tracks in real time.
On-device, no cloud processing
11ms dialogue isolation latency
Up to 200x real-time inference
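
To make the throughput figure concrete, here is a rough arithmetic sketch. It assumes "Nx real-time" means N seconds of audio processed per second of wall-clock time, which is the usual reading but is not spelled out on this page. The helper name is illustrative and not part of the SDK.

```python
# Back-of-the-envelope arithmetic for a real-time factor claim.
# Assumption: "Nx real-time" means N seconds of audio are processed per
# second of wall-clock time. The helper name is illustrative only.

def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to process `audio_seconds` of audio."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor

three_minute_song = 180.0
one_hour_broadcast = 3600.0

print(processing_seconds(three_minute_song, 200))   # 0.9
print(processing_seconds(one_hour_broadcast, 200))  # 18.0
```

Under that assumption, a three-minute song would take under a second and an hour of audio about 18 seconds at the "up to" rate. Actual speed depends on the model and the device.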

Separation models across voice, film, TV, and music

AudioShake's SDK gives developers access to real-time music removal, dialogue separation, and instrument stem isolation, all with low-latency, on-device performance across iOS, Android, Windows, and Linux.
DIALOGUE
Dialogue Isolation

Isolates spoken dialogue from background sound in real-time streams. Cleans voice inputs before they reach ASR, transcription, translation, or A1 audio engineer workflows, with a reported ~25% improvement in ASR accuracy in noisy audio environments.

DIALOGUE RT
Low-Latency Dialogue Isolation
New – 11ms latency

Dialogue RT delivers 11ms latency for live broadcast workflows — built for live sports, news, commentary, transcription, and real-time speech applications.
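
For a sense of scale, the sketch below converts an 11ms budget into sample counts at a few common sample rates. It treats the figure purely as a time budget; how the SDK actually buffers audio is not described here, and the function name is hypothetical.

```python
# Convert a latency figure into a sample count at common sample rates.
# Assumption: the 11 ms figure is treated as a pure time budget; the SDK's
# real buffering scheme is not stated in this document.

def latency_in_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Number of whole samples that fit in `latency_ms` at `sample_rate_hz`."""
    return int(latency_ms * sample_rate_hz / 1000)

for rate in (16_000, 44_100, 48_000):
    print(rate, latency_in_samples(11, rate))
# 16000 176
# 44100 485
# 48000 528
```

For comparison, one video frame at 25 frames per second lasts 40ms, so an 11ms delay is roughly a quarter of a frame.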

Dialogue Isolation sample. Film: “Hidden in Plain Sight” (Gregg Dunham & Mason Frenzel)
COPYRIGHT
Commercial Music Removal

Removes copyrighted background music from live or streamed audio while preserving dialogue and effects. Built for broadcasters, sports commentary, and content platforms managing copyright compliance on live feeds.

Commercial Music Removal sample. Film credits: Jaywalker Music
MUSIC
Instrument Stem Separation

Isolates vocals or individual instruments, with up to 14 instrument stems available, from any song in real time. Used by music apps, learning platforms, and DJ tools to give users stem-level control inside a consumer experience.

Built for production workflows

The SDK runs locally on-device, integrates in a few lines of code, and fits into mobile apps, desktop DAWs, live streaming platforms, embedded devices, and high-volume on-premise media processing workflows.
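
As an illustration of what a streaming integration generally looks like, here is a minimal block-processing sketch. Everything in it is an assumption: `separate_block` is a hypothetical placeholder that simply returns its input, and none of these names come from the AudioShake SDK, whose real interface is provided with the access materials.

```python
# Sketch of a block-based streaming loop. `separate_block` is a HYPOTHETICAL
# stand-in (it returns its input unchanged); it is not an AudioShake API.
from typing import Callable, Iterable, Iterator, List

Block = List[float]

def separate_block(block: Block) -> Block:
    """Placeholder for an on-device separation call (assumed; identity here)."""
    return list(block)

def blocks(samples: Iterable[float], block_size: int) -> Iterator[Block]:
    """Yield fixed-size blocks; the final partial block is zero-padded."""
    buf: Block = []
    for s in samples:
        buf.append(s)
        if len(buf) == block_size:
            yield buf
            buf = []
    if buf:
        yield buf + [0.0] * (block_size - len(buf))

def process_stream(
    samples: Iterable[float],
    block_size: int,
    separate: Callable[[Block], Block] = separate_block,
) -> Block:
    """Run the separator block by block and concatenate the outputs."""
    out: Block = []
    for block in blocks(samples, block_size):
        out.extend(separate(block))
    return out

stream = [0.1 * i for i in range(10)]
result = process_stream(stream, block_size=4)
print(len(result))  # 12: ten samples padded to three blocks of four
```

In a loop like this, the block size sets a floor on latency, since a block must be filled before it can be processed. That is why low-latency models work on small blocks.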
11ms: dialogue isolation latency, first to meet the live broadcast threshold
200x: real-time inference speeds
25%: reported ASR accuracy improvement with SDK preprocessing

Authentic and scalable speech recovery

VOICE AI
Improve transcription accuracy with speech isolation
Models as small as 9 MB, running at 200x real time
11ms latency
NPU/GPU/CPU runtimes available for real-time performance
Native support for low-res and high-res audio
MUSIC
Power music production, mixing, songwriting, and education apps
Up to 14 instrument targets, or joint 4-stem, 6-stem, drum kit, and vocals options available
Up to 250x real-time processing (vocals) with per-platform optimizations
Cross-platform SDKs available: iOS/macOS, Windows, Linux, Android
BROADCAST
Remove copyrighted material from your audio

Isolate clean dialogue from crowd noise
Streaming-capable dialogue and music removal models
Models as small as 9 MB, running at 200x real time
11ms latency
Support for hi-res audio
ON-PREM/SELF-DEPLOYED
Run any of AudioShake’s edge or API models in your own cloud or offline
All API models are available for safe and secure local inference
Manage compute and process large amounts of data
Streaming or batch API available

Get started with our SDK

SDK
Bring sound separation to your edge device
On-device inference with no cloud round-trip and latency under 50ms. Includes sample apps, integration guides, and demo code. Contact us for access.
REQUEST ACCESS
API
Evaluate before committing to on-device
Full model access via our API. No hardware requirements. Same separation quality, cloud-based. Ideal for prototyping, batch processing, or teams not yet building for edge.
ACCESS NOW
Get in touch.