Add real-time audio separation to any pipeline

The AudioShake SDK integrates streaming-capable, real-time sound separation directly into your app or self-deployed Enterprise service. AudioShake's stem separation SDK enables you to separate vocals, isolate instruments, remove music, and clean speech in real time, with industry-leading quality.

The SDK is available on iOS/macOS, Android, Windows, and Linux, with local inference times optimized for each platform. You’ll always get the best speed wherever you deploy.

What is a stem separation SDK?

Real-time separation processes live audio streams as they happen — isolating dialogue, vocals, or instruments instantly, without sending files to the cloud or waiting for post-production.

AudioShake's stem separation SDK runs models locally on-device, making clean, separated audio available before it reaches the next stage of your pipeline. For dubbing and captioning, dialogue is isolated from crowd noise or background music the moment it's captured. In broadcast, music can be removed from streams to ensure rights compliance. For speech workflows, developers can turn messy, real-world audio into clean, structured inputs for ASR and LLM systems. And for music apps, stem-level control lets users interact and mix tracks in real time.
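As a rough illustration of where separation sits in such a pipeline, the sketch below chunks an incoming sample stream and passes each chunk through a separation step before the next stage sees it. It does not use the AudioShake SDK: `separate`, `passthrough_separator`, and the chunk size are hypothetical placeholders, since the SDK's actual API is not described here.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]  # one block of mono PCM samples in [-1.0, 1.0]


def chunk_stream(samples: Iterable[float], chunk_size: int) -> Iterator[Chunk]:
    """Group an incoming sample stream into fixed-size chunks."""
    buf: Chunk = []
    for s in samples:
        buf.append(s)
        if len(buf) == chunk_size:
            yield buf
            buf = []
    if buf:  # flush the final partial chunk
        yield buf


def run_pipeline(
    samples: Iterable[float],
    chunk_size: int,
    separate: Callable[[Chunk], Chunk],
    downstream: Callable[[Chunk], None],
) -> int:
    """Pass each chunk through `separate` before handing it to `downstream`.

    `separate` is a hypothetical stand-in for an on-device separation call;
    `downstream` stands in for the next stage (ASR, captioning, a mixer).
    Returns the number of chunks processed.
    """
    count = 0
    for chunk in chunk_stream(samples, chunk_size):
        downstream(separate(chunk))
        count += 1
    return count


def passthrough_separator(chunk: Chunk) -> Chunk:
    """Placeholder that only halves the gain; a real model would go here."""
    return [0.5 * s for s in chunk]


received: List[Chunk] = []
n_chunks = run_pipeline([0.2] * 1000, 480, passthrough_separator, received.append)
print(n_chunks, [len(c) for c in received])
```

The point of the design is that the downstream stage never receives unseparated audio: every chunk is cleaned locally before it moves on.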

On-device, no cloud processing

11ms dialogue isolation latency

Up to 200x real-time inference

Separation models across voice, film, TV, and music

AudioShake's SDK gives developers access to real-time music removal, dialogue separation, and instrument stem isolation, all with low-latency, on-device performance across iOS/macOS, Android, Windows, and Linux.

DIALOGUE

Dialogue Isolation


Isolates spoken dialogue from background sound in real-time streams. Cleans voice inputs before they reach ASR, transcription, translation, or A1 audio engineer workflows, with a reported ~25% improvement in ASR accuracy in noisy audio environments.
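One common way to check an ASR accuracy claim like this on your own audio is word error rate (WER), compared before and after isolation. The sketch below is a minimal WER function using word-level edit distance. The three transcripts are invented toy strings for illustration, not measured results, and the page does not state which metric its ~25% figure is based on.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Toy transcripts, invented for illustration only.
reference = "turn the volume down please"
raw_transcript = "turn the valve down peas"
isolated_transcript = "turn the volume down peas"

wer_raw = word_error_rate(reference, raw_transcript)
wer_isolated = word_error_rate(reference, isolated_transcript)
print(wer_raw, wer_isolated)
```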

DIALOGUE RT

Low-Latency Dialogue Isolation

New – 11ms latency

Dialogue RT delivers 11ms latency for live broadcast workflows — built for live sports, news, commentary, transcription, and real-time speech applications.
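To put 11ms in concrete terms, the short calculation below converts a latency budget into a number of audio samples. The sample rates are common values chosen for illustration; the page does not say which rate, or which definition of latency (algorithmic or end-to-end), the 11ms figure refers to.

```python
def latency_to_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Number of whole samples that fit inside a latency budget."""
    return int(latency_ms * sample_rate_hz / 1000)


# Assumed sample rates, for illustration only.
for rate in (16_000, 44_100, 48_000):
    print(f"{rate} Hz: {latency_to_samples(11, rate)} samples in 11 ms")
```

At 48 kHz, for example, 11ms corresponds to 528 samples of audio.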

Built for production workflows

The SDK runs locally on-device, integrates in a few lines of code, and fits into mobile apps, desktop DAWs, live streaming platforms, embedded devices, and high-volume on-premise media processing workflows.

11ms dialogue isolation latency, the first to meet the live broadcast threshold

200x real-time inference speeds

25% reported ASR accuracy improvement with SDK preprocessing
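For readers unfamiliar with the "x real-time" figure, it is the ratio of audio duration to processing time. The small sketch below works through the arithmetic, assuming the quoted 200x factor holds for the whole file; actual speed will depend on the model and hardware.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Time needed to process audio at a given real-time factor."""
    return audio_seconds / realtime_factor


def realtime_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Real-time factor measured from a timed run."""
    return audio_seconds / elapsed_seconds


one_hour = 3600.0
print(processing_seconds(one_hour, 200))  # seconds to process one hour at 200x
print(realtime_factor(one_hour, 18.0))    # factor if one hour took 18 seconds
```

At 200x real-time, one hour of audio would take about 18 seconds to process.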

Authentic and scalable speech recovery

VOICE AI

Improve transcription accuracy with speech isolation

Models as small as 9MB, 200x real-time, 11ms latency

NPU/GPU/CPU runtimes available for real-time performance

Native support for low-res and high-res audio

MUSIC

Power music production, mixing, songwriting, and education apps

Up to 14 instrument targets, or joint 4-stem, 6-stem, drum kit, and vocal models available

Up to 250x real-time processing (vocals) with per-platform optimizations

Cross-platform SDKs available: iOS/macOS, Windows, Linux, Android

BROADCAST

Remove copyrighted material from your audio

Isolate clean dialogue from crowd noise

Streaming-capable dialogue and music removal models

Models as small as 9MB, 200x real-time, 11ms latency

Support for hi-res audio

ON-PREM/SELF-DEPLOYED

Run any of AudioShake’s edge or API models in your own cloud or offline

All API models are available for safe and secure local inference

Manage compute and process large amounts of data

Streaming or batch API available

Get started with our SDK

SDK

Bring sound separation to your edge device

On-device inference with no cloud round-trip and latency under 50ms. Includes sample apps, integration guides, and demo code. Contact us for access.

REQUEST ACCESS

API

Evaluate before committing to on-device

Full model access via our API. No hardware requirements. Same separation quality, cloud-based. Ideal for prototyping, batch processing, or teams not yet building for edge.

ACCESS NOW