Introducing Multi-Speaker Separation from AudioShake

Natural human conversation is dynamic, with changes in turn, overlap, emotion, and silence. But that very dynamism can create all kinds of problems for audio workflows. In film, TV, and podcasting, overlapping speech can make it difficult to edit different speakers or isolate their voices for dubbing. In transcription and captioning, speaker overlap and background noise can dramatically lower captioning quality. And in generative voice AI, multi-speaker speech output is typically rendered as a single track, meaning the user is stuck with what they’ve generated, with no ability to edit individual voices.
AudioShake’s Multi-Speaker separation technology changes all that. The first product in the world to offer high-resolution multi-speaker separation, it isolates multiple speakers into distinct speaker streams, delivering clean dialogue tracks for a wide variety of speech content, from media production to accessibility services. Where traditional audio tools struggle with overlapping speech, AudioShake can detect, diarize, and separate it into distinct tracks.
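To make that output concrete, the sketch below shows the shape of a detect-diarize-separate pipeline: one mixed recording in, one clean stem per speaker out. The class and function names here are hypothetical placeholders for illustration only; they are not AudioShake’s SDK.

```python
# Hypothetical sketch of the detect -> diarize -> separate flow.
# These names are illustrative placeholders, not AudioShake's SDK.
from dataclasses import dataclass

@dataclass
class SpeakerStem:
    speaker_id: str   # stable label from diarization, e.g. "speaker_0"
    audio_path: str   # isolated dialogue track for that speaker

def separate_speakers(mix_path: str) -> list[SpeakerStem]:
    # A real system would detect speech regions, diarize them into
    # speaker identities, and separate the mixture into per-speaker
    # stems. Mock output here just shows the contract.
    return [
        SpeakerStem("speaker_0", "speaker_0.wav"),
        SpeakerStem("speaker_1", "speaker_1.wav"),
    ]

# One mixed recording in, one clean track per speaker out:
for stem in separate_speakers("interview_mix.wav"):
    print(stem.speaker_id, "->", stem.audio_path)
```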
Voice Synthesis for Accessibility Services
One of the most heartbreaking symptoms for many ALS patients and their families is the loss of the patient’s ability to speak. ALS non-profit Bridging Voice is using AudioShake’s voice isolation technology to help ALS patients communicate with their own voices, even after losing the ability to speak. By separating patient voices from noisy recordings, AudioShake acts as a preprocessing step for the high-quality voice cloning provided by Eleven Labs, restoring a personal and authentic way for individuals to connect with loved ones.
Improving Podcast Workflows
AudioShake’s dialogue isolation and multi-speaker separation help podcasters clean up messy recordings, reduce editing time, and improve sound quality. Platforms like Wondercraft integrate these tools to simplify podcast editing, allowing creators to focus on producing great content instead of fixing audio issues. The technology can also be used to separate generative speech output, as seen in this example on Wondercraft using Google’s NotebookLM.
Enabling Film, TV, & UGC Editing
Editors have long struggled with the presence of multiple speakers in a recording. Whether in archival film, unscripted productions, or user-generated content, the inability to isolate voices can make editing and repurposing material difficult. Even in big-budget, multi-tracked productions, directors often avoid speaker overlap because of these challenges. AudioShake’s technology solves this by not only isolating dialogue, music, and effects but also separating individual speakers, making complex audio more manageable.
“Actors tailor their performances to make sure the take is usable,” says indie filmmaker Alex Park of Shep Films. “I’m excited to now be able to do scenes that don’t shy away from overlapping dialogue.”
Pushing the Frontiers of Voice Separation
AudioShake Multi-Speaker is the world’s first product to offer true hi-fi continuous speaker separation. Our model operates at a high sampling rate, making it suitable for broadcast-quality audio. Its advanced neural architecture provides reliable speaker tracking even across hours-long recordings, handling both high-overlap scenarios, such as lively debates, and low-overlap ones, such as one-on-one podcasts, with consistent generalization performance across the board.
You can read more about the potential opportunities in Multi-Speaker separation.
Starting today, Enterprise users can use Multi-Speaker separation on AudioShake Live. Multi-Speaker separation is also available via AudioShake’s API; read our documentation to learn how to access and implement it.
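For developers, a typical request flow might look like the sketch below. The base URL, endpoint paths, and JSON field names are assumptions made for illustration, not AudioShake’s documented interface; the API documentation is the source of truth.

```python
# Hypothetical sketch of driving a speech-separation API over HTTP.
# Endpoints and field names below are illustrative assumptions; see
# AudioShake's API documentation for the real interface.
import time
import requests

API_BASE = "https://api.example.com"   # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Upload the mixed recording that contains overlapping speakers.
with open("panel_discussion.wav", "rb") as f:
    asset = requests.post(f"{API_BASE}/upload", headers=HEADERS,
                          files={"file": f}).json()

# 2. Request a multi-speaker separation job for the uploaded asset.
job = requests.post(f"{API_BASE}/jobs", headers=HEADERS,
                    json={"assetId": asset["id"],
                          "target": "multi_speaker"}).json()

# 3. Poll until the job finishes, then list one stem per speaker.
while True:
    status = requests.get(f"{API_BASE}/jobs/{job['id']}",
                          headers=HEADERS).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

for stem in status.get("stems", []):
    print(stem["speaker"], "->", stem["url"])
```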