AudioShake vs Other Models

Sound Separation & Transcription

Sound Separation

AudioShake leads the field in high-quality source separation, which is typically measured with the Signal-to-Distortion Ratio (SDR). AudioShake has repeatedly achieved the highest SDR scores and has set several state-of-the-art benchmarks.

But SDR scores alone are not a good measure of source separation quality: it is quite possible to achieve a high SDR score with output that doesn't sound very good. That's why we focus even more on building models with the best perceptual quality, and why our models routinely top the lists in third-party evaluations.
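One way this happens is easy to reproduce with a plain energy-ratio SDR, 10·log10(‖s‖² / ‖s − ŝ‖²). The minimal NumPy sketch below assumes that simplified definition (not the full BSS Eval variant) and uses two hypothetical estimates of a synthetic target that has a loud passage followed by a passage 40 dB quieter. Estimate A is the whole target at 98% level, a change of roughly 0.2 dB that few listeners would notice. Estimate B reproduces the loud passage exactly but drops the quiet passage entirely. Because SDR weights errors by energy, B still scores higher than A. This is one illustrative mechanism, not a description of how AudioShake evaluates models.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Plain energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10 * np.log10((signal_energy + eps) / (error_energy + eps)))


sr = 44_100
t = np.arange(2 * sr) / sr
# Synthetic target: a 1 s loud passage, then a 1 s passage 40 dB quieter.
loud = np.sin(2 * np.pi * 220 * t[:sr])
quiet = 0.01 * np.sin(2 * np.pi * 330 * t[sr:])
target = np.concatenate([loud, quiet])

# Estimate A: the whole target at 98% level (about a 0.2 dB gain change).
estimate_a = 0.98 * target
# Estimate B: loud passage exact, quiet passage replaced by silence.
estimate_b = np.concatenate([loud, np.zeros_like(quiet)])

print(f"A (slightly quieter):   {sdr(target, estimate_a):.1f} dB")  # 34.0 dB
print(f"B (quiet part dropped): {sdr(target, estimate_b):.1f} dB")  # 40.0 dB
```

The missing passage costs estimate B almost nothing because its energy is tiny compared with the loud passage, which is why a single global SDR number can hide clearly audible omissions.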

SDR Scores

The music information retrieval community typically measures quality with the Signal-to-Distortion Ratio (SDR), and AudioShake has repeatedly achieved the highest SDR scores. However, even though we hold many of the state-of-the-art benchmarks, we caution against relying on SDR alone as an indicator of a high-quality model, because it is quite possible to achieve a high SDR score while the actual results sound poor.

That's why we have developed perceptual metrics for evaluating model performance, and why we sometimes choose models with lower SDR scores because they perform better on particular tasks.
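Stem separation scores for music are commonly reported per stem (bass, drums, other, and vocals, the same four stems used in the audio comparison on this page) and then averaged across stems. The sketch below assumes that convention and a plain energy-ratio SDR; benchmark toolkits may instead use BSS Eval-style SDR and frame-wise or track-wise medians, so it only shows the shape of the computation, not how AudioShake's published scores were produced. The signals are synthetic: each hypothetical estimate is its reference scaled by (1 − e), which makes its SDR exactly −20·log10(e).

```python
import numpy as np

STEMS = ("bass", "drums", "other", "vocals")


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Plain energy-ratio SDR in dB (no BSS Eval projection step)."""
    error = reference - estimate
    return float(10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps)))


def score_track(references: dict, estimates: dict) -> dict:
    """Per-stem SDR for one track, plus the mean over the four stems."""
    scores = {stem: sdr(references[stem], estimates[stem]) for stem in STEMS}
    scores["mean"] = float(np.mean([scores[stem] for stem in STEMS]))
    return scores


# Synthetic example: estimate = (1 - e) * reference, so SDR = -20 * log10(e).
rng = np.random.default_rng(0)
references = {stem: rng.standard_normal(44_100) for stem in STEMS}
relative_errors = {"bass": 0.1, "drums": 0.01, "other": 0.5, "vocals": 0.05}
estimates = {stem: (1 - relative_errors[stem]) * references[stem] for stem in STEMS}

for stem, value in score_track(references, estimates).items():
    print(f"{stem:>6}: {value:5.1f} dB")
# bass 20.0, drums 40.0, other 6.0, vocals 26.0, mean 23.0
```

Averaging in dB, as here, lets one very good stem pull the mean up even when another stem is poor, which is one reason per-stem numbers are usually reported alongside the average.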

With all that said, below is a sample of our stem separation scores for music.

Hear the difference

Models 10x to 400x faster than real time

Process hours-long files

Process high-resolution (192kHz) files
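As a rough sense of scale, "N times faster than real time" means N seconds of audio are processed per second of wall-clock time. The sketch below applies that arithmetic to a hypothetical two-hour stereo file; actual throughput depends on the model, hardware, and file, so the numbers only illustrate the 10x and 400x endpoints quoted above. The helper names are illustrative, not part of any AudioShake API.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time to process audio with a model `speedup` times faster than real time."""
    return audio_seconds / speedup


def sample_count(audio_seconds: float, sample_rate_hz: int, channels: int) -> int:
    """Total number of samples in an uncompressed recording."""
    return round(audio_seconds * sample_rate_hz * channels)


two_hours = 2 * 60 * 60  # hypothetical 2-hour file, in seconds

for speedup in (10, 400):
    seconds = processing_seconds(two_hours, speedup)
    print(f"{speedup}x real time: {seconds:.0f} s")  # 10x: 720 s, 400x: 18 s

# A stereo 192 kHz file of the same length holds about 2.76 billion samples.
print(sample_count(two_hours, 192_000, 2))  # 2764800000
```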

Song: "Future" by Torches

Audio comparison: bass, drums, other, and vocals stems for the song, alongside the same four stems separated by SAM Audio, Demucs v4, and Spleeter.

Transcription Accuracy

To evaluate transcription models, researchers use industry-wide benchmarks that measure word error rate (WER) along with punctuation- and formatting-related metrics. At ISMIR, AudioShake's research team presented Jam-ALT, a new benchmark based on the JamendoLyrics dataset that accounts for the finer nuances of written lyrics.
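WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a system's output, divided by the number of reference words. The minimal Python sketch below computes it; the `normalize` switch is a hypothetical illustration of why formatting matters for lyrics. With casing and punctuation stripped, the two example lines match exactly, but compared as written they differ in two of five words. This is not the Jam-ALT implementation, whose metrics are defined by that benchmark.

```python
import string

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def word_error_rate(reference: str, hypothesis: str, normalize: bool = True) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    def tokens(text: str) -> list:
        if normalize:
            # Common preprocessing: lowercase and strip punctuation before comparing.
            text = text.lower().translate(_PUNCTUATION)
        return text.split()

    ref, hyp = tokens(reference), tokens(hypothesis)
    # Word-level Levenshtein distance, keeping one row of the DP table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


reference = "Hey, I'm walking home tonight"
hypothesis = "hey im walking home tonight"
print(word_error_rate(reference, hypothesis))                   # 0.0
print(word_error_rate(reference, hypothesis, normalize=False))  # 0.4
```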

Below are the results of running various transcription systems against this benchmark. AudioShake has set the state of the art in both lyric transcription and alignment.
