What is an SDR score? Who it matters to and why.

AudioShake
September 25, 2024

In the world of audio separation, one of the most widely used metrics to gauge performance is the Signal-to-Distortion Ratio (SDR) score. An SDR score compares a model’s separated output against the original isolated source (the reference) and reports, in decibels, how much of the output is the intended signal rather than distortion; higher is better. SDR scores give researchers and developers a standardized way to compare different separation models, offering numerical benchmarks for consistent evaluations across the field.
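To make the definition concrete, here is a minimal sketch of the basic SDR calculation, assuming the simple energy-ratio form (10 times the base-10 logarithm of reference energy over error energy). The function name sdr_db, the eps guard, and the synthetic tones are illustrative assumptions; this is not AudioShake’s evaluation code, and toolkits used in research may compute variants of this formula.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Basic signal-to-distortion ratio in decibels.

    reference: the true isolated source (ground truth)
    estimate:  the separated output produced by a model
    eps:       small constant guarding against division by zero (an assumption)
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

# Synthetic example: one second of a 440 Hz tone stands in for the true source.
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t)

# A hypothetical "separated" output: the source plus a leftover 1 kHz tone
# at one tenth of the source amplitude.
estimate = reference + 0.1 * np.sin(2 * np.pi * 1000 * t)

score = sdr_db(reference, estimate)
print(f"SDR: {score:.2f} dB")
```

Because the leftover tone has one hundredth of the source’s energy, the score comes out at about 20 dB. Note that the calculation only sees the total energy of the error, not what the error sounds like.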

But audio quality, especially in practical, real-world settings, isn’t just about the numbers. What really matters is how good it sounds to humans. If the separated track doesn’t sound natural or usable to people, then a high SDR score is a nice bragging right with little practical benefit. While SDR scores are a valuable part of measuring performance, they don’t necessarily align with what people perceive as quality audio. In fact, over-reliance on this metric can be misleading.

The Limitations of SDR Scores

One of the key issues with SDR scores is that they focus strictly on mathematical accuracy rather than perceptual quality, or what actually sounds good to the human ear. A high SDR score might indicate strong technical performance, but the output may still sound unnatural or awkward to real listeners. This is a fundamental challenge in the field of audio separation.
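One way to see this limitation is to build two outputs with identical SDR scores but very different kinds of error. The sketch below is an illustration under stated assumptions, not a perceptual test: the signals are synthetic, the helper scale_to_energy is hypothetical, and the SDR function is the simple energy-ratio form. One error is faint noise spread across the whole second; the other is a 3 kHz tone packed into a 50 ms burst.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simple energy-ratio SDR in decibels."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def scale_to_energy(x, target_energy):
    """Rescale x so that its total energy equals target_energy (hypothetical helper)."""
    return x * np.sqrt(target_energy / np.sum(x ** 2))

sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t)

# Distortion A: broadband noise spread evenly over the whole second.
rng = np.random.default_rng(0)
noise = rng.standard_normal(sr)

# Distortion B: a 3 kHz tone confined to the first 50 ms (800 samples).
burst = np.zeros(sr)
burst[:800] = np.sin(2 * np.pi * 3000 * t[:800])

# Give both distortions the same total energy: 1% of the source energy.
target = 0.01 * np.sum(reference ** 2)
estimate_a = reference + scale_to_energy(noise, target)
estimate_b = reference + scale_to_energy(burst, target)

sdr_a = sdr_db(reference, estimate_a)
sdr_b = sdr_db(reference, estimate_b)
print(f"noise: {sdr_a:.2f} dB, burst: {sdr_b:.2f} dB")
```

Both outputs score the same, because SDR only measures how much error energy there is in total. A listener would likely describe them quite differently: one as a steady hiss, the other as a short, prominent beep.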

In fact, in Sony’s annual Demixing Challenge, ByteDance achieved the highest SDR score in one of the source separation challenges but ranked third when evaluated on perceptual metrics. In a research paper, the organizers reached a similar conclusion: there is very little correlation between the best metrics and perceptual quality. This tracks with what we’ve seen at AudioShake. We’ve won the overall Sony Demixing Challenge twice, but have repeatedly seen that our highest scores didn’t correspond to what we considered to be our best output.

At AudioShake, while we respect and utilize SDR scores to track our performance, we know they are not the full picture. What matters most to our customers—whether they are musicians, content creators, or broadcasters—is how the separated audio performs in their specific environments. 

Perceptual Metrics: A More Holistic Approach

To address this gap, we’ve supplemented traditional metrics like SDR scores with perceptual metrics. These focus on the listener’s experience—how natural, clear, and usable the audio is in different real-world scenarios. Whether our customers are using our technology for music production or for clearer dialogue in films, their priority is what sounds best in the context of their work.

For us, the goal is no longer to top the SDR charts. Our objective is to deliver audio separation models that truly serve the needs of the user. Sometimes that means choosing a model that doesn’t have the highest SDR score, but performs better based on perceptual tests. After all, our customers are humans, and they know what works well in their environments.

The Balance Between Technical Accuracy and Real-World Use

In the end, it’s about finding the right balance. SDR scores provide valuable technical benchmarks, and we’ll continue to use them alongside other measures. But for us, the real measure of success lies in how well our models perform in the hands of our customers, and how they enhance real-world experiences across music, entertainment, and beyond.

This shift towards perceptual metrics represents an important evolution in the field of audio separation, and it’s something that we at AudioShake are committed to as we develop new models and improve our existing technology.