In the age of remote collaboration and videoconferencing, AI plays an important role in speech processing to remove background noise, enhance speech, and gain insights into speech and audio streams. But how do you assess the efficacy of such technology in the context of decades of legacy hardware and software solutions? At BabbleLabs, in addition to objective metrics, we use a listener perception-based framework to evaluate the efficacy of our AI-based speech enhancement models and products.
Subjective assessments with listener perceptions of sound quality
The gold standard for assessing speech and sound quality relies on the subjective opinions of a large, diverse panel of human listeners. Traditionally, this process is laborious and expensive to conduct. There are established objective measures based on predictive models, but their predictions may be valid only for very specific types of distortion, and they are most useful at the earlier stages of audio algorithm development.
At BabbleLabs, we use objective measures (such as PESQ, ESTOI, and SNR) in the early stages of algorithm development, since they are fast and inexpensive to apply (see the sketch after the list below). At the intermediate and, especially, the final stages, we place greater reliance on subjective, human listener opinions in order to:

- Leverage the gold standard in sound quality assessment
- Get feedback from real listeners in real-life listening environments
- Gather reliable opinions for many types of audio distortions
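To make the early-stage objective checks concrete, here is a minimal sketch of how such scores can be computed for a time-aligned clean/enhanced pair of recordings. It assumes the open-source `pesq` and `pystoi` packages as stand-ins for PESQ and ESTOI; the function names and the helper `snr_db` are illustrative, not BabbleLabs' internal tooling.

```python
import numpy as np
from pesq import pesq          # ITU-T P.862 PESQ (pip install pesq)
from pystoi import stoi        # STOI / ESTOI (pip install pystoi)


def snr_db(clean: np.ndarray, enhanced: np.ndarray) -> float:
    """Signal-to-noise ratio of the enhanced signal against the clean reference."""
    noise = clean - enhanced
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))


def objective_scores(clean: np.ndarray, enhanced: np.ndarray, fs: int = 16000) -> dict:
    """Quick objective metrics for early-stage model comparisons."""
    return {
        "pesq_wb": pesq(fs, clean, enhanced, "wb"),         # wideband PESQ at 16 kHz
        "estoi": stoi(clean, enhanced, fs, extended=True),  # extended STOI
        "snr_db": snr_db(clean, enhanced),
    }
```

Scores like these are cheap to run across a whole test set after every training change, which is exactly why they fit the early iterations; they do, however, require a clean reference signal, which is one reason listener opinions take over in the later stages.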
To this end, we have developed a simple, comparative, crowdsourced, subjective testing framework that is performed at scale with a large number of individuals in “our ...