Part I: Restoring the Past
Part II: Rebuilding Speech
Part III: What’s Missing?
Part IV: Better = Cheaper
Just in the last week, we realized another important way to leverage the remarkable speech enhancement progress of BabbleLabs. Normally, people think of better speech enhancement as delivering just that: an improved listening experience. But improvements in quality can sometimes be transmogrified into reductions in cost. This turns out to be true for communicating and storing speech. Communications systems have employed speech coding for decades to deliver adequate speech quality over narrow bandwidth. However, as the encoding becomes more aggressive, squeezing speech into the fewest possible bits per second, the quality of the speech suffers. Moreover, the most ambitious speech coding methods attempt to model the characteristics of human speech production.
Modern speech codecs assume a "source-filter" model of speech production and typically use two excitation components: white Gaussian noise for unvoiced phonemes and a periodic pulse train for voiced speech sounds. They use linear predictive coding (LPC) to model the filter that represents the resonances of the vocal tract. These concise models work quite well in the absence of noise, but non-speech noise does not fit them, so noisy speech is often distorted by these coding methods, especially at lower bit rates.
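To make the source-filter idea concrete, here is a minimal sketch of how such a synthesizer works: an excitation signal (noise or a pulse train) is driven through an all-pole LPC filter whose coefficients stand in for the vocal-tract resonances. The specific filter coefficients and frame parameters below are hypothetical, chosen only to illustrate the structure; a real codec estimates them frame by frame from the input speech.

```python
import numpy as np

def all_pole_filter(excitation, a, gain=1.0):
    """Apply the LPC synthesis filter 1/A(z), where
    A(z) = 1 + a[0]*z^-1 + ... + a[p-1]*z^-p.
    Implements y[n] = gain*x[n] - sum_k a[k]*y[n-1-k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for k in range(len(a)):
            if n - k - 1 >= 0:
                acc -= a[k] * y[n - k - 1]
        y[n] = acc
    return y

def synthesize(voiced, lpc_coeffs, n_samples=800, pitch_period=80, seed=0):
    """Source-filter synthesis of one frame.
    voiced=True  -> periodic pulse train (voiced phonemes)
    voiced=False -> white Gaussian noise  (unvoiced phonemes)"""
    if voiced:
        excitation = np.zeros(n_samples)
        excitation[::pitch_period] = 1.0  # one impulse per pitch period
    else:
        excitation = np.random.default_rng(seed).standard_normal(n_samples)
    return all_pole_filter(excitation, lpc_coeffs)

# Hypothetical 2nd-order LPC coefficients: a stable filter with a
# resonance near 400 Hz at an 8 kHz sample rate (poles at 0.9 +/- 0.3j).
coeffs = np.array([-1.8, 0.9])
voiced_frame = synthesize(True, coeffs)
unvoiced_frame = synthesize(False, coeffs)
```

Note that nothing in this pipeline accounts for additive background noise: the decoder can only emit "noise through a vocal tract" or "pulses through a vocal tract," which is why real-world noise gets mangled at aggressive bit rates.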
Combining state-of-the-art speech codecs with state-of-the-art speech enhancement addresses these limitations quite well. We can use BabbleLabs Clear to remove noise, so that speech codecs (e.g., ...