What can you do with intelligent speech enhancement tools?
As you can imagine, speech recognition is just the tip of the iceberg. Deep learning, advanced
speech processing, and optimized computation are changing everything.
The core elements of any speech-driven interface are speech clarity and intelligibility. Imagine
building on clear, comprehensible human speech in your own development. Harness the power of human
speech and eliminate the frustration users experience when they aren’t heard, understood, or served
with precision on the first try.
Speech Capabilities
Use BabbleLabs Clear Cloud to improve the quality and accuracy of your product in these
areas:
- Enhancement for noise reduction and speech reconstruction
- Speaker identification and authentication
- Video production and live audio streaming
- Separation of speech threads
- Analysis for language, accent, and emotion
- Speech generation
- Searchability
- Efficient audio storage and voice archiving
According to the normalized covariance metric (NCM), an objective measure of speech intelligibility, our average score in high noise conditions (where noise energy is comparable to speech energy) goes from 0.55 to 0.80 (on a scale of 0-1). That works out to a 56% relative reduction of unintelligibility on that metric.
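For readers who want to see where the 56% figure comes from, here is the arithmetic as a short Python sketch. The NCM scores are the ones quoted above; reading unintelligibility as one minus the NCM score is simply how the relative-reduction figure is computed:

```python
# Relative reduction of unintelligibility implied by the NCM scores above.
# NCM is scored from 0 to 1, so "unintelligibility" is read as (1 - NCM).

ncm_before = 0.55  # average NCM in high-noise conditions, before enhancement
ncm_after = 0.80   # average NCM after enhancement

unintelligibility_before = 1.0 - ncm_before  # 0.45
unintelligibility_after = 1.0 - ncm_after    # 0.20

relative_reduction = (unintelligibility_before - unintelligibility_after) / unintelligibility_before
print(f"Relative reduction of unintelligibility: {relative_reduction:.0%}")  # ~56%
```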
Platforms & Applications
These capabilities can be personalized and customized on a variety of technology platforms, from cloud to edge: phones and tablets, desktops, TVs, smart home devices, headsets, conference calling software and systems, augmented reality devices, and more. Delivered at mass scale, they provide audio services specialized for particular speakers and vocabularies, and for noise sources such as music, wind, and traffic.
Enabling a more human interface
A wave of change is sweeping toward us, with the potential to dramatically reshape the interactions
between people and our electronic environment. Better human-machine interaction has the potential to
displace the keyboards, mice, remote controls, and touchpads we have learned, often painfully, to
use. In a very real sense, the old interfaces required us to retrain the neural networks in our own
brains; new speech interfaces move that effort onto the computers.
The science of satisfying speech
Here at BabbleLabs, we’re driving speech science forward using powerful combinations of audio signal
processing and neural network models to extract more information, remove ambiguity, and improve the
quality of speech-based systems.
Speech enhancement electronically processes natural human speech to pare back the noise that makes
speech hard to comprehend and unpleasant to listen to. And deep networks can follow more than
sounds; they provide powerful means to overcome conflicting voices, audio impairments, and confusion
of meaning.
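To make the idea concrete, here is a minimal spectral-subtraction denoiser. This is a textbook signal-processing baseline, not BabbleLabs’ neural approach, and the function and parameter names are illustrative; it simply shows what “paring back the noise” looks like in the time-frequency domain:

```python
# A minimal spectral-subtraction sketch (classical baseline, not a neural model).
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, fs, noise_seconds=0.5, nperseg=512):
    """Estimate the noise spectrum from the first `noise_seconds` of the clip
    (assumed to be speech-free) and subtract it from every frame."""
    _, _, Z = stft(noisy, fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)

    # Average magnitude of the leading, noise-only frames (hop = nperseg // 2).
    noise_frames = max(1, int(noise_seconds * fs / (nperseg // 2)))
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)

    # Subtract the noise estimate, flooring at a small fraction of the original
    # magnitude so negative values don't turn into "musical noise" artifacts.
    clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)

    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs, nperseg=nperseg)
    return enhanced
```

A neural enhancer replaces the fixed noise estimate and subtraction rule with a learned mapping, which is what lets it cope with non-stationary noise, competing voices, and the other impairments described above.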
Speech has all the key characteristics — including big data and deep complexity — that make it a
prime candidate for neural network processing. BabbleLabs’ neural network has been exposed to
hundreds
of thousands of hours of speech- and noise-based training data. To our knowledge, no one has
attempted to produce speech enhancement AI at this scale.
The measures of success
Generally speaking, there are two sets of criteria we apply to speech:
- Comfort: How does it feel to listen to the speech? Is it annoying or uncomfortable? Is the noise
or reverberation distracting?
- Intelligibility: Regardless of how noisy or unpleasant the speech may be, how completely can you
actually make out the words and the speaker’s intent? Can you understand?
The telephony industry has produced an extensive body of work around measuring speech clarity and intelligibility (e.g., ITU standards); these metrics provide a good start. (More about metrics).
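As a concrete illustration, standardized objective scores can be computed with openly available tooling. The sketch below uses the open-source pesq (ITU-T P.862) and pystoi packages and hypothetical file names; these are assumptions for illustration, not tools named in this post, and NCM itself is not among them:

```python
# Scoring an enhanced clip against a clean reference with two standard
# objective metrics: PESQ (quality/comfort) and STOI (intelligibility).
import soundfile as sf   # pip install soundfile pesq pystoi
from pesq import pesq    # ITU-T P.862 speech-quality metric
from pystoi import stoi  # short-time objective intelligibility

clean, fs = sf.read("clean_reference.wav")    # hypothetical file names;
enhanced, _ = sf.read("enhanced_output.wav")  # PESQ "wb" mode expects 16 kHz audio

print("PESQ (wideband):", pesq(fs, clean, enhanced, "wb"))  # roughly 1.0 to 4.5
print("STOI:", stoi(clean, enhanced, fs, extended=False))   # 0 to 1, higher is better
```

Loosely speaking, the two scores line up with the two criteria above: quality measures such as PESQ track listening comfort, while intelligibility measures such as STOI and NCM track how well the words themselves come through.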