Technology: What's behind the curtain

We’ve launched our first product, BabbleLabs Clear Cloud™, but we’ve got a lot more in store for you. Check this page and our blog to take a deeper dive into the convergence of speech science, deep learning technology, and advanced audio processing techniques.

Speech is a vital medium
Current digital technology forces us to use keyboards, touchscreens, and trackpads. Speech is a more natural, accessible, and efficient way for humans to interact with electronic devices. BabbleLabs is taking speech enhancement to the next level, creating technology that learns your voice and enhances clarity, intelligibility, and identification in real-world scenarios — from the studio to the street — for both human-to-machine and human-to-human interactions.

Beyond speech recognition
There is a lot more we can do with speech technology. Not everybody records in a professional sound studio, and we want to help you pursue opportunities in cloud-based speech recognition and analytics, audio and video infotainment systems, automotive autonomy, advanced telephony, home automation, industrial control systems, or whatever you dream up next.

Have ideas or questions about potential applications? We'd love to hear from you.

Your world is ready to hear you
We envision devices that learn a voice over time and personalize the algorithms that shape speech-driven applications. Connected environments should be customized to suit our individual needs.

A personalized environment is private when it should be, accurate when communicating, and controlled by the user at all times. Enhanced speech is more precise, eliminating the errors and privacy issues that can make “smart” devices frustrating to use.

What can you do with intelligent speech enhancement tools?
As you can imagine, speech recognition is just the tip of the iceberg. Deep learning, advanced speech processing, and optimized computation are changing everything.

The core elements of any speech-driven interface are speech clarity and intelligibility. Imagine leveraging clear, comprehensible human speech in your development. Harness the power of human speech and eliminate the frustration users experience when they aren’t heard, understood, and served with precision on the first try.

Speech Capabilities

Use BabbleLabs Clear Cloud to increase the quality and accuracy of your product in these areas:

  • Enhancement for noise reduction and speech reconstruction
  • Speaker identification and authentication
  • Video production and live audio streaming
  • Separation of speech threads
  • Analysis for language, accent, and emotion
  • Speech generation
  • Searchability
  • Efficient audio storage and voice archiving

According to the normalized covariance metric (NCM), an objective measure of speech intelligibility, our average score in high noise conditions (where noise energy is comparable to speech energy) goes from 0.55 to 0.80 (on a scale of 0-1). That works out to a 56% relative reduction of unintelligibility on that metric.
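
To make the arithmetic explicit: NCM scores intelligibility on a 0-1 scale, so unintelligibility can be read as one minus the score. The short sketch below (plain Python, using only the two averages quoted above) reproduces the 56% figure.

    # Relative reduction of unintelligibility implied by the NCM averages above.
    # NCM scores intelligibility on a 0-1 scale, so unintelligibility = 1 - NCM.
    ncm_before = 0.55   # average NCM in high-noise conditions, before enhancement
    ncm_after = 0.80    # average NCM after enhancement

    unintelligibility_before = 1.0 - ncm_before   # 0.45
    unintelligibility_after = 1.0 - ncm_after     # 0.20

    relative_reduction = (unintelligibility_before - unintelligibility_after) / unintelligibility_before
    print(f"Relative reduction of unintelligibility: {relative_reduction:.0%}")   # ~56%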

Platforms & Applications

These capabilities can be personalized and customized on a variety of technology platforms: cloud, phone, television, remote controls, smart home devices, and more. Delivered at mass scale, they provide audio services tailored to a particular speaker, vocabulary, or noise environment such as music, wind, or traffic.

Pro Tip: Can Clear Cloud enhance ASR output?
We're all about enhancing audio and speech for human ears. So, we don’t recommend using BabbleLabs speech enhancement as a front end for automatic speech recognition (ASR) software. It might seem logical to take raw voice input, de-noise it with BabbleLabs Clear Cloud, then feed it to Alexa or Siri. This won’t work well; digital assistants are designed to handle noise in other ways. We’re working with ASR providers to enhance your interactions with digital assistants. Want to know more? Contact us and we'll figure it out together.

Enabling a more human interface
A wave of change is sweeping toward us, one with the potential to dramatically reshape the interactions between people and their electronic environment. Better human-machine interaction could displace the keyboards, mice, remote controls, and touchpads we have, often painfully, learned to use. In a very real sense, the old interfaces required us to retrain the neural networks in our brains; new speech interfaces shift that neural network effort onto the computers.

The science of satisfying speech
Here at BabbleLabs, we’re driving speech science forward using powerful combinations of audio signal processing and neural network models to extract more information, remove ambiguity, and improve the quality of speech-based systems.

Speech enhancement electronically processes natural human speech to pare back the noise that makes it hard to comprehend and unpleasant to listen to. Deep networks can do more than track sounds; they provide powerful means to overcome conflicting voices, audio impairments, and confusion of meaning.

Speech has all the key characteristics — including big data and deep complexity — that make it a prime candidate for neural network processing. BabbleLabs’ neural network has been exposed to hundreds of thousands of hours of speech- and noise-based training data. To our knowledge, no one has attempted to produce speech enhancement AI at this scale.


The measures of success
Generally speaking, there are two sets of criteria we apply to speech:

  • Comfort: How does it feel to listen to the speech? Is it annoying or uncomfortable? Is the noise or reverberation distracting?
  • Intelligibility: Regardless of how noisy or unpleasant the speech may be, how completely can you actually make out the words and the speaker’s intent? Can you understand?

The telephony industry has produced an extensive body of work around measuring speech clarity and intelligibility (e.g., ITU standards), and these metrics provide a good starting point.

What’s Behind the Curtain?

BabbleLabs utilizes powerful Nvidia V100 GPUs for training its neural networks. To produce speech output that is more pleasing and intelligible to the human ear, we use a combination of deep learning and digital signal processing (DSP) techniques.

BabbleLabs Clear Cloud runs on Google Cloud. Customers submit their audio/video files to be enhanced, and those files are processed via our cloud API in Google Cloud. Interested in having Clear Cloud available on another Cloud Service Provider (CSP)? Let us know!
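
For illustration only, a submission to a REST-style enhancement endpoint might look like the sketch below. The URL, credential, and response field are hypothetical placeholders, not the documented Clear Cloud API; consult the API reference for the real interface.

    # Hypothetical sketch of submitting an audio file to a cloud enhancement API.
    # The endpoint, credential, and response field below are illustrative
    # placeholders, not the documented Clear Cloud interface.
    import requests

    API_URL = "https://api.example.com/v1/enhance"   # placeholder endpoint
    API_KEY = "YOUR_API_KEY"                         # placeholder credential

    with open("noisy_interview.wav", "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": ("noisy_interview.wav", audio, "audio/wav")},
        )

    response.raise_for_status()

    # Assume the service returns a link to the enhanced file.
    print("Enhanced audio available at:", response.json().get("enhanced_file_url"))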

LinkedIn

Start a discussion! Join us and nearly 1,500 members in the “Speech Enhancement” group on LinkedIn, moderated by our very own Kamil Wojcicki. The group brings together researchers and professionals in the field of speech enhancement to network and share information.