BabbleLabs is now part of Cisco

Technology: Speech Science, Critical Communications, and Voice Control

We’ve only just begun. Clear Cloud has launched successfully, with Clear Edge and Clear Command close behind. We’ve got a lot more in the works, and we don’t dream small around here.

Check this page, our blog, and Gabby's Lab for a deeper dive into the convergence of speech science, deep learning technology, and advanced audio processing techniques.

Speech is a vital medium
Current digital technology forces us to use keyboards, touchscreens, and trackpads. Speech is a more natural, accessible, and efficient way for humans to interact with electronic devices. BabbleLabs is taking speech enhancement to the next level, creating technology that learns your voice and enhances clarity, intelligibility, and identification in real-world scenarios — from the studio to the street — for both human-to-machine and human-to-human interactions.

Beyond speech recognition
There is a lot more we can do with speech technology. Not everybody has a professional sound studio, and we want to help you pursue opportunities in speech recognition and analytics in the cloud, audio and video infotainment systems, automotive autonomy, advanced telephony, home automation, industrial control systems, or whatever you dream up next.

Have ideas or questions about potential applications? We'd love to hear from you.

Your world is ready to hear you
We envision devices that learn a voice over time and personalize the algorithms that shape speech-driven applications. Connected environments should be customized to suit our individual needs.

A personalized environment is private when it should be, accurate when communicating, and controlled by the user at all times. Enhanced speech is more precise, eliminating the errors and privacy issues that can make “smart” devices frustrating to use.

What can you do with intelligent speech enhancement tools?
As you can imagine, speech recognition is just the tip of the iceberg. Deep learning, advanced speech processing, and optimized computation are changing everything.

The core elements of any speech-driven interface are speech clarity and intelligibility. Imagine leveraging clear, comprehensible human speech in your development. Harness the power of human speech and eliminate the frustration users experience when they aren’t heard, understood, and served with precision on the first try.

Speech Capabilities

Use BabbleLabs Clear Cloud to increase the quality and accuracy of your product in these areas:

  • Enhancement for noise reduction and speech reconstruction
  • Speaker identification and authentication
  • Video production and live audio streaming
  • Separation of speech threads
  • Analysis for language, accent, and emotion
  • Speech generation
  • Searchability
  • Efficient audio storage and voice archiving

According to the normalized covariance metric (NCM), an objective measure of speech intelligibility, our average score in high noise conditions (where noise energy is comparable to speech energy) rises from 0.55 to 0.80 (on a scale of 0 to 1). Treating 1 minus the NCM score as unintelligibility, that is a drop from 0.45 to 0.20, or a 56% relative reduction of unintelligibility on that metric.
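To make the arithmetic behind that figure easy to check, here is a minimal sketch in Python. It assumes that "unintelligibility" simply means 1 minus the NCM score; the function name is hypothetical and is not part of any BabbleLabs product or API.

```python
def relative_unintelligibility_reduction(score_before: float, score_after: float) -> float:
    """Relative drop in (1 - score) for intelligibility scores on a 0-1 scale.

    Assumption for illustration: unintelligibility is defined as 1 - score.
    """
    if not (0.0 <= score_before < 1.0) or not (0.0 <= score_after <= 1.0):
        raise ValueError("scores must be on a 0-1 scale, and score_before must be below 1")
    unintelligible_before = 1.0 - score_before  # 0.45 for the figures quoted above
    unintelligible_after = 1.0 - score_after    # 0.20 for the figures quoted above
    return (unintelligible_before - unintelligible_after) / unintelligible_before


# NCM scores quoted above for high noise conditions
reduction = relative_unintelligibility_reduction(0.55, 0.80)
print(f"{reduction:.0%}")  # prints 56%
```

The same calculation works for any 0 to 1 intelligibility score, which makes it a convenient way to compare results across test conditions.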

Platforms & Applications

These capabilities can be personalized and customized on a variety of technology platforms from cloud to edge: phones and tablets, desktops, TVs, smart home devices, headsets, conference calling software and systems, augmented reality devices, and more. Delivered at mass scale, they provide audio services specialized for particular speakers and vocabularies and for noise sources such as music, wind, and traffic.

Enabling a more human interface
A wave of change is sweeping toward us, with the potential to transform the interactions between people and our electronic environment. Better human-machine interaction could displace the keyboards, mice, remote controls, and touchpads we have learned to use, albeit painfully. In a very real sense, the old interfaces required us to retrain the neural networks in our brains; new speech interfaces move that neural network effort onto the computers.

The science of satisfying speech
Here at BabbleLabs, we’re driving speech science forward using powerful combinations of audio signal processing and neural network models to extract more information, remove ambiguity, and improve the quality of speech-based systems.

Speech enhancement electronically processes natural human speech to pare back the noise that makes speech hard to comprehend and unpleasant to listen to. And deep networks can follow more than sounds; they provide powerful means to overcome conflicting voices, audio impairments, and confusion of meaning.

Speech has all the key characteristics, including big data and deep complexity, that make it a prime candidate for neural network processing. BabbleLabs’ neural network has been exposed to hundreds of thousands of hours of speech- and noise-based training data. To our knowledge, no one else has attempted to produce speech enhancement AI at this scale.


The measures of success
Generally speaking, there are two sets of criteria we apply to speech:

  • Comfort: How does it feel to listen to the speech? Is it annoying or uncomfortable? Is the noise or reverberation distracting?
  • Intelligibility: Regardless of how noisy or unpleasant the speech may be, how completely can you actually make out the words and the speaker’s intent? Can you understand?

The telephony industry has produced an extensive body of work around measuring speech clarity and intelligibility (e.g., ITU standards); these metrics provide a good starting point.