During my training as an engineer, one of the most basic lessons I had to learn was “always check your assumptions”. This maxim applies in many circumstances. All too often, you find a mismatch between how things should behave and how they actually function. And this leads us to our main subject: why those of us building BabbleLabs, a speech company, care so deeply about noise in speech. We care so much that we derived our company's name from noise; we are BabbleLabs, not SpeechLabs, after all.
When I hear stories about how computers have achieved better-than-human speech recognition results, I wonder how that can be true, and yet I still cannot successfully dictate a number to my phone. Even if short message transcription works in the comfort of your private office, it completely falls apart in your car.
These observations are screaming, "Check Your Assumptions!" It turns out that most automatic speech recognition (ASR) work has historically focused on anechoic conditions (i.e. without echoes or reverberation). Making ASR robust for real-world environments has been a second step, built on top of the work done for anechoic ASR. To be clear, that approach is not wrong; you need to crawl before you can walk. However, it has inherent limitations. At BabbleLabs, we have started from the other end of the problem, by addressing the noise.
Noise, of course, means a lot of things: additive noise, modulated noise, and linear and non-linear impairments that degrade the quality and intelligibility of speech. It spans natural noises captured at the source, as well as (potentially) noise created in the channel as the audio is coded, transmitted, stored, retrieved and decoded. Noise also encompasses the acoustics of the environment; complex reflections affect frequencies differently, further complicating the audio environment for both humans and machines trying to understand speech.
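To make two of these categories concrete, here is a minimal sketch of how additive noise and room reflections are commonly modeled when building test signals. It is an illustration only, not a description of how any BabbleLabs product works. The function names, the sine-wave stand-in for speech, and the toy impulse response are all assumptions chosen for the example.

```python
import math
import random


def power(signal):
    """Mean power of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Additive noise: scale the noise to hit a target SNR, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def convolve(signal, impulse_response):
    """Reverberation as a linear impairment: convolve with a room response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Stand-in signals (assumptions): a 220 Hz tone instead of real speech,
# and white Gaussian noise instead of a recorded background.
rng = random.Random(0)
sample_rate = 8000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Additive noise at 5 dB SNR, roughly "talking in a moving car" territory.
noisy = add_noise_at_snr(speech, noise, snr_db=5.0)

# A toy impulse response: the direct path plus two decaying echoes.
reverberant = convolve(speech[:400], [1.0, 0.0, 0.0, 0.5, 0.0, 0.25])
```

Additive noise sits on top of the speech and, in principle, can be subtracted back out. Reverberation instead reshapes the speech itself, which is one reason the different kinds of noise call for different treatment.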
In our quest to follow the noise trail, we set our sights on one of the most obvious types of noise in speech intended for human consumption: additive noise in a single-channel audio recording. From this early objective, our first product, Clear Cloud, was born. Now, for the first time, there is a publicly available cloud-based API and web interface capable of suppressing any type of additive noise impacting your speech. We recognize that even the longest journeys start with one step. And we are on a quest to address all types of noise and distortions plaguing your speech, and to unlock your ability to #speakyourmind. In short, if you have a noise problem, think BabbleLabs. We are here to help. Because at BabbleLabs, #WeLoveNoise!