Clear speech for better meetings

If you are a business professional like me, I bet you spend at least a couple of hours a day – often more – in meetings of some sort, with colleagues, partners or customers. Looking back at my own career, across 20+ years of work and study, I have probably spent 10+ years just in meetings. What started as in-person meetings evolved into audio conferencing. In just over a decade, we went from dedicated video conferencing and telepresence rooms to a distributed workforce accustomed to video conference calls from any device, anywhere. That is remarkable technological progress, bolstered by cloud, mobile, advances in audio-video codecs, and improvements in hardware chipsets and equipment design.

Organizations – small and large enterprises, education and even healthcare institutions – spend millions of dollars on expensive audio-visual equipment and acoustic design of conference rooms to provide the best possible experience to their employees and their customers. Yet audio quality is often impaired by distracting background noise – keyboard typing, HVAC systems, café chatter from remote participants. And when the audio quality is bad, the conference suffers. The whole team's productivity suffers.

At BabbleLabs, we love speech. We love noise even more. We want to clearly distinguish speech from noise.

Speech tells us important things about a conversation – emotion, intent, cultural background and more. With our industry expertise in deep learning, speech science and embedded systems, we trained novel deep neural networks with a ...


Enhancing Speech to Solve the Pervasive Problem in Conferencing

Seventy years ago, the journalist William H. Whyte coined a popular adage: “The single biggest problem in communication is the illusion that it has taken place.” Regardless of who the quote is ascribed to (sometimes even George Bernard Shaw is given credit), it gets at the perennial tension between the necessity of communication and the daunting difficulty of making it happen. This is especially true in large organizations with distributed teams.

Large organizations emerge because they make humans more effective. Corporations, volunteer groups and the military all harness the coordinated energy and diverse talents of teams to create benefits unavailable to individuals. Everything that organizations need for success – shared vision, efficient allocation of resources, coordinated action, communal learning processes – is ultimately built on investment in good communication.

How does the modern organization communicate?
With a marvelous and complex diversity of methods – face-to-face meetings, mail, email, texts, live meetings, phone calls, video and audio conferences, video broadcasts and more. While many of these are asynchronous, live video, and especially live audio, is particularly pervasive, yet often problematic.

We can roughly break this list of communication methods down into two broad categories – non-real-time content sharing methods and real-time audio-video methods. Within audio-video collaboration channels, it's pretty clear that audio is central. After all, you can have a productive audio conference without video, but video conferencing without good audio is sadly ineffective. All these tools play different roles in the overall team collaboration experience. Text ...


Is Speech Recognition Ready for a Noisy World?

Speech-centric user interfaces abound. Siri on the iPhone introduced us to the potential of using speech to control our phones. Amazon's Alexa service brought a wealth of new information services into our living rooms. Google's high-quality speech recognition now spans phones and smart speakers and is starting to bring on-the-fly speech translation within our reach. On the surface, it seems that speech interfaces are ready for the real world.

The reality, alas, is not so rosy. The real world is a chaotic place, with serious impairments to understanding caused by loud background noise, acoustic reverberation and faulty communication channels. Both human-to-human and human-to-machine communications suffer from severe limitations in comprehensibility. It is worst, naturally, when the noise is loudest – for example outdoors, in crowded cafés, in cars, in the kitchen and on the factory floor. Moreover, comprehension by people and by machines (automatic speech recognition, or ASR) is worse when there is little context. We often have an easier time understanding a whole sentence than a short sequence of words, because the longer utterance gives us more context to use in disambiguating the speech.

A simple experiment illustrates the problem. I recently captured a string of voice commands in both a noisy and a noise-free environment. The noisy environment used playback of recorded crowd noise at the same power level as my speech – a 0 dB signal-to-noise ratio. I fed the audio stream into the excellent IBM Watson speech recognition system. ...
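To make the 0 dB condition concrete, here is a minimal sketch – not the actual test setup – of how one might scale a noise recording so that a speech/noise mixture hits a target SNR. The `mix_at_snr` helper and the synthetic "speech" and "noise" signals are illustrative stand-ins:

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then mix.

    At 0 dB the scaled noise carries the same average power as the speech.
    """
    # Loop or trim the noise recording to match the speech length.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Stand-ins for a speech clip and a crowd-noise clip (16 kHz, 1 second).
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
mix = mix_at_snr(speech, noise, snr_db=0.0)  # equal speech and noise power
```

At 0 dB the gain works out so that the added noise has exactly the same average power as the speech, which is what makes the condition so punishing for both human listeners and ASR.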
