Today is the Day — Reflections

BabbleLabs has just made our commercial speech API, web service, and mobile apps for iPhone and Android broadly available in production. These services clean up video and audio recordings to make the speech much easier to understand. The apps work on existing videos as well as on new audio and video recorded inside the app. In either case, simply select the item you want to enhance and the app strips out virtually all of the background noise. You can then post or share the enhanced content, or keep it just for yourself. It’s fun to experiment: download the app for free and process your first 125 minutes of video or 250 minutes of audio at no cost, which will last most users a long time. Rest assured, the apps are completely private; they never store anything in the cloud or share anything with BabbleLabs.

You can download the apps here:
Android: https://play.google.com/store/apps/details?id=com.babblelabs.clearcloud
iOS: https://itunes.apple.com/us/app/babblelabs-clearcloud/id1438037795?mt=8

Our website has much more to explore: you can learn about the App, the API, and the Web Interface, and about how BabbleLabs achieves such great results. In Gabby’s Lab, you can see more examples and send us your own!

This release is a major milestone for BabbleLabs, the culmination of more than a year’s effort by a remarkable team. It has also prompted me to reflect on my career to date.

I have spent almost my whole career on technology start-ups — ...

Small, Medium, Large: Finding the Right Company Fit

Rachel Gardner, 2018 Summer Intern, Stanford University Computer Science, Class of 2020

When looking for a job, one question keeps coming up: big company or small company? Rather than answer it, I chose “all of the above” and interned at a medium-sized company (Silicon Labs), a large company (NVIDIA), and now a small company (BabbleLabs), one after the other. The first and most obvious difference is that I always have to explain what BabbleLabs does, since “I work at a deep learning startup” usually draws a fair amount of interest (along with a few knowing smiles). In case you were also wondering, BabbleLabs is a speech processing company that uses advanced neural networks for both cloud and on-device applications. It’s less than a year old, but it has already launched its first product, five months after raising its first $4M.

With such a small company (about nine people in BabbleLabs’ San Jose office), I was treated as a full-time employee from day one, with all that entails. In the often unstructured environment of a startup, I found that my past work at larger companies gave me the experience to impose my own structure: setting goals for the internship, calling meetings to discuss milestones, and so on. It was clear that the founding team’s decisions were similarly shaped by experience at more established companies, both in how to do things and in what to avoid. Because of the small size of the company, I had the opportunity ...

Expected Surprises | Part IV: Better = Cheaper

Part I: Restoring the Past
Part II: Rebuilding Speech 
Part III: What’s Missing?
Part IV: Better = Cheaper

Part IV: Better = Cheaper

Just in the last week, we realized another important way to leverage BabbleLabs’ remarkable progress in speech enhancement. Normally, people think of better speech enhancement as delivering just that: an improved experience. But improvements in quality can sometimes be transmogrified into reductions in cost. This turns out to be true for both transmitting and storing speech. Communications systems have used speech coding for decades to deliver adequate speech quality over narrow-bandwidth channels. However, as the encoding grows more aggressive, squeezing speech into the fewest possible bits per second, the speech quality suffers. Moreover, the most ambitious speech coding methods attempt to model the characteristics of human speech production.
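To make the cost side concrete, the arithmetic below compares the storage needed for one hour of narrowband speech at a few bit rates. This is a sketch, and the rates are assumptions chosen for illustration, not BabbleLabs figures: 16-bit linear PCM at 8 kHz, G.711 telephony at 64 kbit/s, and 2.4 kbit/s, the rate of classic low-rate vocoders such as LPC-10. If cleaner input let a system move to a lower-rate codec at the same perceived quality, the difference would show up directly in bandwidth and storage.

```python
# Storage needed for one hour of narrowband speech at a few bit rates.
# The rates are illustrative assumptions, not BabbleLabs figures.

SECONDS_PER_HOUR = 3600

def bytes_per_hour(bits_per_second: float) -> float:
    """Bytes required to store one hour of audio at the given bit rate."""
    return bits_per_second * SECONDS_PER_HOUR / 8

rates = {
    "16-bit PCM @ 8 kHz": 8_000 * 16,  # 128,000 bit/s, uncompressed
    "G.711": 64_000,                   # 8-bit companded PCM used in telephony
    "2.4 kbit/s vocoder": 2_400,       # rate of classic LPC-10-style vocoders
}

baseline = rates["16-bit PCM @ 8 kHz"]
for name, bps in rates.items():
    megabytes = bytes_per_hour(bps) / 1e6
    print(f"{name:>20}: {bps / 1000:6.1f} kbit/s, "
          f"{megabytes:6.2f} MB/hour, {baseline / bps:5.1f}x compression vs PCM")
```

Bandwidth scales the same way as storage, since both are proportional to the bit rate.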

Many modern speech codecs build on a "source-filter" model of speech production, which in its classic form uses two excitation sources: white Gaussian noise for unvoiced phonemes and a periodic pulse train for voiced sounds. A linear predictive coding (LPC) filter then represents the resonances of the vocal tract. These concise models work well in the absence of noise, but non-speech noise does not fit them, so noisy speech is often distorted by these coding methods, especially at lower bit rates.
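The source-filter model, and why noise fits it poorly, can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not how any particular codec is implemented: a single hypothetical resonance (pole radius 0.95 at 500 Hz, 8 kHz sampling) stands in for the vocal tract, it is excited by either white Gaussian noise or a 100 Hz pulse train, and the Levinson-Durbin recursion recovers the LPC coefficients. Adding white noise at an assumed 10 dB SNR then lowers the prediction gain (signal energy divided by LPC residual energy), meaning the model explains less of the signal and more must be spent coding the residual.

```python
import math
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] over the whole frame (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): a = [1, a1, ..., ap] are the coefficients of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p minimizing prediction-error energy err."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def all_pole_filter(excitation, a):
    """Source-filter synthesis through 1/A(z): s[n] = e[n] - sum_k a[k] * s[n-k]."""
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * s[n - k]
        s.append(acc)
    return s

def prediction_gain(x, order):
    """Signal energy divided by LPC residual energy."""
    r = autocorr(x, order)
    _, err = levinson_durbin(r, order)
    return r[0] / err

FS = 8000                          # assumed narrowband sample rate (Hz)
N = FS                             # one second of samples
rho, formant_hz = 0.95, 500.0      # hypothetical single resonance ("vocal tract")
theta = 2 * math.pi * formant_hz / FS
true_a = [1.0, -2 * rho * math.cos(theta), rho * rho]

random.seed(0)
unvoiced_src = [random.gauss(0.0, 1.0) for _ in range(N)]     # white Gaussian noise source
voiced_src = [1.0 if n % 80 == 0 else 0.0 for n in range(N)]   # 100 Hz pulse train

unvoiced = all_pole_filter(unvoiced_src, true_a)
voiced = all_pole_filter(voiced_src, true_a)

# Analysis: recover A(z) from the noise-excited signal, where LPC's
# white-excitation assumption holds exactly.
est_a, _ = levinson_durbin(autocorr(unvoiced, 2), 2)
# The pulse-train excitation is not white, so this estimate need not match as closely.
est_voiced, _ = levinson_durbin(autocorr(voiced, 2), 2)

# Add background noise at an assumed 10 dB SNR and measure the model fit again.
power = sum(v * v for v in unvoiced) / N
noise_std = math.sqrt(power / 10.0)
noisy = [v + random.gauss(0.0, noise_std) for v in unvoiced]

gain_clean = prediction_gain(unvoiced, 2)
gain_noisy = prediction_gain(noisy, 2)

print("true A(z):         ", [round(c, 3) for c in true_a])
print("from unvoiced:     ", [round(c, 3) for c in est_a])
print("from voiced:       ", [round(c, 3) for c in est_voiced])
print(f"prediction gain, clean: {10 * math.log10(gain_clean):.1f} dB")
print(f"prediction gain, noisy: {10 * math.log10(gain_noisy):.1f} dB")
```

A real codec works frame by frame (typically 10 to 30 ms), uses a higher model order (around 10 for narrowband speech), and quantizes both the coefficients and the excitation; all of that is omitted here to keep the model itself visible.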

Combining state-of-the-art speech codecs with state-of-the-art speech enhancement addresses these limitations quite well. We can use BabbleLabs Clear to remove noise, so that speech codecs (e.g., ...
