Writer and physician Oliver Wendell Holmes once said, “Speak clearly, if you speak at all; carve every word before you let it fall.” He was, of course, campaigning for thoughtfulness in speech, but the comprehensibility of speech remains crucially important. We all dearly want to understand and be understood by other people, and increasingly by the devices that connect us to our world.
Speech has always been humanity's preferred interface; in fact, the emergence of hominid speech a couple of hundred thousand years ago coincided with the origins of the species Homo sapiens. But we now live in sonic chaos, a noisy world that is getting noisier every day. We talk in cars, in restaurants, on the street, in crowds, in echoey rooms, and in busy kitchens, where there are always kids yelling, horns honking, or loud music playing. What can we do to combat the noise and make speech easier to hear and understand?
Part of the emerging answer is speech enhancement: the electronic processing of natural human speech to pare back the noise that makes it hard to comprehend and unpleasant to listen to. Electrical engineers have been concerned with speech quality for almost a century, ever since the emergence of radio broadcasting. Over the last couple of decades, most speech-improvement methods have evolved relatively slowly, through gradual refinement of DSP algorithms that seek to separate speech sounds from background noise. The emergence of deep neural network ...