The Internet has been abuzz this past week with discussion of a single utterance. People everywhere are talking about “Laurel versus Yanny”, an utterance that sounds more like “Laurel” to some and more like “Yanny” to others. See the NY Times article. In fact, you can get from something very clearly “Laurel” to something clearly “Yanny” by changing the gain on different frequency bands. “Laurel” is emphasized when the high frequencies are attenuated, and “Yanny” stands out when the high frequencies are boosted. Find a fun comparison tool here. Whether you hear “Laurel” or “Yanny” is partially determined by your auditory system’s sensitivity to high frequencies.
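To make the idea of changing the gain on different frequency bands concrete, here is a minimal Python sketch. It is not how the comparison tool or the original clip was processed. It assumes a simple one-pole low-pass filter to split the signal into a low band and a high band, and it uses two synthetic tones (200 Hz and 4000 Hz) as a stand-in for real speech. The function names, the 1000 Hz crossover, and the gain values are all hypothetical choices made for illustration.

```python
import math


def tilt_bands(samples, sample_rate, crossover_hz, low_gain, high_gain):
    """Split a signal into two bands and rescale each one independently.

    Assumption: a one-pole low-pass filter defines the low band, and the
    high band is whatever that filter removed (input minus low band).
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)  # running low-pass estimate
        high = x - low            # residual high-frequency content
        out.append(low_gain * low + high_gain * high)
    return out


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave, n samples long."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def level_at(samples, freq_hz, sample_rate):
    """Amplitude of the component at freq_hz (single-frequency DFT)."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


# Hypothetical test signal: one low tone plus one high tone, half a second long.
sample_rate = 16000
n = 8000
mix = [a + b for a, b in zip(tone(200.0, sample_rate, n),
                             tone(4000.0, sample_rate, n))]

# Attenuate the highs ("Laurel"-leaning) or the lows ("Yanny"-leaning).
laurel_like = tilt_bands(mix, sample_rate, 1000.0, low_gain=1.0, high_gain=0.25)
yanny_like = tilt_bands(mix, sample_rate, 1000.0, low_gain=0.25, high_gain=1.0)

for name, sig in (("laurel_like", laurel_like), ("yanny_like", yanny_like)):
    print(name,
          "200 Hz level:", round(level_at(sig, 200.0, sample_rate), 3),
          "4000 Hz level:", round(level_at(sig, 4000.0, sample_rate), 3))
```

With both gains set to 1.0 the two bands sum back to the input, so the gains act purely as a tilt between low and high frequencies. A real experiment would load the actual recording and would likely use sharper band filters than this one-pole split.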
This lively but inconsequential debate is the tip of an iceberg that matters a great deal more to speech experts. People want clearer speech in every situation where intelligibility or comfort is important. Nobody wants to listen to noise so overwhelmingly loud, or reverberation so heavy, that all intelligibility is lost. Alexander Graham Bell famously first transmitted voice on a wired system in 1875, and Reginald Aubrey Fessenden transmitted voice by wireless in 1900. Needless to say, the voice quality was lousy. Since then, speech and communications engineers have developed a range of metrics for comparing the noise level and the intelligibility of electronically reproduced speech.
What is “good” sound in speech? Low noise level? High clarity? Good comprehensibility? Speech captured under ideal conditions — good microphones, anechoic recording environment, and zero additive noise — ...