Sunday, October 17, 2004

Speech recognition

"Human beings' recognition of speech consists of many tasks, ranging from the detection of phonemes from speech waveforms to the high-level understanding of messages. We do not actually hear all speech elements; we realize this easily when we try to decipher foreign or uncommon utterances. Instead, we continuously relate fragmentary sensory stimuli to context familiar from various experiences, and we unconsciously test and reiterate our perceptions at different levels of abstraction. In other words, what we believe we hear, we in fact reconstruct in our minds from pieces of received information.
Even in clear speech from the same speaker, distributions of the spectral samples of different phonemes overlap. Their statistical density functions are not Gaussian, so they cannot be approximated analytically. The same phonemes spoken by different persons can be confused too; for example, the /ε/ of one speaker might sound like the /n/ of another. For this reason, absolutely speaker-independent detection of phonemes is possible only with relatively low accuracy."

(The Neural Phonetic Typewriter, Teuvo Kohonen - IEEE 1988)
