Thursday, June 10, 2004

Multimodal Speech Reproduction

Speech is usually understood as an acoustic process, but it has been shown that listeners also acquire visual information during a dialogue [1][2]. Speech perception is a bimodal process in which both auditory and visual perception play a role. A striking demonstration of this fact emerged when Harry McGurk and John MacDonald, while studying how infants perceive speech at different stages of development, accidentally created a videotape with the audio syllable /ba/ dubbed onto a visual /ga/. When listeners watched the tape, they perceived /da/, which in articulatory terms lies between the two. This audio-visual illusion has become known as the McGurk effect [3][4].
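
As a rough illustration of how such a dubbed stimulus can be assembled today (not how McGurk and MacDonald did it with tape), the short Python sketch below replaces the audio track of a video clip of a speaker articulating /ga/ with a recording of the syllable /ba/ by calling ffmpeg. The file names are placeholders and ffmpeg is assumed to be installed on the system.

    import subprocess

    def dub_audio_onto_video(video_path, audio_path, output_path):
        """Replace the audio track of video_path with audio_path."""
        subprocess.run(
            [
                "ffmpeg",
                "-i", video_path,   # visual track: speaker saying /ga/
                "-i", audio_path,   # acoustic track: speaker saying /ba/
                "-map", "0:v:0",    # take the video stream from the first input
                "-map", "1:a:0",    # take the audio stream from the second input
                "-c:v", "copy",     # keep the video stream untouched
                "-shortest",        # stop at the end of the shorter stream
                output_path,
            ],
            check=True,
        )

    if __name__ == "__main__":
        # Placeholder file names for the two recordings and the dubbed result.
        dub_audio_onto_video("visual_ga.mp4", "audio_ba.wav", "mcgurk_stimulus.mp4")

Played back, a clip like this lets you check the effect yourself: with eyes closed you hear /ba/, while watching the face many listeners report /da/.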

[1] Q. Summerfield, "Use of visual information for phonetic perception," Phonetica, vol. 36, pp. 314–331, 1979.
[2] E. Vatikiotis-Bateson, I. M. Eigsti, S. Yano, and K. Munhall, "Eye movement of perceivers during audiovisual speech perception," Perception & Psychophysics, 1998.
[3] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746–748, 1976.
[4] J. MacDonald and H. McGurk, "Visual influences on speech perception processes," Perception & Psychophysics, vol. 24, pp. 253–257, 1978.

* text extracted from "A System for Multimodal Speech Reproduction" by Nicolau Werneck, Lucas Malta, Leonardo Araujo and Hani Yehia, published in SIBGRAPI 2003.
