Thursday, June 10, 2004

Multimodal Speech Reproduction

Speech is usually understood as an acoustic process, but it has been shown that listeners also acquire visual information during a dialogue [1][2]. Speech perception is a bimodal process in which both auditory and visual perception play a role. A striking demonstration of this fact emerged when Harry McGurk and John MacDonald, while studying how infants perceive speech at different stages of development, accidentally created a videotape with the audio syllable /ba/ dubbed onto a visual /ga/. When listeners watched the tape, they perceived /da/, which in articulatory terms lies between the two. This audio-visual illusion has become known as the McGurk effect [3][4].
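
As a rough illustration of how such a dubbed stimulus can be assembled today (not how McGurk and MacDonald did it with tape), the short Python sketch below replaces the audio track of a video clip of a speaker articulating /ga/ with a recording of the syllable /ba/ by calling ffmpeg. The file names are placeholders and ffmpeg is assumed to be installed on the system.

    import subprocess

    def dub_audio_onto_video(video_path, audio_path, output_path):
        """Replace the audio track of video_path with audio_path."""
        subprocess.run(
            [
                "ffmpeg",
                "-i", video_path,   # visual track: speaker saying /ga/
                "-i", audio_path,   # acoustic track: speaker saying /ba/
                "-map", "0:v:0",    # take the video stream from the first input
                "-map", "1:a:0",    # take the audio stream from the second input
                "-c:v", "copy",     # keep the video stream untouched
                "-shortest",        # stop at the end of the shorter stream
                output_path,
            ],
            check=True,
        )

    if __name__ == "__main__":
        # Placeholder file names for the two recordings and the dubbed result.
        dub_audio_onto_video("visual_ga.mp4", "audio_ba.wav", "mcgurk_stimulus.mp4")

Played back, a clip like this lets you check the effect yourself: with eyes closed you hear /ba/, while watching the face many listeners report /da/.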

[1] Q. Summerfield, "Use of visual information for phonetic perception," Phonetica, vol. 36, pp. 314–331, 1979.
[2] E. Vatikiotis-Bateson, I. M. Eigsti, S. Yano, and K. Munhall, "Eye movement of perceivers during audiovisual speech perception," Perception & Psychophysics, 1998.
[3] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746–748, 1976.
[4] J. MacDonald and H. McGurk, "Visual influences on speech perception processes," Perception & Psychophysics, vol. 24, pp. 253–257, 1978.

* text extracted from "A System for Multimodal Speech Reproduction" by Nicolau Werneck, Lucas Malta, Leonardo Araujo and Hani Yehia, published in SIBGRAPI 2003.
