Read my lips: Using multiple senses in speech perception

When someone speaks to you, do you see what they are saying? We tend to think of speech as something we hear, but recent studies suggest that we use a variety of senses for speech perception, and that the brain treats speech as something we hear, see and even feel. In a new report in Current Directions in Psychological Science, a journal of the Association for Psychological Science, psychologist Lawrence Rosenblum describes research examining how our different senses blend together to help us perceive speech.

We receive a lot of our speech information via visual cues, such as lip-reading, and this type of visual speech occurs across all cultures. And it is not just information from the lips: when someone is speaking to us, we also note movements of the teeth and tongue, as well as non-mouth facial features. It's likely that human speech perception has evolved to integrate many senses. Put another way, speech is not meant to be just heard, but also to be seen.

The McGurk Effect is a well-characterized example of the integration between what we see and what we hear when someone is speaking to us. This phenomenon occurs when a sound (such as a syllable or word) is dubbed onto a video showing a face making a different sound. For example, the audio may be playing "ba," while the face looks as though it is saying "va." When confronted with this, we will usually hear "va" or a combination of the two sounds, such as "da." Interestingly, the McGurk Effect still occurs when study participants are aware of the dubbing or are told to concentrate only on the audio. Rosenblum suggests that this is evidence that once the senses are integrated, it is not possible to separate them.

Recent studies indicate that this integration occurs very early in the speech process, even before phonemes (the basic units of speech) are established. Rosenblum suggests that the physical movements of speech (that is, our mouths and lips moving) create acoustic and visual signals that have a similar form. He argues that as far as the speech brain is concerned, the auditory and visual information are never really separate. This could explain why we integrate speech so readily, and in such a way that the auditory and visual speech signals become indistinguishable from one another.

Rosenblum concludes that visual-speech research has a number of clinical implications, especially in the areas of autism, brain injury and schizophrenia, and that "rehabilitation programs in each of these domains have incorporated visual-speech stimuli."

Source: Association for Psychological Science