Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This fusion enhances intelligibility in noise and, in the McGurk effect, lets the visual input modify the auditory percept. The classical view is that the auditory and visual systems process their inputs independently before interacting at a certain representational stage, which yields an integrated percept. However, some behavioral and neurophysiological data suggest a two-stage process: a first stage binds together the appropriate pieces of audio and video information, and a second stage performs fusion per se. If so, it should be possible to design experiments that lead to unbinding. It is shown here that when a given McGurk stimulus is preceded by an incoherent audiovisual context, the McGurk effect is substantially reduced. Various kinds of incoherent context (acoustic syllables dubbed on video sentences, or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage “binding and fusion” model for audiovisual speech perception.
