While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061–1077] showed that the McGurk effect is reduced by a preceding incoherent audiovisual context. This was interpreted as evidence for an audiovisual binding stage that controls the fusion process: incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores this audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherent material produces rebinding, with a recovery of the McGurk effect, whereas silence produces no rebinding and hence freezes the unbinding process. These experiments are interpreted within the framework of an audiovisual speech scene analysis process that assesses the perceptual organization of an audiovisual speech input before a decision takes place at a higher processing stage.
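The qualitative binding dynamics summarized above can be made concrete with a toy state model. The sketch below is purely illustrative and is not the authors' model: it assumes a single hypothetical "binding weight" w in [0, 1], updated once per syllable, that drops quickly toward a floor during incoherent audiovisual input, recovers during coherent input, and is left unchanged during silence. The names `BindingState` and `simulate`, the update rules, and all rate parameters are assumptions chosen only to mimic the reported pattern (fast, saturating unbinding; rebinding with coherent material; freezing during silence); none of the values are fitted to data.

```python
from dataclasses import dataclass

# Hypothetical context labels, one per syllable-sized time step (illustrative only).
COHERENT, INCOHERENT, SILENCE = "coherent", "incoherent", "silence"


@dataclass
class BindingState:
    """Toy binding weight w in [0, 1]: 1 = fully bound, lower = more unbound.

    All parameters are illustrative assumptions, not values from the paper.
    """
    w: float = 1.0
    unbind_rate: float = 0.9   # fraction of the distance to the floor removed per incoherent syllable (assumed)
    rebind_rate: float = 0.3   # fraction of the gap to 1 recovered per coherent syllable (assumed)
    w_min: float = 0.2         # floor: unbinding saturates instead of reaching 0 (assumed)

    def step(self, context: str) -> float:
        if context == INCOHERENT:
            # Fast exponential approach toward the floor: most of the drop happens in one step.
            self.w -= self.unbind_rate * (self.w - self.w_min)
        elif context == COHERENT:
            # Gradual exponential recovery toward full binding.
            self.w += self.rebind_rate * (1.0 - self.w)
        elif context == SILENCE:
            # No update: the unbound state is "frozen" rather than decaying back.
            pass
        else:
            raise ValueError(f"unknown context: {context!r}")
        return self.w


def simulate(contexts, **params):
    """Return the binding weight after each step of a context sequence."""
    state = BindingState(**params)
    return [state.step(c) for c in contexts]
```

For example, `simulate([INCOHERENT, SILENCE, SILENCE])` keeps w at the same reduced value across the silent steps, whereas `simulate([INCOHERENT, COHERENT])` moves it back toward 1. Linking w to an actual McGurk response rate would require a separate fusion model (for instance, a weighted audiovisual combination), which this sketch does not attempt.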

1. Alsius, A., and Munhall, K. (2013). “Detection of audiovisual speech correspondences without visual awareness,” Psychol. Sci. 24, 423–431.
2. Alsius, A., Navarra, J., Campbell, R., and Soto-Faraco, S. (2005). “Audiovisual integration of speech falters under high attention demands,” Curr. Biol. 15, 839–843.
3. Alsius, A., Navarra, J., and Soto-Faraco, S. (2007). “Attention to touch weakens audiovisual speech integration,” Exp. Brain Res. 183, 399–404.
4. Alsius, A., and Soto-Faraco, S. (2011). “Searching for audiovisual correspondence in multiple speaker scenarios,” Exp. Brain Res. 213, 175–183.
5. Andersen, T. S., Tiippana, K., Laarni, J., Kojo, I., and Sams, M. (2009). “The role of visual spatial attention in audiovisual speech perception,” Speech Commun. 51, 184–193.
6. Andersen, T. S., Tiippana, K., Lampinen, J., and Sams, M. (2001). “Modelling of audiovisual speech perception in noise,” in Proceedings of the Fourth International ESCA ETRW Conference on Auditory-Visual Speech Processing, Ålborg, Denmark, pp. 172–176.
7. Arnal, L. H., Morillon, B., Kell, C. A., and Giraud, A.-L. (2009). “Dual neural routing of visual facilitation in speech processing,” J. Neurosci. 29, 13445–13453.
8. Benoit, C., Mohamadi, T., and Kandel, S. (1994). “Effects of phonetic context on audio-visual intelligibility of French,” J. Speech Hear. Res. 37, 1195–1203.
9. Bernstein, L. E., Auer, E. T., and Moore, J. K. (2004). “Audiovisual speech binding: Convergence or association?,” in The Handbook of Multisensory Processes, edited by G. A. Calvert, C. Spence, and B. E. Stein (The MIT Press, Cambridge, MA), pp. 203–224.
10. Bernstein, L. E., Lu, Z. L., and Jiang, J. (2008). “Quantified acoustic-optical speech signal incongruity identifies cortical sites of audiovisual speech processing,” Brain Res. 1242, 172–184.
11. Bertelson, P., Vroomen, J., and De Gelder, B. (2003). “Visual recalibration of auditory speech identification: A McGurk aftereffect,” Psychol. Sci. 14, 592–597.
12. Bertelson, P., Vroomen, J., Wiegeraad, G., and de Gelder, B. (1994). “Exploring the relation between McGurk interference and ventriloquism,” in Proceedings of ICSLP 94 (Acoustical Society of Japan, Yokohama, Japan), Vol. 2, pp. 559–562.
13. Berthommier, F. (2004). “A phonetically neutral model of the low-level audiovisual interaction,” Speech Commun. 44, 31–41.
14. Besle, J., Fort, A., Delpuech, C., and Giard, M.-H. (2004). “Bimodal speech: Early suppressive visual effects in human auditory cortex,” Eur. J. Neurosci. 20, 2225–2234.
15. Bregman, A. S. (1990). Auditory Scene Analysis (MIT Press, Cambridge, MA), 773 pp.
16. Bregman, A. S., and Pinker, S. (1978). “Auditory streaming and the building of timbre,” Can. J. Psychol. 32, 19–31.
17. Cathiard, M. A., Schwartz, J. L., and Abry, C. (2001). “Asking a naive question about the McGurk Effect: Why does audio [b] give more [d] percepts with visual [g] than with visual [d]?,” in Proceedings AVSP-2001, edited by D. W. Massaro, J. Light, and K. Geraci, Aalborg, Denmark, pp. 138–142.
18. Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., and Deltenre, P. (2002). “Mismatch negativity evoked by the McGurk–MacDonald effect: A phonetic representation within short-term memory,” Clin. Neurophysiol. 113, 495–506.
19. Erber, N. P. (1969). “Interaction of audition and vision in the recognition of oral speech stimuli,” J. Speech Hear. Res. 12, 423–425.
20. Eskelund, K., Tuomainen, J., and Andersen, T. S. (2011). “Multistage audiovisual integration of speech: Dissociating identification and detection,” Exp. Brain Res. 208, 447–457.
21. Fuster-Duran, A. (1995). “McGurk effect in Spanish and German listeners. Influences of visual cues in the perception of Spanish and German conflicting audio-visual stimuli,” in Proceedings of the Eurospeech 95, edited by J. Pardo, Madrid, Spain, pp. 295–298.
22. Grant, K. W., and Seitz, P. (2000). “The use of visible speech cues for improving auditory detection of spoken sentences,” J. Acoust. Soc. Am. 108, 1197–1208.
23. Green, K., Kuhl, P., Meltzoff, A., and Stevens, E. (1991). “Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect,” Percept. Psychophys. 50, 524–536.
24. Heckmann, M., Kroschel, K., Savariaux, C., and Berthommier, F. (2002). “DCT-based video features for audio-visual speech recognition,” in Proceedings of ICSLP02, Denver, CO, pp. 1925–1928.
25. Hupé, J. M., and Pressnitzer, D. (2012). “The initial phase of auditory and visual scene analysis,” Philos. Trans. R. Soc. B 367, 942–953.
26. Huyse, A., Berthommier, F., and Leybaert, J. (2013). “Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children,” Ear Hear. 34, 110–121.
27. Keane, B. P., Rosenthal, O., Chun, N. H., and Shams, L. (2010). “Audiovisual integration in high functioning adults with autism,” Res. Autism Spectrum Disord. 4, 276–289.
28. Keetels, M., Stekelenburg, J., and Vroomen, J. (2007). “Auditory grouping occurs prior to intersensory pairing: Evidence from temporal ventriloquism,” Exp. Brain Res. 180, 449–456.
29. Kim, J., and Davis, C. (2003). “Hearing foreign voices: Does knowing what is said affect masked visual speech detection,” Perception 32, 111–120.
30. Kim, J., and Davis, C. (2004). “Investigating the audio-visual detection advantage,” Speech Commun. 44, 19–30.
31. Lallouache, M. T. (1990). “Un poste ‘visage-parole.’ Acquisition et traitement de contours labiaux” (“A ‘face-speech’ workstation. Acquisition and processing of labial contours”), in Proceedings XVIII Journées d'Etudes sur la Parole, Montréal, Quebec, Canada, pp. 282–286.
32. Manuel, S., Repp, B. H., Liberman, A. M., and Studdert-Kennedy, M. (1989). “Exploring the ‘McGurk effect,’ ” Paper presented at the 24th meeting of the Psychonomic Society, San Diego, CA.
33. Massaro, D. W. (1987). Speech Perception by Ear and Eye (Lawrence Erlbaum Associates, Hillsdale, NJ), 320 pp.
34. Massaro, D. W. (1989). “Multiple Book Review of Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry,” Behav. Brain Sci. 12, 741–794.
35. Massaro, D. W., and Cohen, M. M. (1983). “Evaluation and integration of visual and auditory information in speech perception,” J. Exp. Psychol.: Human Percept. Perf. 9, 753–771.
36. Massaro, D. W., Tsuzaki, M., Cohen, M. M., Gesi, A., and Heredia, R. (1993). “Bimodal speech perception: An examination across languages,” J. Phonetics 21, 445–478.
37. McGurk, H., and MacDonald, J. (1976). “Hearing lips and seeing voices,” Nature 264, 746–748.
38. Nahorna, O., Berthommier, F., and Schwartz, J. L. (2012). “Binding and unbinding the auditory and visual streams in the McGurk effect,” J. Acoust. Soc. Am. 132, 1061–1077.
39. Noppeney, U., Ostwald, D., and Werner, S. (2010). “Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex,” J. Neurosci. 30, 7434–7446.
40. Ratcliff, R., and Rouder, J. N. (1998). “Modeling response times for two-choice decisions,” Psychol. Sci. 9, 347–356.
41. Sanabria, D., Soto-Faraco, S., Chan, J. S., and Spence, C. (2005). “Intramodal perceptual grouping modulates multisensory integration: Evidence from the crossmodal congruency task,” Neurosci. Lett. 377, 59–64.
42. Schwartz, J. L. (2006). “Bayesian model selection: The 0/0 problem in the fuzzy-logical model of perception,” J. Acoust. Soc. Am. 120, 1795–1798.
43. Schwartz, J. L. (2010). “A reanalysis of McGurk data suggests that audiovisual fusion in speech perception is subject-dependent,” J. Acoust. Soc. Am. 127, 1584–1594.
44. Schwartz, J. L., Berthommier, F., and Savariaux, C. (2004). “Seeing to hear better: Evidence for early audio-visual interactions in speech identification,” Cognition 93, B69–B78.
45. Schwartz, J. L., Robert-Ribes, J., and Escudier, P. (1998). “Ten years after Summerfield. A taxonomy of models for audiovisual fusion in speech perception,” in Hearing by Eye II. Perspectives and Directions in Research on Audiovisual Aspects of Language Processing, edited by R. Campbell, B. Dodd, and D. Burnham (Psychology Press, Hove, UK), pp. 85–108.
46. Schwartz, J. L., Tiippana, K., and Andersen, T. (2010). “Disentangling unisensory from fusion effects in the attentional modulation of McGurk effects: A Bayesian modeling study suggests that fusion is attention-dependent,” in Proceedings AVSP2010, Tokyo, Japan, pp. 23–27.
47. Sekiyama, K., and Burnham, D. (2008). “Impact of language on development of auditory-visual speech perception,” Dev. Sci. 11, 306–320.
48. Sekiyama, K., and Tohkura, Y. (1991). “McGurk effect in non-English listeners: Few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility,” J. Acoust. Soc. Am. 90, 1797–1805.
49. Sekiyama, K., and Tohkura, Y. (1993). “Inter-language differences in the influence of visual cues in speech perception,” J. Phonetics 21, 427–444.
50. Smith, P. L., and Ratcliff, R. (2004). “Psychology and neurobiology of simple decisions,” Trends Neurosci. 27, 161–168.
51. Sodoyer, D., Girin, L., Jutten, C., and Schwartz, J. L. (2004). “Further experiments on audio-visual speech source separation,” Speech Commun. 44, 113–125.
52. Soto-Faraco, S., and Alsius, A. (2007). “Conscious access to the unisensory components of a cross-modal illusion,” Neuroreport 18, 347–350.
53. Soto-Faraco, S., and Alsius, A. (2009). “Deconstructing the McGurk-MacDonald illusion,” J. Exp. Psychol.: Human Percept. Perf. 35, 580–587.
54. Soto-Faraco, S., Navarra, J., and Alsius, A. (2004). “Assessing automaticity in audiovisual speech integration: Evidence from the speeded classification task,” Cognition 92, B13–B23.
55. Sumby, W., and Pollack, I. (1954). “Visual contribution to speech intelligibility in noise,” J. Acoust. Soc. Am. 26, 212–215.
56. Summerfield, Q. (1987). “Some preliminaries to a comprehensive account of audio-visual speech perception,” in Hearing by Eye: The Psychology of Lipreading, edited by B. Dodd and R. Campbell (Lawrence Erlbaum Associates, New York), pp. 3–51.
57. Summerfield, Q., and McGrath, M. (1984). “Detection and resolution of audio-visual incompatibility in the perception of vowels,” Q. J. Exp. Psychol. A 36, 51–74.
58. Tiippana, K., Andersen, T. S., and Sams, M. (2004). “Visual attention modulates audiovisual speech perception,” Eur. J. Cognit. Psychol. 16, 457–472.
59. Tiippana, K., Puharinen, H., Möttönen, R., and Sams, M. (2011). “Sound location can influence audiovisual speech perception when spatial attention is manipulated,” Seeing Perceiving 24, 67–90.
60. van Maanen, L., Grasman, R. P. P. P., Forstmann, B. U., and Wagenmakers, E.-J. (2012). “Piéron's Law and optimal behavior in perceptual decision-making,” Front. Decision Neurosci. 5, 143.
61. van Wassenhove, V., Grant, K. W., and Poeppel, D. (2005). “Visual speech speeds up the neural processing of auditory speech,” Proc. Natl. Acad. Sci. 102, 1181–1186.
62. van Wassenhove, V., Grant, K. W., and Poeppel, D. (2007). “Temporal window of integration in bimodal speech,” Neuropsychologia 45, 598–607.
63. Vroomen, J., and Baart, M. (2011). “Phonetic recalibration in audiovisual speech,” in Frontiers in the Neural Basis of Multisensory Processes, edited by M. M. Murray and M. T. Wallace (Taylor and Francis, Oxford, UK), pp. 363–379.
64. Yu, A. J., Dayan, P., and Cohen, J. D. (2009). “Dynamics of attentional selection under conflict: Toward a rational Bayesian account,” J. Exp. Psychol.: Human Percept. Perf. 35, 700–717.