Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated fairly early in the speech perception process. The primary question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the ability to detect, rather than recognize, speech. In the first experiment, ten normal-hearing subjects detected the presence of three known spoken sentences in noise under three conditions: auditory-only (A), auditory plus speechreading with a visually matched sentence (AVM), and auditory plus speechreading with a visually unmatched sentence (AVUM). When the speechread sentence matched the target sentence, average detection thresholds improved by about 1.6 dB relative to the auditory condition. However, the amount of threshold reduction varied significantly across the three target sentences (from 0.8 to 2.2 dB). There was no difference in detection thresholds between the AVUM condition and the A condition. In a second experiment, the effects of visually matched orthographic stimuli on detection thresholds were examined for the same three target sentences in six subjects who participated in the earlier experiment. When the orthographic stimuli were presented just prior to each trial, average detection thresholds improved by about 0.5 dB relative to the A condition. However, unlike the AVM condition, the detection improvement due to orthography was not dependent on the target sentence.
Analyses of correlations between area of mouth opening and acoustic envelopes derived from selected spectral regions of each sentence (corresponding to the wide-band speech, and first, second, and third formant regions) suggested that AVM threshold reduction may be determined by the degree of auditory-visual temporal coherence, especially between the area of lip opening and the envelope derived from mid- to high-frequency acoustic energy. Taken together, the data (for these sentences at least) suggest that visual cues derived from the dynamic movements of the face during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection. The amount of visual influence depends in part on the degree of correlation between acoustic envelopes and visible movement of the articulators.
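The correlational analysis described above can be illustrated with a minimal sketch: band-pass an acoustic signal, extract a smoothed amplitude envelope, and compute its Pearson correlation with a lip-area trace across a range of temporal lags. Everything here is an assumption for illustration only, not the study's actual processing chain: the signals are synthetic stand-ins, a single illustrative frequency band replaces the wide-band and F1/F2/F3 analyses, and all sample rates, filter orders, and cutoffs are arbitrary.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000  # Hz; assume lip-area and audio signals resampled to a common rate
t = np.arange(0, 3, 1 / fs)

# Synthetic stand-ins: a slowly varying "lip area" trace and a carrier whose
# amplitude follows a delayed copy of that movement (30-sample audio lag).
lip_area = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * t) ** 2
carrier = np.sin(2 * np.pi * 120 * t)
audio = np.roll(lip_area, 30) * carrier  # roll wraps around; fine for this demo

def band_envelope(x, lo, hi, fs, smooth_hz=20):
    """Band-pass x, then return its low-pass-smoothed Hilbert envelope."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, x)))
    bs, as_ = butter(2, smooth_hz / (fs / 2))
    return filtfilt(bs, as_, env)

def lagged_corr(x, y, max_lag):
    """Pearson correlation between shifted copies of x and y, per lag."""
    return {k: np.corrcoef(np.roll(x, k), y)[0, 1]
            for k in range(-max_lag, max_lag + 1)}

# Envelope from one illustrative band around the carrier, then the lag at
# which lip area and envelope cohere most strongly.
env = band_envelope(audio, 80, 200, fs)
corrs = lagged_corr(lip_area, env, max_lag=50)
best_lag = max(corrs, key=corrs.get)
```

In this toy example the peak correlation falls near the 30-sample delay built into the synthetic audio, mirroring the idea that the strength and timing of envelope/lip-area coherence could index how much a visible articulatory cue can aid detection.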
