This paper investigates the influence of visual cues in the perception of the /r/-/w/ contrast in Anglo-English. Audio-visual perception of Anglo-English /r/ warrants attention because productions are increasingly non-lingual, labiodental (e.g., [ʋ]), possibly involving visual prominence of the lips for the post-alveolar approximant [ɹ]. Forty native speakers identified [ɹ] and [w] stimuli in four presentation modalities: auditory-only, visual-only, congruous audio-visual, and incongruous audio-visual. Auditory stimuli were presented in noise. The results indicate that native Anglo-English speakers can identify [ɹ] and [w] from visual information alone with almost perfect accuracy. Furthermore, visual cues dominate the perception of the /r/-/w/ contrast when auditory and visual cues are mismatched. However, auditory perception is ambiguous because participants tend to perceive both [ɹ] and [w] as /r/. Auditory ambiguity is related to Anglo-English listeners' exposure to acoustic variation for /r/, especially to [ʋ], which is often confused with [w]. It is suggested that a specific labial configuration for Anglo-English /r/ encodes the contrast with /w/ visually, compensating for the ambiguous auditory contrast. An audio-visual enhancement hypothesis is proposed, and the findings are discussed with regard to sound change.

