This article reports on vowel clarity metrics based on spectrotemporal modulations of speech signals. Motivated by previous findings on the relevance of modulation-based metrics for speech intelligibility assessment and pathology classification, the current study used factor analysis to identify regions within a bi-dimensional modulation space, the modulation power spectrum as in Elliott and Theunissen [(2009). PLoS Comput. Biol. 5(3), e1000302], by relating these regions to a set of conventional acoustic metrics of vowel space area and vowel distinctiveness. Two indices emerged from the analyses, both based on the energy ratio between high and low modulation rates across the temporal and spectral dimensions of the modulation space. These indices served as input to measures of central tendency and to classification analyses aimed at identifying vowel-related speech impairments in native French speakers with head and neck cancer (HNC) and Parkinson dysarthria (PD). The analyses identified vowel-related speech impairment in HNC speakers but not in PD speakers, a result consistent with subjective evaluations of speech intelligibility. These findings agree with previous studies indicating that impaired speech is associated with attenuated energy in higher spectrotemporal modulation bands.
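To make the idea of a high/low modulation energy ratio concrete, the sketch below computes a modulation power spectrum as the 2-D Fourier power of a mean-removed log spectrogram, in the spirit of Elliott and Theunissen (2009), and then divides the energy above a cutoff by the energy below it along the temporal axis (Hz) and the spectral axis (cycles/kHz). It is a minimal illustration under stated assumptions, not the study's pipeline: the function names, the Hann-windowed STFT (25 ms window, 5 ms hop), the exclusion of the zero-modulation bin, and the 4 Hz and 1 cycle/kHz cutoffs are all hypothetical choices made here for illustration.

```python
import numpy as np


def log_spectrogram(x, fs, win_s=0.025, hop_s=0.005, nfft=512):
    """Log-magnitude STFT (frames x frequency bins) using a Hann window.

    Window, hop, and FFT sizes are illustrative assumptions.
    """
    n_win = int(round(win_s * fs))
    n_hop = int(round(hop_s * fs))
    if nfft < n_win:
        raise ValueError("nfft must be at least the window length")
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // n_hop
    frames = np.stack([x[i * n_hop:i * n_hop + n_win] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
    # Small floor avoids log(0) in silent frames.
    return np.log(mag + 1e-10), n_hop / fs, fs / nfft


def modulation_power_spectrum(logspec, dt, df):
    """Power of the 2-D FFT of the mean-removed log spectrogram.

    Rows index temporal modulation (Hz); columns index spectral
    modulation (cycles/kHz).
    """
    s = logspec - logspec.mean()
    mps = np.abs(np.fft.fft2(s)) ** 2
    wt = np.fft.fftfreq(s.shape[0], d=dt)           # temporal modulation, Hz
    wf = np.fft.fftfreq(s.shape[1], d=df) * 1000.0  # spectral modulation, cycles/kHz
    return mps, wt, wf


def high_low_ratio(mps, mod_freqs, axis, cutoff):
    """Energy at |mod| >= cutoff divided by energy at 0 < |mod| < cutoff.

    axis=0: temporal modulation (mod_freqs in Hz).
    axis=1: spectral modulation (mod_freqs in cycles/kHz).
    The zero-modulation bin is excluded (an assumption of this sketch).
    """
    marginal = mps.sum(axis=1 - axis)  # collapse the other modulation axis
    a = np.abs(mod_freqs)
    high = marginal[a >= cutoff].sum()
    low = marginal[(a > 0) & (a < cutoff)].sum()
    return high / low


def indices(x, fs, t_cut=4.0, f_cut=1.0):
    """Hypothetical temporal and spectral high/low ratios (illustrative cutoffs)."""
    mps, wt, wf = modulation_power_spectrum(*log_spectrogram(x, fs))
    return (high_low_ratio(mps, wt, axis=0, cutoff=t_cut),
            high_low_ratio(mps, wf, axis=1, cutoff=f_cut))


# Synthetic check: noise amplitude-modulated slowly (2 Hz) vs quickly (16 Hz).
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)
slow = noise * (1 + 0.8 * np.sin(2 * np.pi * 2 * t))
fast = noise * (1 + 0.8 * np.sin(2 * np.pi * 16 * t))

print(indices(slow, fs), indices(fast, fs))
```

With these synthetic inputs, the faster amplitude modulation shifts energy above the temporal cutoff, so its temporal ratio is larger; in this framing, attenuated energy in higher modulation bands would show up as a lower ratio.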

1. Balaguer, M., Pommée, T., Farinas, J., Pinquier, J., Woisard, V., and Speyer, R. (2020). “Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review,” Head Neck 42(1), 111–130.
2. Basilakos, A., Yourganov, G., den Ouden, D.-B., Fogerty, D., Rorden, C., Feenaughty, L., and Fridriksson, J. (2017). “A multivariate analytic approach to the differential diagnosis of apraxia of speech,” J. Speech Lang. Hear. Res. 60, 3378–3392.
3. Bechet, F. (2001). “LIA_PHON: Un système complet de phonétisation de textes” (“LIA_PHON: A complete text phonetization system”), Traitement Automatique Langues 42(1), 47–67.
4. Bořil, T., and Skarnitzl, R. (2016). “Tools rPraat and mPraat: Interfacing phonetic analyses with signal processing,” in Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, edited by P. Sojka, A. Horák, I. Kopeček, and K. Pala (Springer, Cham, Switzerland), pp. 367–374.
5. Chi, T., Gao, Y., Guyton, M. C., Ru, P., and Shamma, S. (1999). “Spectro-temporal modulation transfer functions and speech intelligibility,” J. Acoust. Soc. Am. 106(5), 2719–2732.
6. Cummins, F. (2012). “Oscillators and syllables: A cautionary note,” Front. Psychol. 3, 364.
7. De Bruijn, M. J., Ten Bosch, L., Kuik, D. J., Quené, H., Langendijk, J. A., Leemans, C. R., and Verdonck-De Leeuw, I. M. (2009). “Objective acoustic-phonetic speech analysis in patients treated for oral or oropharyngeal cancer,” Folia Phoniatr. Logop. 61(3), 180–187.
8. Dellwo, V., and Wagner, P. (2003). “Relationships between rhythm and speech rate,” in Proceedings of the 15th International Congress of the Phonetic Sciences, August 3–9, Barcelona, Spain, pp. 471–474.
9. Dudley, H. (1940). “The carrier nature of speech,” Bell Syst. Tech. J. 19, 495–515.
10. Dusan, S. (2007). “On the relevance of some spectral and temporal patterns for vowel classification,” Speech Commun. 49(1), 71–82.
11. Edwards, J. R., and Bagozzi, R. P. (2000). “On the nature and direction of relationships between constructs and measures,” Psychol. Methods 5(2), 155–174.
12. Elliott, T. M., and Theunissen, F. E. (2009). “The modulation transfer function for speech intelligibility,” PLoS Comput. Biol. 5(3), e1000302.
13. Fahn, S., Elton, R. L., and Members of the UPDRS Development Committee (1987). “Unified Parkinson's disease rating scale,” in Recent Developments in Parkinson's Disease, edited by S. Fahn, C. D. Marsden, D. M. Calne, A. Lieberman, and M. Goldstein (MacMillan Health Care Information, Florham Park, NJ), pp. 153–163.
14. Falk, T. H., Chan, W. Y., and Shein, F. (2012). “Characterization of atypical vocal source excitation, temporal dynamics and prosody for objective measurement of dysarthric word intelligibility,” Speech Commun. 54(5), 622–631.
15. Fant, G. (1960). Acoustic Theory of Speech Production (Mouton de Gruyter, The Hague).
16. Fisher, R. A. (1936). “The use of multiple measurements in taxonomic problems,” Ann. Eugenics 7(2), 179–188.
17. Flinker, A., Doyle, W. K., Mehta, A. D., Devinsky, O., and Poeppel, D. (2019). “Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries,” Nat. Hum. Behav. 3(4), 393–405.
18. Fogerty, D. (2014). “Importance of envelope modulations during consonants and vowels in segmentally interrupted sentences,” J. Acoust. Soc. Am. 135(3), 1568–1576.
19. Fogerty, D. (2019). “The perceptual contribution of consonants and vowels to sentence recognition: Effect of dialect variation in American English,” in Proceedings of the 19th International Congress of Phonetic Sciences, August 5–9, Melbourne, Australia, pp. 3240–3244.
20. Fogerty, D., and Humes, L. E. (2012). “The role of vowel and consonant fundamental frequency, envelope, and temporal fine structure cues to the intelligibility of words and sentences,” J. Acoust. Soc. Am. 131(2), 1490–1501.
21. Fogerty, D., and Kewley-Port, D. (2009). “Perceptual contributions of the consonant-vowel boundary to sentence intelligibility,” J. Acoust. Soc. Am. 126(2), 847–857.
22. Ghio, A., Pouchoulin, G., Teston, B., Pinto, S., Fredouille, C., De Looze, C., Robert, D., Viallet, F., and Giovanni, A. (2012). “How to manage sound, physiological and clinical data of 2500 dysphonic and dysarthric speakers?,” Speech Commun. 54(5), 664–679.
23. Giraud, A. L., and Poeppel, D. (2012). “Cortical oscillations and speech processing: Emerging computational principles and operations,” Nat. Neurosci. 15(4), 511–517.
24. Goldman, J.-P. (2011). “EasyAlign: A quasi-automatic phonetic alignment tool under Praat,” in Proceedings of INTERSPEECH, August 27–31, Firenze, Italy.
25. Howard, S., and Heselwood, B. (2015). “The contribution of phonetics to the study of vowel development and disorders,” in Handbook of Vowels and Vowel Disorders (Psychology, New York).
26. Jung, S., and Lee, S. (2011). “Exploratory factor analysis for small samples,” Behav. Res. 43(3), 701–709.
27. Karlsson, F., and van Doorn, J. (2012). “Vowel formant dispersion as a measure of articulation proficiency,” J. Acoust. Soc. Am. 132(4), 2633–2641.
28. Kent, R. D., and Kim, Y. J. (2003). “Toward an acoustic typology of motor speech disorders,” Clin. Linguist. Phon. 17(6), 427–445.
29. Kuhn, M. (2008). “Building predictive models in R using the caret package,” J. Stat. Softw. 28(5), 1–26.
30. Lalain, M., Ghio, A., Giusti, L., Robert, D., Fredouille, C., and Woisard, V. (2020). “Design and development of a speech intelligibility test based on pseudowords in French: Why and how?,” J. Speech Lang. Hear. Res. 63(7), 2070–2083.
31. Lansford, K., and Liss, J. M. (2014a). “Vowel acoustics in dysarthria: Mapping to perception,” J. Speech Lang. Hear. Res. 57(1), 68–80.
32. Lansford, K., and Liss, J. M. (2014b). “Vowel acoustics in dysarthria: Speech disorder diagnosis and classification,” J. Speech Lang. Hear. Res. 57(1), 57–67.
33. Liss, J. M., Le Gendre, S., and Lotto, A. (2010). “Discriminating dysarthria type from envelope modulation spectra,” J. Speech Lang. Hear. Res. 53(5), 1246–1255.
34. Liu, C., and Eddins, D. A. (2008). “Effects of spectral modulation filtering on vowel identification,” J. Acoust. Soc. Am. 124(3), 1704–1715.
35. Luo, H., and Poeppel, D. (2007). “Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex,” Neuron 54, 1001–1010.
36. Marczyk, A., Belley, É., Savard, C., Roy, J.-P., Vaillancourt, J., and Tremblay, P. (2022). “Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers,” Speech Commun. 141, 28–39.
37. MATLAB (2016). MATLAB (version 2016b) (MathWorks Inc., Natick, MA).
38. Meunier, C., and Ghio, A. (2018). “Caractériser la distinctivité du système vocalique des locuteurs” (“Characterizing the distinctiveness of the vocalic system of speakers”), Proc. XXXIIe J. Etudes Parole 1(1), 469–477.
39. Miller, N. (2013). “Measuring up to speech intelligibility,” Int. J. Lang. Commun. Disord. 48(6), 601–612.
40. Moro-Velázquez, L., Gómez-García, J. A., Godino-Llorente, J. I., and Andrade-Miranda, G. (2015). “Modulation spectra morphological parameters: A new method to assess voice pathologies according to the GRBAS scale,” Biomed. Res. Int. 2015, 259239.
41. Nycz, J., and Hall-Lew, L. (2013). “Best practices in measuring vowel merger,” Proc. Mtgs. Acoust. 20(1), 060008.
42. Osborne, J. W. (2015). “What is rotating in exploratory factor analysis?,” Pract. Assess. Res. Eval. 20, 2.
43. Osborne, J. W., and Costello, A. B. (2004). “Sample size and subject to item ratio in principal components analysis,” Pract. Assess. Res. Eval. 9, 11.
44. Perron, M., Vaillancourt, J., and Tremblay, P. (2022). “Amateur singing benefits speech perception in aging under certain conditions of practice: Behavioural and neurobiological mechanisms,” Brain Struct. Funct. 227(3), 943–962.
45. Perron, M., Theaud, G., Descoteaux, M., and Tremblay, P. (2021). “The fronto-temporal organization of the arcuate fasciculus and its relationship with speech perception in young and older amateur singers and non-singers,” Hum. Brain Mapp. 42(10), 3058–3076.
46. Porter, R. J. (1986). “Speech messages, modulations, and motions,” J. Phon. 14(1), 83–88.
47. R Core Development Team (2013). R: A Language and Environment for Statistical Computing (R Foundation, Vienna, Austria).
48. Revelle, W. (2020). “psych: Procedures for Psychological, Psychometric, and Personality Research [software],” https://cran.r-project.org/web/packages/psych/index.html (Last viewed October 24, 2022).
49. Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and linguistic aspects,” Philos. Trans. R. Soc. London, Ser. B: Biol. Sci. 336(1278), 367–373.
50. Rusz, J., Cmejla, R., Tykalova, T., Ruzickova, H., Klempir, J., Majerova, V., Picmausova, J., Roth, J., and Ruzicka, E. (2013). “Imprecise vowel articulation as a potential early marker of Parkinson's disease: Effect of speaking task,” J. Acoust. Soc. Am. 134(3), 2171–2181.
51. Sapir, S., Ramig, L. O., Spielman, J. L., and Fox, C. (2010). “Formant centralization ratio (FCR): A proposal for a new acoustic measure of dysarthric speech,” J. Speech Lang. Hear. Res. 53(1), 114–125.
52. Steiner, M. D., and Grieder, S. (2020). “EFAtools: An R package with fast and flexible implementations of exploratory factor analysis tools,” J. Open Source Softw. 5(53), 2521.
53. Stilp, C. E., and Kluender, K. R. (2010). “Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility,” Proc. Natl. Acad. Sci. U.S.A. 107(27), 12387–12392.
54. Ten Berge, J., Krijnen, W., Wansbeek, T., and Shapiro, A. (1999). “Some new results on correlation-preserving factor scores prediction methods,” Linear Algebra Appl. 289, 311–318.
55. Teston, B., and Galindo, B. (1995). “A diagnostic and rehabilitation aid workstation for speech and voice pathologies,” in Proceedings of the 4th European Conference on Speech Communication and Technology, Eurospeech, September 18–21, Madrid, Spain, pp. 1883–1886.
56. Thiele, C., and Hirschfeld, G. (2021). “Cutpointr: Improved estimation and validation of optimal cutpoints in R,” J. Stat. Softw. 98(11), 1–21.
57. Thoret, E., Caramiaux, B., Depalle, P., and McAdams, S. (2021). “Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre,” Nat. Hum. Behav. 5(3), 369–377.
58. Thoret, E., Depalle, P., and McAdams, S. (2016). “Perceptually salient spectrotemporal modulations for recognition of sustained musical instruments,” J. Acoust. Soc. Am. 140(6), EL478–EL483.
59. Tremblay, P., and Perron, M. (2022). “Auditory cognitive aging in amateur singers and non-singers,” Cognition (in press).
60. Varnet, L., Ortiz-Barajas, M. C., Erra, R. G., Gervain, J., and Lorenzi, C. (2017). “A cross-linguistic study of speech modulation spectra,” J. Acoust. Soc. Am. 142(4), 1976–1989.
61. Venezia, J. H., Martin, A. G., Hickok, G., and Richards, V. M. (2019). “Identification of the spectrotemporal modulations that support speech intelligibility in hearing-impaired and normal-hearing listeners,” J. Speech Lang. Hear. Res. 62(4), 1051–1067.
62. Weismer, G., Jeng, J. Y., Laures, J. S., Kent, R. D., and Kent, J. F. (2001). “Acoustic and intelligibility characteristics of sentence production in neurogenic speech disorders,” Folia Phoniatr. Logop. 53(1), 1–18.
63. Woisard, V., Astésano, C., Balaguer, M., Farinas, J., Fredouille, C., Gaillard, P., Ghio, A., Giusti, L., Laaridh, I., Lalain, M., Lepage, B., Mauclair, J., Nocaudie, O., Pinquier, J., Pouchoulin, G., Puech, M., Robert, D., and Roger, V. (2021). “C2SI corpus: A database of speech disorder productions to assess intelligibility and quality of life in head and neck cancers,” Lang. Resour. Eval. 55, 173–190.
64. Yoho, S. E., Borrie, S. A., Barrett, T. S., and Whittaker, D. B. (2019). “Are there sex effects for speech intelligibility in American English? Examining the influence of talker, listener, and methodology,” Atten. Percept. Psychophys. 81(2), 558–570.
65. Zwiner, P., and Barnes, G. R. (1992). “Vocal tract steadiness: A measure of phonatory and upper airway motor control during phonation in dysarthria,” J. Speech Lang. Hear. Res. 35, 761–768.