This study investigates rhythmic features derived from the short-time energy function of speech signals, with the aim of finding robust, speaker-independent features that indicate speaker intoxication. The data come from the German Alcohol Language Corpus, which comprises read, spontaneous, and command-and-control speech uttered by 162 speakers of both genders and various age groups when sober and when intoxicated. Energy contours were compared directly (by root mean squared error, statistical correlation, or Euclidean distance in the spectral space of the contour) and through parameterization of the contour using the Discrete Cosine Transform (DCT) and the first and second moments of the lower DCT spectrum. Contours were also analyzed by Principal Components Analysis to look for fundamental “eigen contour” changes that might encode intoxication. Energy contours differed significantly with intoxication in terms of the distance measures, the second and fourth DCT coefficients, and the first and second moments of the lower DCT spectrum. Principal Components Analysis did not yield interpretable “eigen contours” that could be used to distinguish intoxicated from sober contours.
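The contour measures named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the frame length, hop size, number of retained DCT coefficients, the resampling of contours to a common length, and the exact definition of the spectral moments (here, the centroid and spread of the DCT magnitude over the coefficient index) are all assumptions made for illustration, and the two input signals are synthetic placeholders rather than corpus data.

```python
import numpy as np

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame (rectangular window)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def resample(contour, n_points):
    """Linearly interpolate a contour to a fixed number of points
    (assumed here so that two contours can be compared point by point)."""
    contour = np.asarray(contour, dtype=float)
    old = np.linspace(0.0, 1.0, len(contour))
    new = np.linspace(0.0, 1.0, n_points)
    return np.interp(new, old, contour)

def dct2(x):
    """Orthonormal DCT-II computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]          # coefficient index
    m = np.arange(n)[None, :]          # sample index
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def rmse(a, b):
    """Root mean squared error between two equal-length contours."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def correlation(a, b):
    """Pearson correlation between two equal-length contours."""
    return float(np.corrcoef(a, b)[0, 1])

def dct_distance(a, b, n_coef):
    """Euclidean distance between the first n_coef DCT coefficients."""
    return float(np.linalg.norm(dct2(a)[:n_coef] - dct2(b)[:n_coef]))

def spectral_moments(contour, n_coef):
    """First and second moments of the lower DCT magnitude spectrum
    (assumed definition: centroid and variance over coefficient index)."""
    mag = np.abs(dct2(contour)[:n_coef])
    idx = np.arange(n_coef)
    weights = mag / mag.sum()
    m1 = float(np.sum(idx * weights))
    m2 = float(np.sum((idx - m1) ** 2 * weights))
    return m1, m2

# Synthetic placeholder signals: noise with two different slow modulations.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
sig_a = np.sin(2 * np.pi * 4 * t) * rng.standard_normal(t.size)
sig_b = np.sin(2 * np.pi * 3 * t) * rng.standard_normal(t.size)

# 25 ms frames with a 10 ms hop at 16 kHz (assumed values), 50-point contours.
contour_a = resample(short_time_energy(sig_a, 400, 160), 50)
contour_b = resample(short_time_energy(sig_b, 400, 160), 50)

print("RMSE:", rmse(contour_a, contour_b))
print("Correlation:", correlation(contour_a, contour_b))
print("DCT distance:", dct_distance(contour_a, contour_b, 10))
print("Moments A:", spectral_moments(contour_a, 10))
print("Moments B:", spectral_moments(contour_b, 10))
```

In a real analysis, the two contours would come from the same speaker producing the same item in the sober and the intoxicated condition, so that the distance measures reflect the condition rather than differences between speakers or texts.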
