Non-contemporaneous speech samples from 27 male speakers of Australian English were compared in a forensic likelihood-ratio framework. Parametric curves (polynomials and discrete cosine transforms) were fitted to the formant trajectories of the diphthongs /aɪ/, /eɪ/, /oʊ/, /aʊ/, and /ɔɪ/. The estimated coefficient values from the parametric curves were used as input to a generative multivariate-kernel-density formula for calculating likelihood ratios expressing the probability of obtaining the observed difference between two speech samples under the hypothesis that the samples were produced by the same speaker versus under the hypothesis that they were produced by different speakers. Cross-validated likelihood-ratio results from systems based on different parametric curves were calibrated and evaluated using the log-likelihood-ratio cost function (Cllr). The cross-validated likelihood ratios from the best-performing system for each vowel phoneme were fused using logistic regression. The resulting fused system had a very low error rate, thus meeting one of the requirements for admissibility in court.
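
Two of the steps named in the abstract can be illustrated with a minimal sketch: fitting a polynomial to a single formant trajectory, and computing Cllr from sets of same-speaker and different-speaker likelihood ratios. This is not the study's implementation. The function names `fit_polynomial` and `cllr`, the polynomial degree, the normalized time axis, and the toy values are all assumptions made for illustration. The Cllr expression is the standard one: half the sum of the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons.

```python
import numpy as np


def fit_polynomial(times, formant_hz, degree=3):
    """Least-squares polynomial fit to one formant trajectory.

    Returns the coefficients ordered from the constant term upward.
    The degree is an illustrative choice, not a value from the study.
    """
    return np.polynomial.polynomial.polyfit(times, formant_hz, degree)


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for two sets of likelihood ratios.

    Same-speaker comparisons are penalized for small LRs and
    different-speaker comparisons are penalized for large LRs.
    """
    ss = np.asarray(same_speaker_lrs, dtype=float)
    ds = np.asarray(different_speaker_lrs, dtype=float)
    penalty_ss = np.mean(np.log2(1.0 + 1.0 / ss))
    penalty_ds = np.mean(np.log2(1.0 + ds))
    return 0.5 * (penalty_ss + penalty_ds)


# Toy trajectory (hypothetical values): a quadratic in normalized time.
t = np.linspace(0.0, 1.0, 11)
f2 = 500.0 + 300.0 * t - 100.0 * t**2
coeffs = fit_polynomial(t, f2, degree=2)

# Toy likelihood ratios (hypothetical values).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
well_separated = cllr([100.0, 50.0], [0.01, 0.02])
```

A system that always outputs a likelihood ratio of 1 gives Cllr = 1, so values below 1 indicate that the likelihood ratios carry useful information, and lower values are better.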
