An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as “place green at B 4 now.” Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and in low levels of stationary noise. The annotated corpus is available on the web for research use.
