An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as “place green at B 4 now.” Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and in low levels of stationary noise. The annotated corpus is available on the web for research use.
© 2006 Acoustical Society of America.