Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in three conditions: stationary speech-shaped noise, fluctuating restaurant babble, and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and on speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. The metric correlates highly with the intelligibility scores and, consistent with the subject scores, is higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.
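
As a rough illustration of how a Theoretical STI can be derived from octave-band signal-to-noise ratios and reverberation times, the sketch below uses the classic modulation transfer function formulation (Houtgast and Steeneken, 1985). It is not the procedure used in this study: the function names, the equal band weights, and the example values are assumptions made for illustration, and a standards-compliant calculation would use the band weights and corrections specified in IEC 60268-16.

```python
import math

# 14 modulation frequencies (Hz) in one-third-octave steps, 0.63 to 12.5 Hz
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
             3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def band_transmission_index(snr_db, rt60_s):
    """Transmission index for one octave band from its SNR (dB) and RT60 (s)."""
    total = 0.0
    for f in MOD_FREQS:
        # Modulation reduction caused by reverberation (exponential decay model)
        m_reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * rt60_s / 13.8) ** 2)
        # Modulation reduction caused by stationary additive noise
        m_noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
        m = m_reverb * m_noise
        # Keep m strictly inside (0, 1) so the logarithm below is defined
        m = min(max(m, 1e-6), 1.0 - 1e-6)
        # Apparent SNR, clipped to +/-15 dB and mapped linearly to [0, 1]
        apparent_snr = 10.0 * math.log10(m / (1.0 - m))
        apparent_snr = min(max(apparent_snr, -15.0), 15.0)
        total += (apparent_snr + 15.0) / 30.0
    return total / len(MOD_FREQS)


def theoretical_sti(snr_db_per_band, rt60_per_band, weights=None):
    """Weighted average of band transmission indices.

    Equal weights are an assumption of this sketch, not the standard's values.
    """
    n = len(snr_db_per_band)
    if len(rt60_per_band) != n:
        raise ValueError("need one RT60 value per octave band")
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * band_transmission_index(s, t)
               for w, s, t in zip(weights, snr_db_per_band, rt60_per_band))


# Hypothetical example: seven octave bands at 0 dB SNR, with and without reverberation
sti_noise_only = theoretical_sti([0.0] * 7, [0.0] * 7)
sti_noise_reverb = theoretical_sti([0.0] * 7, [1.0] * 7)
```

For a short-time version, one plausible approach is to call `theoretical_sti` once per analysis window with the octave-band SNRs measured in that window, so that the index follows fluctuations in the noise level.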

1. ANSI (1997). ANSI-S3.5-1997, Methods for Calculation of the Speech Intelligibility Index (American National Standards Institute, New York).
2. Boldt, J. B., and Ellis, D. P. W. (2009). “A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation,” in Proc. 17th European Sig. Process. Conf. (EURASIP, Lausanne, Switzerland), pp. 1849–1853.
3. Dubbelboer, F., and Houtgast, T. (2007). “A detailed study on the effects of noise on speech intelligibility,” J. Acoust. Soc. Am. 122, 2865–2871.
4. Drullman, R. (1995). “Temporal envelope and fine structure cues for speech intelligibility,” J. Acoust. Soc. Am. 97, 585–592.
5. Drullman, R., Festen, J. M., and Plomp, R. (1994). “Effect of reducing slow temporal modulations on speech reception,” J. Acoust. Soc. Am. 95, 2670–2680.
6. Falk, T. H., Zheng, C., and Chan, W.-Y. (2010). “A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech,” IEEE Trans. Audio Speech Lang. Process. 18, 1766–1774.
7. Gallun, F., and Souza, P. (2008). “Exploring the role of the modulation spectrum in phoneme recognition,” Ear Hear. 29, 800–813.
8. George, E. L. J., Festen, J. M., and Houtgast, T. (2008). “The combined effects of reverberation and nonstationary noise on sentence intelligibility,” J. Acoust. Soc. Am. 124, 1269–1277.
9. Goldsworthy, R. L., and Greenberg, J. E. (2004). “Analysis of speech-based speech transmission index methods with implications for nonlinear operations,” J. Acoust. Soc. Am. 116, 3679–3689.
10. Houtgast, T., and Steeneken, H. J. M. (1973). “The modulation transfer function in room acoustics as a predictor of speech intelligibility,” Acustica 28, 66–73.
11. Houtgast, T., and Steeneken, H. J. M. (1980). “Predicting speech intelligibility in rooms from the modulation transfer function. I. General room acoustics,” Acustica 46, 60–72.
12. Houtgast, T., and Steeneken, H. J. M. (1985). “A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria,” J. Acoust. Soc. Am. 77, 1069–1077.
13. IEC (1998). “Part 16: Objective rating of speech intelligibility by speech transmission index (2nd edition),” in IEC 60268 Sound System Equipment (Int. Electrotech. Commiss., Geneva, Switzerland).
14. IEC (2003). “Part 16: Objective rating of speech intelligibility by speech transmission index (3rd edition),” in IEC 60268 Sound System Equipment (Int. Electrotech. Commiss., Geneva, Switzerland).
15. IEC (2011). “Part 16: Objective rating of speech intelligibility by speech transmission index (4th edition),” in IEC 60268 Sound System Equipment (Int. Electrotech. Commiss., Geneva, Switzerland).
16. Kates, J. M. (1987). “The short-time articulation index,” J. Rehab. Res. Develop. 24, 271–276.
17. Krause, J. C. (2001). “Properties of naturally produced clear speech at normal rates and implications for intelligibility enhancement,” Ph.D. dissertation, Dept. Electr. Eng. Comput. Sci. (Mass. Inst. Technol., Cambridge, MA).
18. Krause, J. C., and Braida, L. D. (2002). “Investigating alternative forms of clear speech: The effects of speaking rate and speaking mode on intelligibility,” J. Acoust. Soc. Am. 112, 2165–2172.
19. Krause, J. C., and Braida, L. D. (2004). “Acoustic properties of naturally produced clear speech at normal speaking rates,” J. Acoust. Soc. Am. 115, 362–378.
20. Ludvigsen, C. (1993). “The use of objective methods to predict the intelligibility of hearing aid processing speech,” in Recent Developments in Hearing Instrument Technology, edited by J. Beilin and G. R. Jensen (Danavox/Stougaard Jensen, Copenhagen, Denmark), pp. 81–94.
21. Ludvigsen, C., Elberling, C., and Keidser, G. (1993). “Evaluation of a noise reduction method-comparison between observed scores and scores predicted from STI,” Scand. Audiol. 22, 50–55.
22. Ludvigsen, C., Elberling, C., Keidser, G., and Poulsen, T. (1990). “Prediction of intelligibility of non-linearly processed speech,” Acta Otolaryngol. Suppl. 469, 190–195.
23. Ma, J., Hu, Y., and Loizou, P. (2009). “Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions,” J. Acoust. Soc. Am. 125, 3387–3405.
24. Payton, K. L., and Braida, L. D. (1999). “A method to determine the speech transmission index from speech waveforms,” J. Acoust. Soc. Am. 106, 3637–3648.
25. Payton, K. L., Braida, L. D., Chen, S., Rosengard, P., and Goldsworthy, R. (2002). “Computing the STI using speech as a probe stimulus,” in Past, Present and Future of the Speech Transmission Index, edited by S. J. v. Wijngaarden (TNO Human Factors, Soesterberg, The Netherlands), Chap. 11, pp. 125–138.
26. Payton, K. L., and Shrestha, M. (2008a). “Analysis of short-time speech transmission index algorithms,” in Acoustics'08 Paris, pp. 633–638.
27. Payton, K. L., and Shrestha, M. (2008b). “Evaluation of short-time speech-based intelligibility metrics,” in 9th Internat. Congress on Noise as a Public Health Problem, edited by B. Griefahn (IfADo, Dortmund, Germany), pp. 243–251.
28. Payton, K. L., Uchanski, R. M., and Braida, L. D. (1994). “Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing,” J. Acoust. Soc. Am. 95, 1581–1592.
29. Peterson, P. M. (1986). “Simulating the response of multiple microphones to a single acoustic source in a reverberant room,” J. Acoust. Soc. Am. 80, 1527–1529.
30. Picheny, M. A., Durlach, N. I., and Braida, L. D. (1985). “Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech,” J. Speech Hear. Res. 28, 96–103.
31. Rhebergen, K. S., and Versfeld, N. J. (2005). “A speech intelligibility index–based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners,” J. Acoust. Soc. Am. 117, 2181–2192.
32. Rhebergen, K. S., Versfeld, N. J., and Dreschler, W. A. (2006). “Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise,” J. Acoust. Soc. Am. 120, 3988–3997.
33. Schlesinger, A. (2012). “Transient-based speech transmission index for predicting intelligibility in nonlinear speech enhancement processors,” in Proc. IEEE Internat. Conf. Acoust. Speech Sig. Process. (IEEE, Piscataway, NJ), pp. 3993–3996.
34. Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). “An algorithm for intelligibility prediction of time-frequency weighted noisy speech,” IEEE Trans. Audio Speech Lang. Process. 19, 2125–2136.