An objective metric that predicts speech intelligibility under different types of noise and distortion would be desirable in voice communication. To date, the majority of studies concerning speech intelligibility metrics have focused on predicting the effects of individual noise or distortion mechanisms. This study proposes an objective metric, the spectrogram orthogonal polynomial measure (SOPM), that attempts to predict speech intelligibility for people with normal hearing under adverse conditions. The SOPM metric is developed by extracting features from the spectrogram using Krawtchouk moments. The metric's performance is evaluated for several types of noise (steady-state and fluctuating noise), distortions (peak clipping, center clipping, and phase jitter), ideal time-frequency segregation, and reverberation conditions both in quiet and noisy environments. High correlation (0.97–0.996) is achieved between the proposed metric and subjective scores from normal-hearing subjects under various conditions.
