The harmonic structure can be easily recognized in the time-frequency representation of speech signals in adverse environments. Harmonicity is a measure of the completeness of a harmonic structure. This paper presents a new harmonic structure measure that extends the conventional harmonicity to a set of four harmonicities: the grid harmonicity, the temporal harmonicity, the segment-spectral harmonicity, and the segmental harmonicity. The grid harmonicity measures the completeness of each individual harmonic in each frame. The grid harmonicities in a frame are summed to form the temporal harmonicity, which represents the strength of harmonicity in that frame. The segment-spectral harmonicity, computed by summing the grid harmonicity of a specific harmonic over a segment, evaluates the integrity of individual harmonics across the segment. The segmental harmonicity evaluates the total strength of the harmonic structure within a segment. This set of harmonicities supports a systematic analysis of the harmonic structure and is effective for several speech processing tasks. Applications to speech distortion analysis, robust fundamental frequency estimation, robust voicing detection, and speech enhancement are demonstrated.
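
The relationship among the four harmonicities can be illustrated with a minimal sketch. It assumes the grid harmonicities are already available as a frames-by-harmonics table; how each grid value is computed is not stated in the abstract and is not modeled here. The function names, the sample values, and the choice to take the segmental harmonicity as the plain total over the segment are illustrative assumptions, not the paper's definitions.

```python
def temporal_harmonicity(grid):
    """Sum the grid harmonicities within each frame (one value per frame)."""
    return [sum(frame) for frame in grid]


def segment_spectral_harmonicity(grid):
    """Sum each harmonic's grid harmonicity across all frames of the segment
    (one value per harmonic)."""
    n_harmonics = len(grid[0])
    return [sum(frame[k] for frame in grid) for k in range(n_harmonics)]


def segmental_harmonicity(grid):
    """Total harmonic strength of the segment.

    Assumption: taken here as the sum of all grid harmonicities in the
    segment; the abstract does not give the exact formula.
    """
    return sum(temporal_harmonicity(grid))


# Hypothetical segment: 3 frames (rows) x 4 harmonics (columns).
# The values are made up for illustration only.
grid = [
    [1.0, 0.5, 0.5, 0.0],
    [1.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0],
]

print(temporal_harmonicity(grid))          # [2.0, 3.0, 1.0]
print(segment_spectral_harmonicity(grid))  # [2.5, 2.0, 1.0, 0.5]
print(segmental_harmonicity(grid))         # 6.0
```

In this toy segment the second frame has the strongest harmonic structure, and the first harmonic is the most intact across the segment, which is the kind of per-frame and per-harmonic reading the set of harmonicities is meant to support.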
