This paper studies formant and harmonic contours obtained by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband-equivalent GD spectrogram, and harmonic frequency contours in the display of the peaks of the instantaneous narrowband-equivalent GD spectrogram; both are derived using the modified single frequency filtering (SFF) analysis of speech signals. For synthetic speech signals, the observed formant contours match the ground truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours approximately match the given ground truth formant contours, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps in observing formant contours in the display, automatic extraction of the formant frequencies needs further processing, including logic for eliminating spurious points without forcing the number of formants.
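To make the definition of the GD spectrum concrete, the sketch below computes the group delay of a short sequence without any phase unwrapping. It uses the classical DFT identity, in which the group delay equals the real part of Y/X, where X is the DFT of x[n] and Y is the DFT of n·x[n]. This is not the modified SFF analysis described in the abstract; the function name `group_delay`, the FFT size, and the small floor on the power spectrum are illustrative assumptions.

```python
import numpy as np

def group_delay(x, n_fft=512):
    """Group delay (in samples) of a finite sequence, computed without
    explicit phase computation or unwrapping.

    Uses tau(w) = Re{Y(w) / X(w)}, where X is the DFT of x[n] and Y is
    the DFT of n * x[n]. Illustrative sketch only, not the SFF method.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # DFT of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # DFT of n * x[n]
    # Floor the power spectrum to avoid division by zero at spectral nulls
    # (the floor value is an assumption chosen for illustration).
    power = np.maximum(np.abs(X) ** 2, 1e-12)
    return (X.real * Y.real + X.imag * Y.imag) / power

# A unit impulse delayed by 5 samples has linear phase -5w,
# so its group delay is 5 samples at every frequency.
x = np.zeros(32)
x[5] = 1.0
tau = group_delay(x)
```

Because the phase itself is never computed, the 2π wrapping problem does not arise in this formulation. The floor on the power spectrum only masks the large values that occur near spectral zeros; it does not remove their cause.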
