Ao is a Tibeto-Burman language spoken in Nagaland, India. It is a low-resource tonal language with three lexical tones: high, mid, and low. However, tone assignment on lexical words may differ among the three dialects of Ao: Chungli, Mongsen, and Changki. In this work, an acoustic study of the three tones is conducted in all three dialects. The acoustic characteristics of the tones in the Changki dialect are found to differ markedly from those in the Chungli and Mongsen dialects. Hence, in the latter part of the work, automatic dialect identification (DID) of the Ao dialects is attempted with Mel-frequency cepstral coefficients (MFCCs), shifted delta cepstral (SDC) coefficients, and F0 features, using Gaussian mixture models (GMMs). In both text-dependent and text-independent DID, the F0 features are found to improve classification accuracy.
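
To illustrate the shifted delta cepstral features named above, the following is a minimal sketch and not the authors' implementation. It assumes the common N-d-P-k parameterization of SDC, with d = 1, P = 3, and k = 7 chosen purely for illustration (the values used in this work are not stated here), replication of edge frames at utterance boundaries, and plain frame-level concatenation of an F0 track with the spectral features. The function names `shifted_delta_cepstra` and `stack_features` are hypothetical.

```python
import numpy as np


def shifted_delta_cepstra(cep, d=1, p=3, k=7):
    """Compute shifted delta cepstra for a (T, N) array of cepstral frames.

    For frame t and block i (0 <= i < k) the delta is
        c(t + i*p + d) - c(t + i*p - d),
    and the k blocks are stacked into one (T, N*k) feature matrix.
    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    cep = np.asarray(cep, dtype=float)
    num_frames = cep.shape[0]
    # Replicate edge frames so that every index t + i*p +/- d is valid.
    padded = np.pad(cep, ((d, (k - 1) * p + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = i * p
        ahead = padded[start + 2 * d : start + 2 * d + num_frames]  # c(t + i*p + d)
        behind = padded[start : start + num_frames]                 # c(t + i*p - d)
        blocks.append(ahead - behind)
    return np.hstack(blocks)


def stack_features(mfcc, f0, d=1, p=3, k=7):
    """Concatenate MFCC, SDC, and F0 per frame (an assumed fusion scheme)."""
    mfcc = np.asarray(mfcc, dtype=float)
    f0 = np.asarray(f0, dtype=float).reshape(-1, 1)
    sdc = shifted_delta_cepstra(mfcc, d=d, p=p, k=k)
    return np.hstack([mfcc, sdc, f0])
```

With 7 cepstral coefficients per frame, this configuration yields 49 SDC dimensions, so each stacked frame has 7 + 49 + 1 = 57 dimensions. Frame vectors of this kind could then be modeled with one Gaussian mixture model per dialect, with a test utterance assigned to the dialect whose model gives the highest likelihood.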
