Several approaches to Chinese dialect identification based on segmental and prosodic features of speech are described in this paper. When using segmental information only, the system performs phonotactic analysis after speech utterances have been tokenized into sequences of broad phonetic classes. The second scheme comprises prosodic models which are trained to capture tone sequence information for individual dialects. Also proposed is a novel approach that examines differences between Chinese dialects at broad phonetic and prosodic levels. These algorithms were evaluated on read speech from multiple speakers. Simulation results indicate that the combined use of segmental and prosodic features allows the proposed system to discriminate among three major Chinese dialects spoken in Taiwan with 93.0% accuracy.
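The phonotactic step can be illustrated with a minimal sketch: one bigram model per dialect is trained on broad-phonetic-class token sequences, and an utterance is assigned to the dialect whose model gives it the highest log-likelihood. This is not the authors' implementation. The class symbols (`V`, `S`, `N`, `F`), the dialect labels, the toy training sequences, the bigram order, and the add-one smoothing are all assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

def train_bigram(sequences, vocab, alpha=1.0):
    """Estimate add-alpha smoothed bigram log-probabilities from token sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        padded = ["<s>"] + list(seq)          # "<s>" marks the utterance start
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1.0
    model = {}
    for prev in ["<s>"] + list(vocab):
        total = sum(counts[prev].values()) + alpha * len(vocab)
        model[prev] = {
            cur: math.log((counts[prev][cur] + alpha) / total) for cur in vocab
        }
    return model

def log_likelihood(model, seq):
    """Sum the bigram log-probabilities along one token sequence."""
    padded = ["<s>"] + list(seq)
    return sum(model[prev][cur] for prev, cur in zip(padded, padded[1:]))

def identify(models, seq):
    """Return the label of the model that scores the sequence highest."""
    return max(models, key=lambda name: log_likelihood(models[name], seq))

# Hypothetical broad phonetic classes: vowel, stop, nasal, fricative.
VOCAB = ["V", "S", "N", "F"]

# Made-up token sequences standing in for tokenized training utterances.
TRAIN = {
    "dialect_A": [["S", "V", "N", "S", "V", "N"], ["S", "V", "N", "F", "V", "N"]],
    "dialect_B": [["F", "V", "F", "V", "S", "V"], ["F", "V", "V", "F", "V", "V"]],
}

MODELS = {name: train_bigram(seqs, VOCAB) for name, seqs in TRAIN.items()}
```

Calling `identify(MODELS, ["S", "V", "N", "S", "V", "N"])` selects the label whose bigram statistics best match the sequence. Under the same assumptions, a tone-sequence model could be scored in the same way and its log-likelihood added to the phonotactic score, which is one simple way to combine segmental and prosodic evidence.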
