This paper discusses the formulation, development, and analysis of a segment-based approach to the automatic language identification (LID) problem. The system uses phonotactic, acoustic-phonetic, and prosodic information within a unified probabilistic framework. The framework provides the mechanism for combining these sources of information in one system and allows their relative contributions to be determined empirically. The system was evaluated on the Oregon Graduate Institute (OGI) multi-language telephone speech corpus, and the results are competitive with other current LID systems. The results also indicate that, while the phonotactic information of a spoken utterance is the most useful information for LID, acoustic-phonetic and prosodic information can increase a system’s accuracy, especially when the utterance is short.
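As a rough illustration of what combining several information sources within one probabilistic framework can look like, the sketch below sums weighted per-language log-scores and picks the highest-scoring language. This is not the paper's formulation, which the abstract does not spell out. The function names, the weights, and all score values are hypothetical, and the summation assumes the sources can be treated as independent given the language.

```python
from typing import Dict, Optional

# Hypothetical per-language log-scores for one utterance, keyed by source.
# All values are made up for illustration only.
EXAMPLE_SCORES: Dict[str, Dict[str, float]] = {
    "phonotactic": {"english": -10.0, "german": -12.0},
    "acoustic_phonetic": {"english": -5.0, "german": -4.0},
    "prosodic": {"english": -3.0, "german": -3.5},
}


def combine_scores(
    scores_by_source: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return the weighted sum of log-scores for each language.

    Assumption: sources are conditionally independent given the language,
    so their log-scores add. A missing weight defaults to 1.0.
    """
    weights = weights or {}
    combined: Dict[str, float] = {}
    for source, scores in scores_by_source.items():
        w = weights.get(source, 1.0)
        for language, log_score in scores.items():
            combined[language] = combined.get(language, 0.0) + w * log_score
    return combined


def identify(
    scores_by_source: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the language with the highest combined log-score."""
    combined = combine_scores(scores_by_source, weights)
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # All three sources, equal weights.
    print(identify(EXAMPLE_SCORES))
    # Dropping the phonotactic source shows how one source's contribution
    # can be measured by comparing decisions with and without it.
    print(identify(EXAMPLE_SCORES, {"phonotactic": 0.0}))
```

Setting a source's weight to zero and comparing accuracy over a test set is one simple way to estimate that source's contribution empirically, which is the kind of analysis the abstract describes.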
