A decision‐tree‐based quantization scheme for a very low bit rate speech coder based on HMMs is described. The encoder carries out HMM‐based phoneme recognition; the recognized phonemes, state durations, and F0 sequences are then quantized, Huffman coded, and transmitted. In the decoder, sequences of mel‐cepstral coefficient vectors and F0 values are generated from the concatenated HMMs using the HMM‐based speech synthesis technique. Finally, a speech waveform is synthesized by the MLSA filter using the generated mel‐cepstral coefficient and F0 sequences. In our previous system, an MSD‐VQ codebook was trained for each phoneme for F0 quantization. Although that scheme quantizes F0 sequences efficiently, achieving better speech quality requires larger codebooks, which increases the bit rate of the system. To avoid this problem, we cluster F0 sequences using phonetic decision trees and then train a codebook for each leaf node. During encoding and decoding, the codebook to use is determined by tracing the decision tree. This allows smaller codebook sizes, since the number of codebooks can be increased without increasing the bit rate. The result of a subjective listening test shows that the proposed scheme improves the quality of coded speech.
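The codebook selection described above can be illustrated with a minimal sketch. It assumes a toy one‐question tree, scalar F0 values in Hz, and plain nearest‐neighbour quantization; it does not implement MSD‐VQ or Huffman coding, and every name, question, and codebook value below is hypothetical rather than taken from the system itself. The point it shows is that encoder and decoder both reach the same leaf from the phonetic context, which is already known from the transmitted phonemes, so only the index within the leaf codebook has to be sent.

```python
import math

# Hypothetical toy tree: internal nodes ask a yes/no question about the
# phonetic context; each leaf holds its own small F0 codebook (values in Hz).
class Leaf:
    def __init__(self, codebook):
        self.codebook = codebook

class Node:
    def __init__(self, question, yes, no):
        self.question, self.yes, self.no = question, yes, no

def find_codebook(tree, context):
    """Trace the tree from the root to a leaf and return that leaf's codebook."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(context) else node.no
    return node.codebook

def encode(tree, context, f0):
    """Return the index of the nearest codeword in the context's codebook."""
    codebook = find_codebook(tree, context)
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - f0))

def decode(tree, context, index):
    """Recover the quantized F0 by tracing the same tree with the same context."""
    return find_codebook(tree, context)[index]

def bits_per_index(codebook_size):
    """Fixed-length bits needed per index (before any entropy coding)."""
    return math.ceil(math.log2(codebook_size))

# Assumed example: one question splitting vowels from all other phonemes.
VOWELS = {"a", "i", "u", "e", "o"}
tree = Node(
    question=lambda ctx: ctx["phoneme"] in VOWELS,
    yes=Leaf([110.0, 130.0, 150.0, 170.0]),
    no=Leaf([90.0, 105.0, 120.0, 135.0]),
)

vowel_index = encode(tree, {"phoneme": "a"}, 142.0)
other_index = encode(tree, {"phoneme": "k"}, 142.0)
vowel_f0 = decode(tree, {"phoneme": "a"}, vowel_index)
other_f0 = decode(tree, {"phoneme": "k"}, other_index)
```

In this sketch, adding more leaves leaves `bits_per_index` unchanged, because the cost per index depends only on the size of a single leaf codebook, not on how many codebooks the tree holds.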
Meeting abstract.
November 01 2006
Decision‐tree‐based F0 quantization for hidden Markov model‐based speech coding at 100 bit/s
Yoshihiro Itogawa
Dept. of Comput. Sci., Nagoya Inst. of Technol., Gokiso‐cho, Showa‐ku, Nagoya, 466‐8555 Japan
Heiga Zen
Dept. of Comput. Sci., Nagoya Inst. of Technol., Gokiso‐cho, Showa‐ku, Nagoya, 466‐8555 Japan
Yoshihiko Nankaku
Dept. of Comput. Sci., Nagoya Inst. of Technol., Gokiso‐cho, Showa‐ku, Nagoya, 466‐8555 Japan
Akinobu Li
Dept. of Comput. Sci., Nagoya Inst. of Technol., Gokiso‐cho, Showa‐ku, Nagoya, 466‐8555 Japan
Keiichi Tokuda
Dept. of Comput. Sci., Nagoya Inst. of Technol., Gokiso‐cho, Showa‐ku, Nagoya, 466‐8555 Japan
J. Acoust. Soc. Am. 120, 3038 (2006)
Citation
Yoshihiro Itogawa, Heiga Zen, Yoshihiko Nankaku, Akinobu Li, Keiichi Tokuda; Decision‐tree‐based F0 quantization for hidden Markov model‐based speech coding at 100 bit/s. J. Acoust. Soc. Am. 1 November 2006; 120 (5_Supplement): 3038. https://doi.org/10.1121/1.4787195