Using speech as a probe stimulus to compute the Speech Transmission Index (STI) has been of great interest to speech researchers. One technique is based on first computing the speech modulation transfer function (SMTF). Approaches used to obtain the SMTF include those developed by Steeneken and Houtgast [H. Steeneken and T. Houtgast, Proc. 11th ICA, Paris 7, 85–88 (1983)] and Drullman et al. [R. Drullman, J. M. Festen, and R. Plomp, J. Acoust. Soc. Am. 95, 2670–2680 (1994)]. This paper compares these two approaches and a new one against the theoretically obtained modulation transfer function (MTF) for reverberant and noisy environments. The new method computes the magnitude of the cross‐power spectrum rather than the real part used by Drullman et al. As previously reported, the Steeneken and Houtgast method exhibits artifacts at high modulation frequencies. The Drullman et al. approach eliminates artifacts in the reverberant environment but does not predict the theoretical MTF for the noisy environment. The new method outperforms the other two approaches in matching the theoretically derived MTF across both environments. This paper also examines the SMTF of amplitude‐compressed speech for these three methods. [Work supported by NIDCD.]
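The contrast between the real part and the magnitude of the cross‐power spectrum can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of whole-signal FFTs of intensity envelopes, and the normalization by the ratio of mean envelope intensities are all assumptions made for illustration. The theoretical MTF uses the classical expressions for exponential reverberant decay (reverberation time T) and stationary additive noise.

```python
import numpy as np


def theoretical_mtf(mod_freqs_hz, rt60_s=None, snr_db=None):
    """Classical MTF for exponential reverberation and/or stationary noise.

    m(F) = 1 / sqrt(1 + (2*pi*F*T / 13.8)**2) * 1 / (1 + 10**(-SNR/10))
    """
    f = np.asarray(mod_freqs_hz, dtype=float)
    m = np.ones_like(f)
    if rt60_s is not None:
        m = m / np.sqrt(1.0 + (2.0 * np.pi * f * rt60_s / 13.8) ** 2)
    if snr_db is not None:
        m = m / (1.0 + 10.0 ** (-snr_db / 10.0))
    return m


def smtf_cross_spectrum(clean_env, degraded_env, use_magnitude=True):
    """Hypothetical SMTF estimate from clean and degraded intensity envelopes.

    use_magnitude=True  -> magnitude of the cross-power spectrum
    use_magnitude=False -> real part of the cross-power spectrum
    The normalization (assumed here) divides by the clean auto-power
    spectrum and scales by the ratio of mean envelope intensities.
    Bin 0 (DC) is not meaningful because the means are removed.
    """
    x = np.asarray(clean_env, dtype=float)
    y = np.asarray(degraded_env, dtype=float)
    X = np.fft.rfft(x - x.mean())
    Y = np.fft.rfft(y - y.mean())
    cross = np.conj(X) * Y          # cross-power spectrum
    auto = np.abs(X) ** 2           # clean auto-power spectrum
    num = np.abs(cross) if use_magnitude else cross.real
    scale = x.mean() / y.mean()
    with np.errstate(divide="ignore", invalid="ignore"):
        return scale * num / auto


# Toy check: a constant noise intensity added to a random "envelope".
rng = np.random.default_rng(0)
clean = rng.random(1024) + 0.5
snr_db = 3.0
noise_intensity = clean.mean() / 10.0 ** (snr_db / 10.0)
noisy = clean + noise_intensity
estimate = smtf_cross_spectrum(clean, noisy)[1:]   # skip the DC bin
target = theoretical_mtf([1.0], snr_db=snr_db)[0]
print(np.allclose(estimate, target))
```

In this idealized case the degraded envelope differs from the clean one only by a constant offset, so both variants agree with the theoretical noise MTF. The two variants separate as soon as the degraded envelope is shifted in phase relative to the clean one: a pure delay leaves the magnitude-based estimate at 1 while the real-part estimate falls off with the cosine of the phase shift.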
May 2002
Meeting abstract.
May 01 2002
Comparison of approaches to estimate the speech modulation transfer function
Karen L. Payton
ECE Dept., UMass Dartmouth, 285 Old Westport Rd., North Dartmouth, MA 02747‐2300
Shaoyan Chen
ECE Dept., UMass Dartmouth, 285 Old Westport Rd., North Dartmouth, MA 02747‐2300
Louis D. Braida
MIT, Cambridge, MA 02139
J. Acoust. Soc. Am. 111, 2431 (2002)
Citation
Karen L. Payton, Shaoyan Chen, Louis D. Braida; Comparison of approaches to estimate the speech modulation transfer function. J. Acoust. Soc. Am. 1 May 2002; 111 (5_Supplement): 2431. https://doi.org/10.1121/1.4778339