The aim of the present study is to investigate the performance of automatic perceptual judgment models built with neural networks. In previous studies, Franco et al. (1997) used HMM-derived scores based on posterior probabilities of phone segments and demonstrated a high correlation with human raters. Deville et al. (1999) also used an HMM/ANN recognition approach and showed how the results of automatic speech recognition can be used for perceptual judgments. Because most previous studies relied on automatic speech recognition in their analysis, the present study takes a different approach: it works directly from features and raw data rather than from recognizer output. Native speakers of English will listen to English sentences produced by native and non-native speakers of English, transcribe what they hear, and make one of three perceptual judgments: foreign accentedness, fluency, or comprehensibility. The data will be fed into prediction models in three different ways: one with annotated features (pauses, durations, etc.), another with Mel-frequency cepstral coefficients (MFCCs), and a third with Mel-spectrograms. Model performance will be measured by the correlation between the models' judgments and those of human raters. The preliminary results of this study will be used to build more accurate automatic proficiency judgment models.
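The evaluation step described in the abstract, correlating model judgments with human ratings, can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used, so Pearson's r is an assumption here, and the function name `pearson_r` and the rating values are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only (not from the study):
# one human rating and one model score per sentence.
human_ratings = [1, 2, 3, 4, 5]
model_scores = [1.2, 1.9, 3.4, 3.8, 5.1]

r = pearson_r(human_ratings, model_scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 would indicate that the model ranks and scales sentences much as the human raters do. A rank-based coefficient such as Spearman's would be a reasonable alternative if the ratings are treated as ordinal.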
October 2019
Meeting abstract. No PDF available.
October 01 2019
Automatic perceptual judgment using neural networks
Seongjin Park
Dept. of Linguistics, Univ. of Arizona, Tucson, AZ 85721
John Culnan
Univ. of Arizona, Tucson, AZ
J. Acoust. Soc. Am. 146, 2957 (2019)
Citation
Seongjin Park, John Culnan; Automatic perceptual judgment using neural networks. J. Acoust. Soc. Am. 1 October 2019; 146 (4_Supplement): 2957. https://doi.org/10.1121/1.5137271