How do listeners recover speech content from acoustic signals, given the immense variability between talkers? In this study, two experiments were conducted on Cantonese level tones, comparing the perception of multi-talker speech stimuli in isolation and within a speech context. Without prior knowledge of a talker's pitch range, listeners resort to the population-average pitch range as a default reference for perception. This effect is attested by the significant correlation between a talker's distance from the population-average pitch range and identification accuracy in the isolation condition (r = -0.24, p < 0.01): the closer a talker's pitch range is to the population average, the higher the identification accuracy. The population-average reference is gender-specific, showing separate accommodation scales for female and male talkers. Such a default reference is presumably built from one's long-term acoustic experience, reflecting the dense distribution of talkers in a community whose pitch is close to the population average. Beyond the effect of long-term experience, the presence of a speech context allows listeners to tune to the talker-specific pitch range, boosting identification accuracy from 43% (in isolation) to 86%. Our findings demonstrate that listeners have built-in knowledge of the population-average pitch and can shift from this default reference to a talker-specific reference when context information is available.

