The frequency range audible to humans can extend from 20 Hz to 20 kHz, but only a portion of this range—the lower end up to 8 kHz—has been systematically explored because extended high-frequency (EHF) information above this low range has been considered unnecessary for speech comprehension. This special issue presents a collection of research studies exploring the presence of EHF information in the acoustic signal and its perceptual utility. The papers address the role of EHF hearing in auditory perception, the impact of EHF hearing loss on speech perception in specific populations and occupational settings, the importance of EHF in speech recognition and in providing speaker-related information, the utility of acoustic EHF energy in fricative sounds, and ultrasonic vocalizations in mice in relation to human hearing. Collectively, the research findings offer new insights and converge in showing that not only is EHF energy present in the speech spectrum, but listeners can utilize EHF cues in speech processing and recognition, and EHF hearing loss has detrimental effects on perception of speech and non-speech sounds. Together, this collection challenges the conventional notion that EHF information has minimal functional significance.
I. INTRODUCTION
The upper limit of human auditory function does not exceed 20–22 kHz, and the upper end of the audible frequency range varies with individual abilities and age. Older adults can hear sounds up to about 13 kHz, but the audibility limit for young adults is typically higher, about 16–17 kHz, with some still able to perceive air vibrations as sound around 20 kHz. However, only a portion of the human audibility range—the low end—has been systematically explored, because extended high-frequency information has long been considered unnecessary for speech comprehension.
Indeed, the issue of what is necessary versus what is sufficient in human-to-human speech communication has been vigorously debated in speech and hearing research over the last century, with the implicit assumption that we speak to be understood. Based on early research, it has been widely accepted that the acoustic energy up to 4 kHz is essential to speech intelligibility and that the upper frequency limit that can make a useful contribution to spoken language comprehension does not exceed 7 kHz (French and Steinberg, 1947). There is a physiological basis for this “sufficient” 7-kHz frequency range: The frequency selectivity of auditory nerve fibers and sensory cells in the human cochlea is sharpest at low frequencies and declines with increasing frequency. Consequently, human auditory frequency selectivity has been the primary driving force in the long history of research exploring the contribution of individual frequency regions to speech intelligibility within the 7-kHz range.
However, emerging evidence from labs around the world indicates that high-frequency energy above the sufficient 7-kHz limit plays a more significant role than commonly believed and should be seriously considered in speech communication, human hearing, audiology, voice perception research, and those branches of acoustics that are concerned with sound information within the full range of human hearing and human speech [see recent reviews in Hunter (2020) and Jacewicz (2023a)]. Recognizing the potential of high-frequency information, particularly in the “gray area” above the typical audiometric frequency range (≤8 kHz), this special issue presents a collection of research studies exploring its presence in the acoustic signal and its perceptual utility.
Although this special issue concerns high-frequency energy within the limit of human audibility, we point out that what constitutes high-frequency vocalizations and high-frequency hearing is species specific. For example, bats are known to have ultrasonic hearing (i.e., they can hear high-frequency sounds inaudible to humans) up to around 200 kHz (Davies et al., 2013), and the ultrasound calling of the male insect Supersonus aequoreus—the highest ever recorded in nature—can reach a frequency of 150 kHz (Sarria-S et al., 2014). Species communicating in such extreme ultrasonic ranges are rare. The most commonly exploited ultrasonic frequencies, such as those among katydid insects, are in the range 20–45 kHz (Montealegre-Z, 2009), indicating that some of these ultrasounds may be audible to some humans. But what do such ultrasonic communications have to do with human hearing? We will consider two possible scenarios.
First, recent evidence suggests that the lower limit of the “ultrasonic range” in humans should be set lower than 20 kHz, specifically, at 17.8 kHz when measured in third-octave bands (Leighton, 2017). This indicates that some humans may be able to hear ultrasounds in air that are present in the environment, whether occurring in nature or generated by devices. If so, then prolonged exposure to ultrasounds may raise safety concerns because of possible adverse effects and discomfort, such as headache, tinnitus, or fatigue; Ueda (2014) reported such effects when humans were exposed to ultrasound from a rodent repeller at 20 kHz and 90–130 dB sound pressure level (SPL). However, in controlled laboratory conditions, higher discomfort ratings were obtained not only from exposure to ultrasounds (17.8–20 kHz) but also from exposure to very high-frequency sounds (pure tones) ranging in frequency between 13.4 and 17.7 kHz at levels between 82 and 92 dB SPL relative to a 1 kHz reference stimulus (Fletcher et al., 2018). This indicates that humans may experience discomfort from extensive exposure to sounds at very high frequencies even if their levels do not exceed allowable SPLs in these frequency ranges.
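The 17.8 kHz figure is consistent with the lower edge of the third-octave band centered at 20 kHz. As a minimal numerical sketch (assuming the base-2 fractional-octave band definition of IEC 61260; Leighton's exact derivation may differ):

```python
# Sketch: edges of the third-octave band centered at 20 kHz, using
# the base-2 definition (edges lie one sixth of an octave from the
# center, i.e., f_edge = f_c * 2**(+/-1/6)). Illustrative only.

def third_octave_edges(fc_hz: float) -> tuple[float, float]:
    """Lower and upper edges of the third-octave band centered at fc_hz."""
    half_band = 2 ** (1 / 6)  # one sixth of an octave on each side
    return fc_hz / half_band, fc_hz * half_band

lower, upper = third_octave_edges(20_000.0)
print(f"band edges: {lower:.0f} Hz to {upper:.0f} Hz")
# lower edge is approximately 17.8 kHz
```

Under this definition, sound energy reported in the band nominally centered at 20 kHz can in fact begin just below 17.82 kHz, which motivates setting the practical "ultrasonic" boundary there.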
Second, evidence exists that humans are responsive to inaudible nonstationary ultrasounds (i.e., music). It has been shown that frequency information in the range 22–50 kHz (often exceeding 50 kHz and at times 100 kHz) can be registered by the brain, specifically by deep-lying brain structures, including the brain stem and thalamus (Oohashi et al., 2000). However, activation in these areas occurred only when a full-range sound containing audible low-frequency (below 22 kHz) and inaudible (above 22 kHz and up to 100 kHz) components was presented to listeners, whereas no such activation was found when either the low or the high components were presented separately. The complex interaction between audible and inaudible frequency components has been termed the hypersonic effect. Evaluating the sound quality of music, listeners reported that the full-range sound “was softer, more reverberant, with a better balance of instruments, more comfortable to the ears, and richer in nuance” (p. 3553) than the low-frequency sound alone. In a series of follow-up studies, it has been proposed that the hypersonic effect emerges when the air-conducting auditory system interacts with other vibration-sensing systems in the human body (Oohashi et al., 2006) and that deep brain activity may be a response to an environment rich in inaudible (to humans) high-frequency sounds, such as those generated by insects in the natural environment of tropical rainforests (Honda, 2015).
The two scenarios illustrate that the full human hearing range, and not only its “essential” or “sufficient” low end, can be utilized in complex interactions among processes in the brain, the human body, and the environment, and that the advantage of extended hearing up to 20 kHz can be revealed in processing sound environment information not confined to the frequency range of human speech. We hope that the articles in this special issue will lead to future investigations of such complexities and to recognition of the extent of applicability of high-frequency information, both audible and inaudible to humans.
The 15 contributions in this special issue are organized into five categories showcasing the full range of topics in speech and hearing research: (1) the significance of extended high-frequency (EHF) hearing in auditory perception, (2) the effects of EHF hearing loss on speech perception in specific populations, (3) phonetic and indexical information in EHF speech spectrum, (4) explorations of EHF information in fricative sounds, and (5) ultrasonic vocalizations in mice in relation to humans.
II. SIGNIFICANCE OF EHF HEARING IN AUDITORY PERCEPTION
In recent years, hearing research has increasingly recognized that EHF hearing deficits contribute to difficulties in speech processing and comprehension, particularly in challenging listening conditions, such as noisy environments. EHF hearing loss is common, even among individuals with normal or near-normal hearing at standard audiometric frequencies (0.25–8 kHz). The onset of EHF hearing loss is discernible in individuals as young as their early 20s, and its prevalence is significant even among young adults (18–30 years).
Three papers focus on the role of EHF hearing in auditory perception. Mishra et al. (2023) contribute to the ongoing discussion about the clinical implications of EHF hearing loss. Despite clinically normal audiograms, individuals with EHF hearing loss in that study exhibited deficits in auditory perception, suggesting a selective impairment in frequency discrimination. The authors propose that EHF hearing loss may indicate subtle auditory damage that is not reflected in standard audiograms but can affect specific aspects of auditory resolution in the standard frequency range.
The paper by Jain et al. (2022) adds a valuable dimension to the overarching discussion on EHF hearing and its influence on higher-order auditory functions by investigating how age-related changes and variations in EHF hearing sensitivity influence individuals' abilities to segregate complex auditory streams and comprehend speech. The authors argue that impaired hearing in the EHF range is associated with impaired auditory function at lower frequencies despite normal audiometric thresholds. In conjunction with Mishra et al. (2023), this study contributes to a better understanding of how EHF hearing thresholds relate to both basic auditory mechanisms and complex auditory tasks, specifically speech-related tasks in challenging listening conditions.
Waechter and Brännström (2023) investigate the relationship between the severity of EHF hearing loss and the measured distress associated with tinnitus while accounting for the magnitude of hearing loss at standard frequencies. A significant correlation between EHF loss and auditory-related tinnitus distress was found (but no significant relationship between EHF loss and the subjective experience of perceived tinnitus loudness), prompting a reevaluation of the hypothesis linking tinnitus to potential hearing loss or subclinical auditory dysfunction. Overall, these three papers emphasize EHF deficits as indicators of subclinical auditory processing dysfunctions, particularly within the standard audiometric frequency range.
III. THE EFFECTS OF EHF HEARING LOSS ON SPEECH PERCEPTION IN SPECIFIC POPULATIONS
Three papers discuss the impact of EHF hearing loss on speech perception. Saxena et al. (2022) shed light on the functional consequences of EHF hearing impairment. In their study, young healthy individuals with clinically normal audiograms exhibited widespread EHF hearing loss and reported significant challenges in speech-in-noise recognition and everyday listening. The study employed the Speech, Spatial, and Qualities of Hearing Scale (SSQ) to assess self-reported hearing ability in complex listening situations. Individuals with EHF impairment had lower SSQ ratings than those with normal EHF hearing and had poorer speech-in-noise recognition, as evidenced by higher speech recognition thresholds (SRTs) that were associated with lower SSQ ratings. The authors' use of the SSQ emphasizes the significance of self-assessment tools in capturing the real-life listening experiences of individuals with EHF impairment.
Roup et al. (2023) contribute to the broader discussion on the clinical implications of EHF hearing and its role in specific occupational settings. The study investigates the relationship between EHF hearing and speech-in-spatialized-noise performance in firefighters with a history of exposure to occupational noise and airborne toxins. Considering firefighters' unique challenges, the authors explored how deficits in EHF hearing might impact speech comprehension in complex acoustic environments, such as listening to speech coming from different spatial locations. It was found that firefighters with poorer EHF thresholds experienced less benefit from spatial separation; EHF thresholds significantly predicted the spatial advantage, emphasizing the importance of EHF cues in spatialized noise performance among firefighters.
The article by Koerner and Gallun (2023) provides additional insights into the complex relationship between hearing capabilities and occupational exposures, complementing the findings in firefighters explored by Roup et al. (2023). Koerner and Gallun investigate the speech understanding abilities and EHF hearing sensitivity of blast-exposed veterans, a population known to report auditory difficulties, e.g., understanding speech in complex environments, listening to rapid speech, or using the telephone, despite normal or near-normal hearing sensitivity. It was found that blast-exposed veterans exhibited a higher mean EHF pure-tone average and demonstrated poorer performance across various speech comprehension measures. Even after accounting for age and standard pure-tone averages, blast-exposed veterans consistently exhibited suboptimal performance in challenging listening conditions. Together, Roup et al. and Koerner and Gallun contribute valuable insights to the broader discussion on clinical implications in occupational settings, underscoring the need for a holistic assessment approach that includes evaluation of suprathreshold auditory abilities to capture the intricacies of auditory function in unique populations.
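The pure-tone averages compared in such studies are simple means of audiometric thresholds over a fixed frequency set. A minimal sketch of a standard versus an EHF pure-tone average (the frequency sets and the audiogram values here are illustrative assumptions, not the ones used by Koerner and Gallun):

```python
# Sketch: standard pure-tone average (PTA) vs. an EHF PTA for one
# hypothetical audiogram. Thresholds are in dB HL, keyed by
# frequency in kHz; all values below are invented for illustration.
from statistics import mean

thresholds = {0.5: 10, 1: 10, 2: 15, 4: 20,      # standard range
              10: 35, 12.5: 45, 14: 50, 16: 60}  # EHF range

standard_pta = mean(thresholds[f] for f in (0.5, 1, 2, 4))
ehf_pta = mean(thresholds[f] for f in (10, 12.5, 14, 16))

print(f"standard PTA: {standard_pta:.1f} dB HL")  # 13.8 dB HL
print(f"EHF PTA: {ehf_pta:.1f} dB HL")            # 47.5 dB HL
```

An audiogram like this one illustrates the pattern discussed above: a clinically normal standard PTA coexisting with substantially elevated EHF thresholds.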
IV. PHONETIC AND INDEXICAL INFORMATION IN EHF SPEECH SPECTRUM
Moving from EHF hearing to acoustic information in EHF energy in speech, three papers report on its role in speech recognition and in providing indexical (i.e., speaker-related) information. Monson et al. (2023) show that the EHF range contains valuable phonetic information that enhances speech recognition, especially in challenging, complex listening conditions. Previous research demonstrated the utility of EHF cues in complex listening environments where maskers (other talkers) face away from the listener, leading to natural EHF attenuation. However, it is unclear whether the full range of EHF cues is beneficial when both masker and target talkers face the listener. Using various filtering combinations of full-band and low-pass filtered speech in both the target and masker, the authors propose three potential roles of EHF cues: providing phonetic information, supporting talker segregation, and facilitating selective attention to low-frequency phonetic information.
Jacewicz et al. (2023b) examine the respective contributions of low- versus high-frequency information to the identification of a speaker's regional dialect and gender (male, female). Using highly variable conversational speech from 40 speakers (rather than controlled laboratory stimuli) and various low-pass and high-pass filter cutoffs, the study found that gender cues in EHFs were robust and that dialect was identified by human listeners well above chance. Both types of cues were still available in unintelligible speech, indicating that EHFs can provide strong indexical cues about speaker characteristics that are not dependent upon speech comprehension.
The paper by Donai et al. (2023) contributes additional evidence for the salience of segmental and indexical (speaker sex and speaker identity) EHF cues in brief 100-ms vowel segments using a machine learning framework. Classification performance for signals containing only low-frequency energy up to 4 kHz was very high, and for EHF signals above 4 kHz, it was well above chance for the same tasks. Together, Jacewicz et al. and Donai et al. emphasize that information related to speaker characteristics is well preserved in unintelligible EHF speech and suggest that indexical cues related to physical and social traits are integral to human speech.
V. EXPLORATIONS OF EHF INFORMATION IN FRICATIVE SOUNDS
Five papers represent current interests in research on fricatives, the “high-frequency sounds.” These interests increasingly include languages other than American English, investigations of fricative production in large conversational speech corpora, analyses of speech samples recorded at high sampling rates, modern statistical approaches, and explorations of new analytical methods that consider EHFs.
Hamza et al. (2023) utilize computational models representing auditory nerve fibers and midbrain neurons in the inferior colliculus to study neural response profiles to fricatives. These models were constructed with characteristic frequencies ranging from 125 Hz to 8 or 20 kHz. The modeled responses in the inferior colliculus (situated in the midbrain), particularly those associated with characteristic frequencies beyond 8 kHz, predicted behavioral perceptual data better than the stimulus spectra and the modeled auditory nerve responses did. This was also true for spectrally degraded conditions, emphasizing the significance of EHF information. The activity in the inferior colliculus indicates that EHF energy is utilized in refining the auditory information conducted to higher brain structures and that EHFs play an important role in this process.
Kharlamov et al. (2023) analyze fricatives in a corpus of Canadian English interview speech sampled at both 44.1 and 16 kHz (the lower sampling rate was used in earlier, widely used corpora recorded in the 1980s and 1990s). These two sampling rates allowed the investigators to compare contributions of EHF information up to 22.05 kHz with that up to 8 kHz, respectively. Using a set of measures based on both the linear and multitaper spectra and machine learning approaches (random forest analyses), the study demonstrated that frequency information above 8 kHz did not improve classification accuracy but did alter the classification rankings of individual measures and, thus, modified the magnitude of their contributions when EHF information was included in the analysis of conversational speech.
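The two analysis ceilings follow directly from the Nyquist limit (half the sampling rate): 44.1 kHz sampling preserves content up to 22.05 kHz, while 16 kHz sampling preserves only up to 8 kHz. A small sketch of this effect (the signal, frequencies, and resampling route are illustrative assumptions; `scipy.signal.resample_poly` applies an anti-aliasing filter before decimation):

```python
# Sketch: a high-frequency component survives at 44.1 kHz sampling
# but is removed when the signal is downsampled to 16 kHz, because
# 10 kHz lies above the new Nyquist frequency of 8 kHz.
import numpy as np
from scipy.signal import resample_poly

fs_high, fs_low = 44_100, 16_000
t = np.arange(fs_high) / fs_high                    # 1 s of signal
x = np.sin(2 * np.pi * 3_000 * t) + np.sin(2 * np.pi * 10_000 * t)

y = resample_poly(x, up=160, down=441)              # 44.1 kHz -> 16 kHz

def peak_freqs(sig, fs, thresh=0.1):
    """Frequencies whose spectral magnitude exceeds thresh * max."""
    spec = np.abs(np.fft.rfft(sig)) / len(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1 / fs)
    return freqs[spec > thresh * spec.max()]

print(peak_freqs(x, fs_high))   # components near 3000 and 10000 Hz
print(peak_freqs(y, fs_low))    # only ~3000 Hz remains (Nyquist = 8 kHz)
```

This is why a corpus digitized at 16 kHz simply cannot contain the EHF fricative energy that a 44.1 kHz recording retains.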
The paper by Ulrich (2023) examines acoustic variation in Russian fricatives produced by men and women, implementing machine learning methods to evaluate the potential of two sets of parameters [acoustic parameters and mel frequency cepstral coefficients (MFCC)] to predict gender and speaker identity. Using a corpus of laboratory speech recorded at a 44.1 kHz sampling rate, analyses were conducted on an extended frequency range up to 20.05 kHz. Overall, gender of the speakers producing the fricatives was predicted with high accuracy by both sets of parameters, but individual speaker identity could not be predicted reliably. The paper further offers detailed analysis of acoustic variation in individual speakers' fricatives.
As a continuation of research aimed at finding improved methods to analyze fricatives, Shadle et al. (2023) propose new measures that utilize spectral information up to 15 kHz. Speakers of American English producing four types of controlled speech were recorded at a 44.1 kHz sampling rate. The proposed measures reflect underlying articulatory and aerodynamic conditions in fricative production and were based on fricative spectra obtained in both isolated words and connected speech, which provided a richer base for deriving articulatorily based parameters than the traditional paradigms used in this type of work. The necessity of including EHF energy in deriving new measures characterizing fricatives was demonstrated.
Yang and Xu (2023) demonstrate that prelinguistically deaf Mandarin-speaking children using cochlear implants (CIs) are unable to produce the three-way contrast among the closely articulated fricative and affricate sounds in Mandarin as precisely as typically developing children. This is because children who were born deaf and were later fitted with CIs have never been exposed to the full frequency range of human speech due to inherent limitations of CI devices that typically cover the speech spectrum up to 8 kHz. In Yang and Xu, the spectral energy peaks in CI children's productions were much lower (around 5 kHz) and less distinctive than in the typical children (around 7 kHz). This finding implies that typically developing children benefit perceptually from high-frequency information in speech when learning phonological contrasts in their language and that the ability to hear the fine-grained acoustic distinctions is key to articulatory precision in producing them.
Together, these papers offer new insights into the utility of EHF information in fricative spectra, challenging the assumption that frequencies above 8 kHz are of limited relevance to speech recognition, acquisition and, possibly, technology applications.
VI. ULTRASONIC VOCALIZATIONS IN MICE IN RELATION TO HUMANS
The article by Yao (2023) offers an insightful review of ultrasonic vocalizations in mice. It elucidates various aspects of vocal communication, including social calls, isolation calls, and courtship calls, and discusses the relevance of ultrasonic vocalizations in mice to a model of human communication. Although some vocalizations in response to stress and pain fall within the human audible range (below 20 kHz), vocal communication in mice is typically within the ultrasonic frequency range between 30 and 120 kHz. The article emphasizes the social aspects of vocal communication in mice and reviews research employing ultrasonic vocalizations in clinical practice for studying neurodevelopmental disorders in humans, including autism, Rett syndrome, and various disorders involving abnormal cognitive, social, language, and motor behaviors. Specific deficits in mice serve as models for human disorders. Technological advancement has enabled the study of the acoustic properties of ultrasonic sounds in mice, and the article further discusses the possibility of extending the upper range of human hearing so that some forms of mouse communication could possibly be heard by humans. Wearable spatial hearing technology for ultrasonic frequencies that permits the listener to localize an ultrasonic sound (after pitch shifting) already exists, and future developments of such technologies could potentially extend human hearing into the ultrasonic range. This fascinating prospect seems not out of reach with further improvements in recording technology and analytics, and the findings about mouse vocalizations reviewed in the article may lead to the development of stronger animal models for human behavior and neurocognitive abnormalities.
VII. CONCLUSION
The articles in this special issue represent current research exploring the potential of EHF information in human hearing, speech perception, development of statistical models, and refinement of traditional analytical methods of speech segments. Collectively, the research findings converge in showing not only that EHF energy is present in the speech spectrum but also that listeners utilize EHF cues in speech processing and recognition. Moreover, EHF hearing loss has detrimental effects on the perception of sounds at lower frequencies and in real-world communication and occupational environments that involve complex listening conditions, which challenges the conventional notion that EHFs have minimal functional significance. Although EHF cues may not directly contribute to speech comprehension, they enhance sound characteristics and are in many ways beneficial in speech processing.
There is emerging evidence that even very high frequencies, those in the ultrasonic range of human hearing, activate brain structures in the auditory system, indicating that EHF information is detected and processed. Even more exciting is the possibility that the complexity of sound processing may involve an interaction of audible and inaudible frequency components, a kind of hypersonic effect discussed at the outset, which may operate in specific communicative environments. Finally, with technological advancement, it does not seem impossible that the human audible frequency range can be extended beyond the current audibility limit of around 20 kHz, which implies that some vocalizations of other species could become audible to humans. These and other prospects await future research explorations.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts of interest to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.