Our ears are exquisitely sensitive. At the threshold of hearing, we can detect sounds that vibrate the tympanic membrane, or eardrum, over amplitudes of only a few picometers—a fraction of the diameter of a hydrogen atom. And yet at some frequencies, we can perceive sounds as loud as 120 dB—roughly the intensity of a chainsaw or jackhammer—before experiencing pain. Within that range, our ears can discern loudness differences smaller than a decibel and frequency differences much smaller than 1%.
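To put those decibel figures in perspective, the short sketch below converts levels to physical quantities. It assumes the levels are sound-pressure levels referenced to the conventional 20 µPa, a reference the text does not state explicitly; the function names are illustrative only.

```python
P_REF_PA = 20e-6  # assumed reference pressure for dB SPL, in pascals

def pressure_pa(level_db: float) -> float:
    """Sound pressure corresponding to a level in dB SPL."""
    return P_REF_PA * 10 ** (level_db / 20)

def intensity_ratio(delta_db: float) -> float:
    """Ratio of acoustic intensities separated by delta_db decibels."""
    return 10 ** (delta_db / 10)

print(f"120 dB SPL is about {pressure_pa(120):.0f} Pa")
print(f"120 dB spans an intensity ratio of {intensity_ratio(120):.0e}")
print(f"1 dB is an intensity change of {100 * (intensity_ratio(1) - 1):.0f}%")
```

Under that assumption, the span from threshold to 120 dB covers a factor of a million in pressure and a trillion in intensity, and a one-decibel step is an intensity change of roughly a quarter.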
The initial stages of sound perception involve purely mechanical energy. Sound waves displace the eardrum, and its vibration is transmitted to the inner ear, or cochlea, by three small bones in the middle ear—the malleus, the incus, and the stapes. (Figure 1 outlines the ear’s basic anatomy.) The main purpose of that three-bone lever system is to reduce the mechanical-impedance mismatch between the air-filled outside environment and the fluid-filled inner ear (see Physics Today, July 2015, page 14). Without the bony lever, most acoustic energy would be reflected rather than transmitted.
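A rough sense of the mismatch comes from the standard plane-wave result for the fraction of power transmitted across a boundary between media with characteristic impedances Z1 and Z2: T = 4·Z1·Z2 / (Z1 + Z2)². The sketch below is an order-of-magnitude estimate only. It assumes textbook impedances for air and water, which are not given in the article, and it treats the cochlear fluid as open water, a crude stand-in for the real input impedance of the cochlea.

```python
import math

def power_transmission(z1: float, z2: float) -> float:
    """Fraction of incident power transmitted at normal incidence."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2

Z_AIR = 415.0      # Pa·s/m, assumed textbook value for air
Z_WATER = 1.48e6   # Pa·s/m, assumed textbook value for water

t = power_transmission(Z_AIR, Z_WATER)
loss_db = -10.0 * math.log10(t)
print(f"transmitted fraction: {t:.4f} (a loss of {loss_db:.1f} dB)")
```

With those assumed values, only about a tenth of a percent of the incident power would cross the boundary unaided, a loss of roughly 30 dB.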
Sound waves transmitted into the inner ear travel on the basilar membrane, a structure that extends along the snail-shaped cochlea. The mechanical properties of the membrane change monotonically from base to apex along that cochlear spiral. The mass increases and stiffness decreases, which makes the cochlea act like a frequency analyzer (see Physics Today, April 2008, page 26). Each location on the membrane has a different resonant frequency. The basilar membrane’s vibration amplitude is maximal near the base of the cochlea for high frequencies and near the apex for low frequencies. And its mechanical characteristics determine the range of audible frequencies, which is about 20–20 000 Hz for humans.
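One widely used empirical description of that place-to-frequency map is the Greenwood function, f = A(10^(ax) − k), where x is the fractional distance from apex to base. The article does not mention it; the sketch below brings it in for illustration, with the commonly quoted human parameters taken as assumptions.

```python
def greenwood_hz(x: float) -> float:
    """Characteristic frequency at relative position x (0 = apex, 1 = base).

    Parameters are the commonly quoted human fit (an assumption here).
    """
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x) - k)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}  ->  {greenwood_hz(x):8.0f} Hz")
```

With those parameters the two ends of the membrane land close to the 20 Hz and 20 000 Hz limits quoted above, and the frequency rises monotonically from apex to base.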
The crucial step of mechanical-to-neural transduction is performed by inner hair cells, specialized sensory cells that lie on top of the basilar membrane. As the membrane moves, the stereocilia at the top of the cells are deflected; depending on the direction, the cells increase or decrease their release of chemical neurotransmitters (see the article by A. J. Hudspeth and Vladislav Markin, Physics Today, February 1994, page 22). Once a sufficient amount is released, a nerve impulse called an action potential is created in the auditory nerve, and an electrical signal goes to the brain: We hear sound.
Hearing lost and found
The auditory system is complex, and it can fail in many ways. In fact, roughly 17% of the US adult population suffers from some form of impaired hearing. Conductive hearing loss is due to mechanical interference with the propagation of acoustic energy into the inner ear. For example, pathological tissue growth can interfere with the displacement of the ossicular chain and seriously attenuate the sound going to the cochlea. That type of loss can frequently be cured with surgery that addresses the underlying mechanical problem or with hearing aids, which increase the amplitude of sound hitting the eardrum.
Sensorineural hearing loss, on the other hand, develops as inner hair cells are lost with age or from disease. When their loss is complete, the sound amplification provided by a hearing aid becomes useless because transduction of acoustic energy into neural impulses no longer takes place. A few decades ago, patients suffering from sensorineural loss would have been told that their impairment was untreatable. That changed in 1957 when the French team of André Djourno and Charles Eyriès designed a primitive device to restore hearing through electrical stimulation of the auditory nerve and implanted it into two patients.1 The cochlear implant was born.
The first versions of the device delivered stimulation through a single channel consisting of an electrode pair, and it therefore lacked the capability to stimulate different auditory neurons along the cochlea. Djourno and Eyriès abandoned their project in 1959 because of personal and philosophical differences. But other investigators continued the work, and over many years of successes and setbacks, it ultimately became clear that multichannel devices—those with multiple electrodes in the cochlea, each carrying information about a particular frequency range in the acoustic signal—would be necessary to achieve high levels of speech comprehension. Djourno had anticipated that finding2 as early as 1959, but the first successful multichannel cochlear implants were not developed until the 1970s.
Cochlear implants bypass the eardrum, the ossicular chain, the basilar membrane, and the (usually dead) hair cells. They stimulate the auditory nerve fibers with electrical pulses delivered from electrodes implanted inside the cochlea, as shown in figure 2. That mechanism works because a sufficient number of auditory nerve fibers remain alive and excitable in most patients with sensorineural hearing loss. An external speech processor captures sound with a microphone and encodes it for transmission from an external antenna to an implanted receiver.
The speech processor typically filters the acoustic signal into 12 to 22 frequency bands that cover the normal range of speech sounds. The output of each frequency band is sent to a different intracochlear electrode. High-frequency bands are sent to electrodes near the base of the cochlea, and low-frequency ones go to more apical locations. The mapping of frequency band to cochlear location mimics the frequency analysis performed by the basilar membrane in normal ears.
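The band-to-electrode assignment can be sketched in a few lines. The frequency limits (200 to 8000 Hz), the logarithmic spacing, the choice of 16 bands, and the electrode numbering (0 = most apical) are all assumptions made for illustration; actual devices use their own manufacturer-specific tables.

```python
def band_edges(n_bands: int, f_lo: float = 200.0, f_hi: float = 8000.0):
    """Edges of n_bands logarithmically spaced analysis bands, in Hz."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]

edges = band_edges(16)

# Hypothetical numbering: electrode 0 is the most apical and receives the
# lowest band; the highest-numbered electrode sits nearest the base.
for electrode, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
    print(f"electrode {electrode:2d}: {lo:7.0f} to {hi:7.0f} Hz")
```

Logarithmic spacing is chosen here because it roughly follows the way frequency is laid out along the cochlea; a real fitting may use different spacing.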
The processor calculates the envelope of the signal at the output of each filter and thus estimates the amount of energy in the input signal at each frequency and each point in time. The envelopes and the way they each change in time contain linguistic information that the processor transmits to the auditory nerve. The predominant method to stimulate the nerve electrically is through trains of square current pulses. Each square pulse delivers identical amounts of positive and negative charge and thereby avoids damaging neurons near an electrode.
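Envelope extraction for a single channel can be sketched as a band-pass filter followed by rectification and smoothing. The filter design below (a second-order biquad and a one-pole smoother with a 200 Hz cutoff) is an assumption picked for brevity, not the design used in any particular processor.

```python
import math

def bandpass(signal, fs, f_lo, f_hi):
    """Second-order band-pass centered on the band's geometric mean."""
    f0 = math.sqrt(f_lo * f_hi)
    q = f0 / (f_hi - f_lo)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # the x[n-1] coefficient is zero
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, fs, cutoff_hz=200.0):
    """Full-wave rectification followed by one-pole low-pass smoothing."""
    c = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, level = [], 0.0
    for x in signal:
        level += c * (abs(x) - level)
        out.append(level)
    return out

fs = 16000  # sampling rate in Hz (assumed)
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(1600)]
env = envelope(bandpass(tone, fs, 800.0, 1250.0), fs)
print(f"envelope after 0.1 s of an in-band tone: {env[-1]:.2f}")
```

A tone inside the band produces a steady envelope, whereas a tone well outside it is strongly attenuated. Running one such channel per band gives the set of slowly varying envelopes that the processor passes on.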
To represent the frequency spectrum of the acoustic signal in the electrical stimulation pattern, each filter envelope modulates the electrical amplitude of a separate pulse train, one per electrode. Ideally, pulses delivered to each electrode should stimulate a restricted subset of auditory neurons. That goal is achieved by stimulating one electrode at a time while all the others are kept in an open circuit. The temporal interleaving of stimulation pulses prevents undesired currents between different channels and is done at a rate of several thousand pulses per second.3 Despite the discontinuous nature of the stimulation, patients experience a continuous auditory percept, much like the continuous image perceived in a movie theater, which is actually the successive presentation of 24 frames per second. Figure 3 illustrates how a given speech sound, such as “sa” in this case, is transformed by each main processing stage: band-pass filtering, envelope extraction, and pulse-train modulation.
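The interleaving can be illustrated with a simple timing table in which each electrode gets its own slot within a stimulation frame, so no two pulses overlap. The electrode count, per-electrode rate, and phase duration below are hypothetical values chosen only to make the arithmetic easy to follow. The same sketch checks the charge balance of a symmetric biphasic pulse.

```python
def interleaved_schedule(n_electrodes, rate_pps, phase_us, n_frames):
    """Start times (in microseconds) of non-overlapping biphasic pulses."""
    frame_us = 1_000_000 // rate_pps      # one pulse per electrode per frame
    slot_us = frame_us // n_electrodes    # time slot reserved for each electrode
    if 2 * phase_us > slot_us:
        raise ValueError("biphasic pulse does not fit in its time slot")
    schedule = []
    for frame in range(n_frames):
        for electrode in range(n_electrodes):
            schedule.append((frame * frame_us + electrode * slot_us, electrode))
    return schedule

def biphasic_pulse(amplitude_ua, phase_us):
    """Two phases of equal duration and opposite polarity."""
    return [(+amplitude_ua, phase_us), (-amplitude_ua, phase_us)]

def net_charge(pulse):
    """Sum of current times duration over all phases."""
    return sum(current * duration for current, duration in pulse)

# Hypothetical settings: 8 electrodes, 1000 pulses per second each, 25 µs phases.
schedule = interleaved_schedule(8, 1000, 25, n_frames=2)
for start_us, electrode in schedule[:8]:
    print(f"t = {start_us:4d} µs: electrode {electrode}")
print("net charge of one pulse:", net_charge(biphasic_pulse(300, 25)))
```

In a full processor, the amplitude of each pulse would be set by the envelope of the corresponding channel at the moment the pulse is delivered.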
After a patient has been surgically outfitted with a cochlear implant but before initial stimulation, a clinician measures the maximum amount of current that the patient can comfortably tolerate from each electrode and the minimum detectable current for each. The speech processor is programmed to ensure that stimulation levels of each electrode fall within those limits. The receiver chip that is implanted in the patient’s temporal bone gets its power from the same radiofrequency link that is used to transmit data. The chip decodes the incoming signal and converts it to stimulation pulses that are delivered to electrodes implanted in the cochlea.
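Keeping stimulation within those limits amounts to mapping each envelope value onto the interval between an electrode's threshold level and its comfort level. The sketch below uses logarithmic compression, a common choice that is nevertheless an assumption here, as are the function name, the 40 dB input range, and the example current values.

```python
import math

def map_to_current(env, t_level, c_level, env_min=0.01, env_max=1.0):
    """Map an envelope sample onto [t_level, c_level] with log compression.

    Inputs below env_min are clipped to the threshold level and inputs
    above env_max to the comfort level (both limits are hypothetical).
    """
    clipped = min(max(env, env_min), env_max)
    fraction = math.log(clipped / env_min) / math.log(env_max / env_min)
    return t_level + fraction * (c_level - t_level)

# Hypothetical electrode: threshold 100 µA, maximum comfortable level 400 µA.
for env in (0.001, 0.01, 0.1, 1.0, 5.0):
    print(f"envelope {env:5.3f} -> {map_to_current(env, 100.0, 400.0):5.1f} µA")
```

Whether very soft inputs should produce threshold-level stimulation, as they do here, or none at all is a design decision that this sketch does not try to settle.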
Speech and the brain
Figure 4 illustrates how some of the dynamic frequency patterns found in a speech signal (top) are successfully mimicked (bottom) by a common signal-processing strategy used in cochlear implants. The top panel shows a spectrogram of the word “choice.” During the “ch” and “s” parts of the spoken word, the waveform is aperiodic. Both sounds are the result of turbulence created as air passes through different constrictions of the vocal tract, but the frequency range for the “s” sound is higher than that for “ch.” Vowels are sounds whose source lies in the quasiperiodic vibration of the vocal cords, and the frequency spectrum is altered by the shape of the vocal tract as determined by the position of the lips, tongue, velum, and other articulators.
Vowels are characterized by peaks of their frequency spectrum, also known as formants. The two lowest-frequency formants are particularly important for vowel identity. (See Physics Today, March 2004, page 23.) A change of those formants is noticeable in figure 4a between 0.55 and 0.8 seconds as the spoken vowel changes identity from “o” to “i.” Figure 4b shows the electrical stimulation that a cochlear implant imparted to its array of electrodes in response to the acoustic signal depicted in figure 4a. Many of the acoustic cues that identify different speech sounds are roughly replicated in the electrical stimulation pattern.
The subtle acoustic cues that are present in speech allow listeners to discriminate and identify different speech sounds; they facilitate what’s known as bottom-up processing of information. However, the extraordinary ability of the human brain to understand speech also relies on top-down processing—using semantic, grammatical, and real-world knowledge to enhance speech perception. Listeners can also focus attention on the speaker and take advantage of known characteristics, such as identity, gender, accent, voice qualities, and spatial location. (See the article by Emily Myers, Physics Today, April 2017, page 34.)
Indeed, humans are extraordinary pattern recognizers. Despite decades of R&D, children still outperform automatic speech-recognition systems in many acoustic settings. The uniquely human ability to employ top-down processing is largely what allows cochlear-implant patients to understand speech despite receiving an auditory signal that is deficient in bottom-up information.
Limitations and populations
Some aspects of speech signals are conveyed at least acceptably well by cochlear implants. In particular, temporal aspects of speech are delivered with good fidelity. Perception of sound duration by cochlear-implant users is on par with that of normal listeners. Such perception can be useful for identifying vowels (for example, “heed” is longer than “hid”) and for discriminating sounds that contain silent gaps, such as “apa” and “acha,” from those that do not, such as “ama” and “ala.”
On the other hand, cochlear implants suffer from some significant inherent limitations. The electrodes and the cell bodies of auditory-nerve neurons are separated by a bony wall, and the electrodes themselves are submerged in an electrically conducting medium. Both factors reduce a cochlear implant’s ability to stimulate small, distinct populations of neurons. Compounding the problem are possible local differences in the number of live neurons along the length of the cochlea.
Yet another limitation, at least for patients who enjoyed normal hearing prior to becoming deaf, is that the cochlear locations stimulated by the implant in response to a given frequency may differ from those stimulated by acoustic hearing. That is, the electrical stimulation patterns may not necessarily be delivered to the “right” physiological location—a feature not captured in a comparison between the panels of figure 4. The direction and extent of the frequency mismatch depend on the size of an individual’s cochlea, on the exact electrode location, and perhaps also on the pattern of neural survival. This frequency mismatch is likely one of the reasons why speech does not sound quite right upon initial stimulation. Sometimes it is completely unintelligible. Patients describe what they hear as a radio out of tune, Minnie Mouse, Donald Duck, or (less frequently) Darth Vader.
Fortunately for those patients, the human brain is plastic and shows an impressive ability to adapt to distorted input. Although someone’s spouse might initially sound like a chipmunk, both the quality and the intelligibility of the speech they hear improve after a few weeks or months of experience with the implant. Determining the extent and limitations of the brain’s auditory plasticity is an important research direction that is likely to influence how cochlear-implant patients are managed clinically.
How well cochlear implants work depends to some extent on a patient’s history. People who lost their hearing after learning to understand spoken language do quite well, on average. The vast majority can enjoy a fluent face-to-face conversation, especially in quiet conditions, and most can communicate on the phone. A 2002 study found that adult cochlear-implant users were on average 78% correct in identifying words in sentences,4 and word-perception results are now even better with the introduction of newer speech processors.
A second population that benefits greatly from cochlear implantation is congenitally deaf children. In their case, hearing impairment affects not only speech perception but also the development of language skills and the ability to speak intelligibly. Cochlear implantation improves all three. According to some measures, language development after cochlear implantation proceeds at a pace similar to that found in normal-hearing children.5 Outcomes strongly depend on age at implantation, however, because speech perception, production, and oral language skills become increasingly delayed the longer a child is deprived of auditory input. The earlier the implantation, the smaller the developmental gaps.6
Clinicians typically screen babies for hearing loss shortly after birth, and the Food and Drug Administration has approved cochlear implantation in babies 12 months and older. Earlier implantation may be advisable in some cases, such as children deafened by meningitis. That pathology can cause bone growth inside the cochlea, which jeopardizes the success of future implantation. Children as young as two months old have received implants, but most surgeons prefer to wait until hearing impairment can be verified behaviorally. In any case, the human cochlea is near adult size at birth, so there are no concerns about a child’s outgrowing a given electrode array.
A third clinical population of interest is congenitally deaf patients who received implants in their adolescence or adulthood. Typically, those patients have difficulty understanding speech without the help of lip reading, and their ability to speak may also be impaired. Nonetheless, the patients usually appreciate a new awareness of environmental sounds and at least some speech perception, limited as it might be. Their results are much more modest than those of the two aforementioned groups.
Cochlear-implant users, regardless of their age when they became deaf and when they received an implant, frequently find it nearly impossible to communicate in noisy environments that a “normal hearer” would still find easy to navigate. Music is another problem area for implant recipients. The frequency range that is encoded and processed by an implant is designed to optimize speech perception, not music perception. The pitch of a musical note is determined by the fundamental frequency of a complex sound that has different amounts of energy at several harmonic frequencies, all of them some multiple of the fundamental frequency. And when harmonics either blur together or are shifted in frequency so that they cease to be multiples of a single fundamental frequency, the sound becomes severely distorted.
Both types of musical distortion happen in cochlear implants. Frequency–place mismatch, discussed earlier, accounts for one. A few features shown in figure 4 exemplify another. In the spectrogram of the acoustic input, the fundamental frequency and higher harmonics are visible as separate striations between 0.55 and 0.8 seconds. But in the corresponding pattern of electrical stimulation delivered to the patient’s cochlea, the harmonics become smeared and overlap each other because of the implant’s limited frequency resolution.
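The loss of harmonic detail follows from simple counting: once several harmonics fall inside a single analysis band, one electrode has to represent all of them. The sketch below counts harmonics per band for a 220 Hz note. The 16 logarithmically spaced bands between 200 and 8000 Hz are an assumed layout, not that of any specific device.

```python
def band_edges(n_bands, f_lo=200.0, f_hi=8000.0):
    """Edges of n_bands logarithmically spaced analysis bands, in Hz."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]

def harmonics_per_band(f0, edges):
    """Number of harmonics of f0 that fall inside each analysis band."""
    counts = [0] * (len(edges) - 1)
    h = 1
    while h * f0 < edges[-1]:
        freq = h * f0
        for i in range(len(counts)):
            if edges[i] <= freq < edges[i + 1]:
                counts[i] += 1
                break
        h += 1
    return counts

counts = harmonics_per_band(220.0, band_edges(16))
for band, n in enumerate(counts):
    print(f"band {band:2d}: {n} harmonic(s)")
```

In this layout the lowest bands hold at most one harmonic each, whereas the highest bands hold several, so the individual striations seen in the acoustic spectrogram cannot survive in the electrical pattern.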
The focus of cochlear-implant designers on speech perception rather than, or perhaps to the detriment of, music perception is understandable in a pragmatic sense. But it may be one reason why music is so much less enjoyable for postlingually deaf cochlear-implant users than it was when they had normal hearing. It is possible, but by no means certain, that alternative signal-processing schemes in future generations of the devices may help enhance users’ enjoyment of music.
Past and future
In the early days of cochlear implantation, shortly after the FDA first approved the surgical intervention in the mid-1980s, only individuals with profound deafness in both ears, with unaided hearing thresholds higher than 90 dB, were considered candidates. That made sense for several reasons. First, it was thought that insertion of stimulation electrodes in the cochlea would wipe out any residual hearing and thus violate the mythic medical directive to “first, do no harm.” Second, early cochlear implants provided a severely limited amount of speech perception, less than could typically be obtained even with a small amount of residual hearing.
Over time, speech-perception outcomes from the procedure improved and became comparable to those experienced by severely impaired hearing-aid users, with unaided hearing thresholds between 70 and 90 dB. The FDA approved cochlear implantation for that group of patients in the 1990s. Indications for cochlear implantation continued to expand, and now even some patients with normal hearing in one ear may seek cochlear implantation in their hearing-impaired ear. The most common reason for implanting a device in such patients is intractable tinnitus, or ear ringing, that sometimes accompanies hearing loss.7 In those “single-sided deafness” patients, the cochlear implant gives a small enhancement of speech perception in noisy settings, somewhat better sound localization, and increased enjoyment of music.
Another interesting development is the introduction of hybrid cochlear implants,8 approved by the FDA in 2014. They are intended for ears that hear low frequencies well but medium and high frequencies poorly. Patients with normal hearing up to 1500 Hz but profound hearing loss at 2000 Hz and higher are candidates for that type of device. Such hybrid implants use delicate electrodes that are carefully inserted to a somewhat shallower point than conventional cochlear-implant electrodes in an effort to preserve the patient’s low-frequency residual hearing, which is handled by the more apical regions of the cochlea.
A clinical trial of 50 patients showed that hybrid cochlear implants improve speech perception in both quiet and noisy settings, particularly for patients with better postoperative low-frequency hearing. However, nearly half of all hybrid-implant patients experienced significant low-frequency hearing loss, and six of them chose to have the hybrid removed and a traditional cochlear implant inserted.9
Regardless of the ultimate level of success obtained by hybrid cochlear implants, it seems clear that one important new way to improve hearing will be the development of devices whose stimulation is coordinated across acoustic and electric modalities. Prospective candidates would have residual acoustic hearing that on its own is insufficient for them to perceive speech, even with the use of hearing aids, but that may be successfully combined with a cochlear implant.
Today it’s common for children and adults with severe to profound hearing impairment to obtain bilateral cochlear implants.10,11 For those users, research is under way to better coordinate how the left and right ears are stimulated. Currently, that’s a tall order.
Sound localization is mediated by two acoustic cues: interaural level difference and interaural time difference (ITD), the differences in a sound’s amplitude and arrival times, respectively, at the two ears. (See the article by Bill Hartmann, Physics Today, November 1999, page 24.) The maximum possible ITD for an average-sized human head is about 0.7 ms, obtained for a sound located directly to one side. Although that’s small, normal-hearing listeners can detect differences as small as 10 µs. That small a time difference allows the brain to localize sound sources just one degree apart.
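Those numbers can be checked against a classic spherical-head approximation, often attributed to Woodworth, in which ITD = (r/c)(θ + sin θ) for a distant source at azimuth θ. The head radius and speed of sound used below are assumed typical values, and the formula itself is an idealization that the article does not invoke.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius, in meters
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air

def itd_seconds(azimuth_deg: float) -> float:
    """Spherical-head ITD for a distant source (0 degrees = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))

print(f"source at 90 degrees: {itd_seconds(90) * 1e3:.2f} ms")
print(f"source at  1 degree:  {itd_seconds(1) * 1e6:.1f} µs")
```

Under those assumptions, a source directly to one side gives an ITD a little under 0.7 ms, and a one-degree shift away from straight ahead changes the ITD by roughly 9 µs, consistent with the figures quoted above.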
Cochlear-implant users are much less sensitive to ITD than normal-hearing listeners. That poorer sensitivity may be related to the fact that the timing of stimulation pulses is not synchronized between left and right speech processors or between the acoustic signal at each ear and the corresponding processor. In either case, the timing mismatch adds to the difficulties of listening in a noisy setting.
Lately, investigators have started exploring biological approaches to better integrate patients with their devices. Prototypes of cochlear implants include drug-delivery systems intended to reduce implantation trauma, reduce fibrous tissue growth around the electrodes, or even facilitate the growth of dendrites or axons from spiral ganglion cells toward the implanted electrodes, to improve the electrode–tissue interface. That direction, however, is one that researchers have only just begun to pursue.
More than half a million people have been surgically fitted with a cochlear implant. Yet only 1–5% of people who need cochlear implants actually have access to them. The device’s success has fostered the development of other types of neural prostheses, including ones that stimulate more central parts of the auditory system, such as the auditory brainstem12 or the midbrain. Most notably, researchers are making progress on prostheses that stimulate the retina13 or the visual cortex of blind patients. Some successful proof-of-concept designs of visual prostheses have benefited greatly from technologies developed for cochlear implants. And although researchers have yet to overcome important hurdles on the way to a clinically successful procedure, perhaps one day sight will join hearing as another human sense that can be restored by electronic means.
Mario Svirsky is the Noel L. Cohen Professor of Hearing Science in the department of otolaryngology–head and neck surgery at the New York University School of Medicine in New York City.