Cochlear implants have been the most successful neural prosthesis, with one million users globally. Researchers used the source-filter model and speech vocoder to design the modern multi-channel implants, allowing implantees to achieve 70%–80% correct sentence recognition in quiet, on average. Researchers also used the cochlear implant to help understand basic mechanisms underlying loudness, pitch, and cortical plasticity. While front-end processing advances improved speech recognition in noise, the unilateral implant speech recognition in quiet has plateaued since the early 1990s. This lack of progress calls for action on re-designing the cochlear stimulating interface and collaboration with the general neurotechnology community.
1. Introduction
Cochlear implants are the most successful neural prosthesis, having restored functional hearing to one million partially or totally deafened people worldwide (estimated by the author in 2022, see below). The cochlear implant not only allows an average adult user to carry on a telephone conversation without lipreading but, more importantly, provides a means for a child user to develop normal language (Niparko et al., 2010). In comparison, fewer than one thousand blind individuals have received a retinal implant to restore vision (Ayton et al., 2020), and 208 000 individuals have benefited from deep brain stimulation to treat neurological and neuropsychiatric disorders (Vedam-Mai et al., 2021).
The cochlear implant market is dominated by Cochlear Limited (Sydney, Australia), which has sold 650 000 implants over the last 40 years (Cochlear Limited, 2021). As a private company, Med El (Innsbruck, Austria) has not disclosed the number of implants sold, but reported in 2015 that it had surpassed Cochlear as the number one player in Europe and is clearly the second largest manufacturer (Zeng et al., 2015). Advanced Bionics (Valencia, CA) has not disclosed its number of implants sold either, but its parent company, Sonova (Stäfa, Switzerland), reported a total revenue of US$180 M for the cochlear implant segment in 2021, which was about one-fifth of Cochlear's revenue (Sonova, 2021). As reported in 2021, Nurotron (Hangzhou, China) had sold more than 20 000 devices, while Oticon Medical, formerly Neurelec (Vallauris, France), had sold 16 500 (see footnote 1). Other smaller companies have either sold several thousand devices (e.g., Listent, Shanghai, China) or implanted several hundred devices in their clinical trials (e.g., AIC Aiyisheng, Shenyang, China; Shree Coratomic Limited, Madhya Pradesh, India; Todoc, Seoul, Korea). There is also a dwindling number of active users of the several thousand 3M/House (House and Urban, 1973) and Ineraid (Eddington, 1978) devices, which were developed in the 1970s but phased out of the market in the 1990s. Figure 1 shows the cumulative share of the one million cochlear implant market.
The one million cochlear implant market share. Cochlear Limited is the largest manufacturer with 50% or more market share, Med El is second with ∼25% share, and Advanced Bionics Corporation is third with 20% or less share. Other manufacturers are much smaller with about 5% share, including Nurotron (Sina, 2021), Oticon (Oticon, 2021), Listent (Listent, 2022), AIC (AIC, 2022), Shree Coratomic Limited (India_Technology_Development_Board, 2018), and Todoc (Todoc, 2022).
To celebrate the one millionth cochlear implant, this paper first describes the milestones of research and development that are responsible for this remarkable medical device. Then it examines the science behind the most successful neural prosthesis and discusses the current bottleneck limiting its performance. It is worth noting that many of the fundamental contributions were published by members of the Acoustical Society of America in its flagship journal. Finally, the paper will conjecture the pathways to improve cochlear implant performance and highlight the collaborative opportunities between the cochlear implant field and the general neural technology community.
2. Milestones of cochlear implant research and development
Three stages are involved in making a successful medical device: feasibility, safety, and efficacy. The feasibility of cochlear implants can be traced back to Volta, who in 1800 demonstrated that electric stimulation can evoke hearing [for a review, see Zeng (2008)]. Figure 2 shows the three milestones that established the safety and efficacy of modern cochlear implants. William House, father of neurotology, developed the first FDA-approved cochlear implant, which had a passive coil-transmission link and a single intra-cochlear electrode. The House single-channel device was critical to the safety and acceptance of the cochlear implant, but it could produce only a minimal level of open-set sentence recognition, with its function being mostly to aid lipreading and improve environmental sound awareness (House and Berliner, 1986). Graeme Clark led the development of a multi-channel cochlear implant, the Nucleus-22 device, which he described as “a second breakthrough,” as it allowed “deaf post-lingual adults not only to perceive the complex noise of their acoustic environment but also to recognize words with and even without visual cues” (Clark et al., 1987). Wilson et al. (1991) developed the “Continuous-Interleaved-Sampling” or CIS strategy, which has since become the de facto industry standard that enables the average user to achieve a high level (70%–80% correct) of sentence recognition in quiet without lipreading.
Milestones of cochlear implant research and development. Open-set sentence recognition accuracy serves as the measure of milestones. William House's single-channel device established the feasibility and safety but provided limited sentence recognition (photo courtesy of House Institute Foundation). Graeme Clark developed a multi-channel device that provided ∼30% correct sentence recognition (photo courtesy of Graeme Clark). Blake Wilson, recognized by the Helmholtz-Rayleigh Interdisciplinary Silver Medal from the Acoustical Society of America in 2017, improved signal processing to result in 70%–80% accuracy by modern cochlear implant users (photo courtesy of Blake Wilson).
3. Hearing and speech sciences behind the cochlear implant
Modern multi-channel cochlear implants were designed with deep knowledge of hearing and speech research in mind. On the hearing side, the design followed both the place and timing principles of pitch encoding. The arrangement of the electrodes abides by the tonotopic organization in the inner ear (Greenwood, 1961) while the number of the electrodes approximates the 24 auditory filters or critical bands (Zwicker, 1961). The electric pulses can be either periodic or random, depending on attributes of the speech signal, because they produce the nerve discharge patterns that are phase-locked to the stimulus waveform (Kiang and Moxon, 1972). On the speech side, cochlear implant signal processing relied on acoustic theories of speech production and perception.
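As a rough illustration of the place principle, the sketch below evaluates a frequency-position function of the form commonly attributed to Greenwood, F = A(10^(ax) − k), at a set of evenly spaced electrode positions. The constants (A = 165.4, a = 2.1, k = 0.88), the 0.2–0.8 span of relative positions, and the function names are assumptions made for illustration; none of them is taken from this paper or from any particular device.

```python
# Greenwood-type frequency-position map, F = A * (10**(a*x) - k).
# A, ALPHA, K are commonly quoted human values (an assumption, not from the text).
A, ALPHA, K = 165.4, 2.1, 0.88


def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the
    cochlea, with x = 0 at the apex and x = 1 at the base."""
    return A * (10.0 ** (ALPHA * x) - K)


def electrode_frequencies(n_electrodes, apical=0.2, basal=0.8):
    """Frequencies at n evenly spaced, hypothetical electrode positions."""
    step = (basal - apical) / (n_electrodes - 1)
    return [place_to_frequency(apical + i * step) for i in range(n_electrodes)]


if __name__ == "__main__":
    for i, f in enumerate(electrode_frequencies(12), start=1):
        print(f"electrode {i:2d}: {f:8.1f} Hz")
```

Under these assumed constants the map spans roughly 20 Hz at the apex to about 20 kHz at the base, so equal steps in place correspond to roughly logarithmic steps in frequency over most of the range.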
3.1 Source and filter to represent spectral fine structure and envelope
Fant (1970) separated speech into two parts: (1) a source that is either a periodic signal from vocal cord vibration or a noisy signal from air flow pushed out by the lungs and (2) a filter that depicts the resonance frequencies of the vocal tract. Figure 3 shows the spectrum of a periodic source, which consists of harmonics, with the first harmonic being termed the fundamental frequency or F0; this source is spectrally shaped by the vocal tract resonance or formant frequencies, with the first two formants being labelled F1 and F2. The source contributes to speaker identity and quality, whereas the filter contributes to speech intelligibility. For example, the F0 can be doubled to make a male speaker sound like a female speaker (second row), or the periodic source can be replaced by a noise source to make whispered speech (third row). None of these changes in the source will alter speech intelligibility (in quiet) as long as the vocal tract filter (i.e., formants) remains intact. Three audio samples (Mm. 1–Mm. 3) are provided to demonstrate the effects of changing the sound source.
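The source and filter separation can be sketched in a few lines, assuming a pulse train or white noise as the source and a cascade of two-pole resonators as the vocal tract filter. The sampling rate, the formant values (800 and 1200 Hz), the bandwidths, and all function names are hypothetical choices for illustration, not parameters from Fant's model or from the audio samples.

```python
import math
import random

FS = 16000                                # sampling rate in Hz (assumed)
FORMANTS = ((800.0, 90.0), (1200.0, 110.0))  # (frequency, bandwidth) in Hz, hypothetical


def pulse_source(f0, n):
    """Periodic source: one unit impulse every 1/f0 seconds."""
    period = round(FS / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def noise_source(n, seed=0):
    """Noise source, as in whispered speech."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def resonator(x, freq, bandwidth):
    """Two-pole resonator modelling a single formant."""
    r = math.exp(-math.pi * bandwidth / FS)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / FS)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract(x):
    """Filter: pass the source through each formant resonator in turn."""
    for freq, bandwidth in FORMANTS:
        x = resonator(x, freq, bandwidth)
    return x


def band_energy(x, freq):
    """Energy of x at one frequency (single-bin discrete Fourier transform)."""
    w = 2.0 * math.pi * freq / FS
    re = sum(s * math.cos(w * i) for i, s in enumerate(x))
    im = sum(s * math.sin(w * i) for i, s in enumerate(x))
    return re * re + im * im


def formant_contrast(source):
    """Ratio of output energy at the first formant to energy at 3000 Hz."""
    y = vocal_tract(source)
    return band_energy(y, FORMANTS[0][0]) / band_energy(y, 3000.0)
```

Calling `formant_contrast` on `pulse_source(100, 3200)`, `pulse_source(200, 3200)`, and `noise_source(3200)` gives a large ratio in each case: the energy stays concentrated at the formant whether the source is a low-F0 pulse train, a doubled-F0 pulse train, or noise, which is the sense in which the filter is unaffected by changes in the source.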
Gunnar Fant's source and filter model in speech production. Gunnar Fant received the 1981 ASA Silver Medal in Speech Communication [reproduced from “Fant, Gunnar, receives Silver Medal in Speech Communication,” J. Acoust. Soc. Am. 69, 605 (1981) with permission of Acoustical Society of America, Copyright 1981, Acoustical Society of America]. Top row: speech source (F0 represents the first harmonic of a periodic source), speech filter shaped by the vocal tract resonance frequencies (formants: F1 and F2), and the resulting speech spectrum. Second row: F0 is doubled. Third row: The periodic source is replaced by a noise source. Audio files are available online to demonstrate the effects of altering speech source without changing the filter formants.
Original recording of a male voice “A large size in stockings is hard to sell.” This is a file of type “wav” (0.14 MB).
Double F0. This is a file of type “wav” (0.14 MB).
Noise source. This is a file of type “wav” (0.28 MB).
The early multi-electrode cochlear implants, especially the most successful Nucleus device, were designed based on this source and filter model. The first Nucleus wearable speech processor encoded the source with either a stimulation rate at F0 for a voiced sound or a random rate for a voiceless sound, but it extracted only the second formant or F2 of the sound to encode its filter information (Tong et al., 1980). Because F0 does not contribute to intelligibility and F2 is not adequate to encode consonant information (Blamey et al., 1987), this early speech processor produced only 10%–20% correct sentence recognition. Even when F1 was added later, sentence recognition improved only to 30%–40% correct (Dowell et al., 1986). The relatively low level of sentence recognition was due to a technical difficulty at that time and a fundamental limitation in cochlear implants. The difficulty was that the digital signal processing chips of the time could handle only less sophisticated algorithms (i.e., zero crossing detection), which gave a crude estimate of the signal's instantaneous frequency and was susceptible to background noise. The limitation is that even if these fundamental and formant frequencies could be accurately measured, which is trivial with today's technology, they cannot be precisely mapped to the right cochlear place(s) by cochlear implants. Not only does the location of the intra-cochlear electrodes vary greatly due to insertion depth, bending, or even incorrect placement (e.g., scala vestibuli), but individuals also have different degrees and patterns of nerve survival. This individual variability is further complicated by the large spread of electric excitation, making it impossible to reproduce the sharply tuned normal tonotopic organization that is critical to the encoding of place-specific frequency (Zeng et al., 2014). A different approach was needed to improve cochlear implant speech recognition performance.
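To make the zero crossing point concrete, here is a minimal sketch of a zero crossing frequency estimator and of how additive noise inflates its estimate. It is not the algorithm used in the early Nucleus processors; the test tone, noise level, sampling rate, and function names are all assumptions for illustration.

```python
import math
import random


def zero_crossing_frequency(x, fs):
    """Estimate frequency (Hz) as half the number of sign changes per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0.0) != (b < 0.0))
    duration = (len(x) - 1) / fs
    return crossings / (2.0 * duration)


def tone(freq, fs, n, noise_sd=0.0, seed=0):
    """Sine tone with optional additive Gaussian noise (hypothetical test signal)."""
    rng = random.Random(seed)
    return [math.sin(2.0 * math.pi * freq * i / fs + 0.1) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


if __name__ == "__main__":
    fs = 16000
    print("clean:", zero_crossing_frequency(tone(500.0, fs, fs), fs))
    print("noisy:", zero_crossing_frequency(tone(500.0, fs, fs, noise_sd=0.5), fs))
```

For the clean 500-Hz tone the estimate is close to 500 Hz, whereas the noisy version yields a much higher value because every noise-induced sign flip is counted as a crossing.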
3.2 Vocoders to represent temporal envelope
Dudley (1939) showed in his famous ten-channel “vocoder” invention that, to make speech intelligible, one does not have to accurately obtain formant frequencies, which are crudely represented by the high energy of slow (∼10 Hz) amplitude modulation extracted from each channel (e.g., see high amplitude in the two highest frequency bands for fricative /s/, red circle, and F1 and F2 for vowel /e/, dashed circles, respectively, in Fig. 4). This slow amplitude modulation was termed the temporal envelope cue (Van Tasell et al., 1987; Rosen, 1992), which turned out to be the key to realizing the high-level performance (70%–80% correct sentence recognition in quiet) of modern cochlear implants. Several studies contributed to this success. First, the temporal envelope cues from as few as three channels are sufficient for speech recognition (Shannon et al., 1995). Second, the carriers, whether sinusoids or narrow-band noises, are not important (Dorman et al., 1997). The top-right panel of Fig. 4 shows the original waveform for a syllable /sa/ and its envelope (red line on the positive side of the waveform), while the lower panels show the sinusoid and noise carriers, respectively, modulated by the /sa/ envelope. Third, because of the high degree of phase locking of the auditory nerve discharge to electric stimulation (Dynes and Delgutte, 1992), cochlear implant users can take full advantage of temporal envelope cues, being able to detect even smaller fluctuations than normal acoustic listeners (Shannon, 1992). Finally, Wilson et al. (1991) put it all together in the CIS strategy, in which the temporal envelope was properly extracted and compressed to match the individual implant user's dynamic range, and then used to amplitude modulate a relatively high-rate (∼1000 Hz) biphasic pulse carrier that was interleaved among electrodes to avoid interactions associated with simultaneous stimulation. The CIS and other similar implementations around that time [e.g., n-of-m (McDermott et al., 1992)] have remained the cornerstone of all modern cochlear implants.
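The processing chain described above (band-pass filtering, envelope extraction, compression into the electric dynamic range, and interleaved pulse timing) can be sketched as follows. This is a simplified illustration, not the published CIS implementation: the second-order filters, the 200-Hz envelope cutoff, the logarithmic compression, the threshold and comfort levels (in arbitrary units), and every function name are assumptions.

```python
import math

FS = 16000  # sampling rate in Hz (assumed)


def bandpass(x, lo, hi):
    """Second-order band-pass (biquad) with unity gain at the band centre."""
    f0 = math.sqrt(lo * hi)
    w0 = 2.0 * math.pi * f0 / FS
    alpha = math.sin(w0) * (hi - lo) / (2.0 * f0)  # sin(w0) / (2Q), Q = f0 / (hi - lo)
    b0, a0, a1, a2 = alpha, 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = (b0 * s - b0 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1, y2, y1 = x1, s, y1, y
    return out


def envelope(x, cutoff=200.0):
    """Full-wave rectification followed by a one-pole low-pass filter."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / FS)
    y, out = 0.0, []
    for s in x:
        y += k * (abs(s) - y)
        out.append(y)
    return out


def compress(e, t_level, c_level, e_min=1e-3, e_max=1.0):
    """Logarithmic map from acoustic envelope to the electric range [T, C]."""
    e = min(max(e, e_min), e_max)
    return t_level + (c_level - t_level) * math.log(e / e_min) / math.log(e_max / e_min)


def cis_pulses(x, edges, rate=1000.0, t_level=100.0, c_level=200.0):
    """Return (time_s, electrode, level) tuples, one electrode per time slot."""
    n_ch = len(edges) - 1
    envs = [envelope(bandpass(x, edges[i], edges[i + 1])) for i in range(n_ch)]
    period = 1.0 / rate
    slot = period / n_ch  # stagger electrodes within each stimulation cycle
    pulses = []
    for frame in range(int(len(x) / FS * rate)):
        for ch in range(n_ch):
            t = frame * period + ch * slot
            idx = min(int(t * FS), len(x) - 1)
            pulses.append((t, ch, compress(envs[ch][idx], t_level, c_level)))
    return pulses
```

Each tuple stands for one biphasic pulse; the pulse shape itself is not modelled. Because every electrode is assigned its own time slot within the stimulation cycle, no two electrodes are ever stimulated at the same instant, which is the interleaving that the strategy relies on.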
Temporal envelopes and vocoders. Left panel: Dudley and his vocoder (photo courtesy of Nokia Corporation and AT&T Archives). Middle panel (modified from Fig. 10 in Dudley, 1940): ten-channel vocoder showing temporal envelopes from each channel and energy concentrations for /sh/ and /e/ (red circles). Right panel: top = original speech waveform and its envelope (red line on the positive side of the waveform), mid = the same envelope with a sinusoidal carrier, bottom = the same envelope with a noise carrier. Audio files are available online to demonstrate the effects of altering speech temporal fine structure without changing the speech temporal envelope.
Cochlear implant simulation with a sinusoid carrier. This is a file of type “wav” (0.14 MB).
Simulation with a noise carrier. This is a file of type “wav” (0.14 MB).
4. Cochlear implants as a research tool
Cochlear implants have also been used as a research tool to deepen our understanding of hearing and speech processes or even to discover new mechanisms. One example is loudness coding. It is well known that loudness grows as a power function of sound intensity. The classic theory is that this power function is due to nonlinearity in the cochlea (Stevens, 1961), with the brain behaving as a linear device (Krueger, 1989). Because the cochlea is bypassed in electric hearing, one would predict a linear loudness function in cochlear implant users, had the classic theory been true. Instead, an exponential loudness function was found, suggesting a two-staged, compression-to-expansion, nonlinear mechanism in loudness coding (Zeng and Shannon, 1994). This active loudness model has been incorporated to produce a unified theory of auditory intensity perception (Zeng, 2020) and explain tinnitus, which would be difficult for a passive, linear brain (Zeng, 2013).
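One way to read the two-stage account is written out below, under the assumption of a logarithmic compression followed by an exponential expansion. The symbols (I for acoustic intensity, E for electric stimulus magnitude, and the constants θ, k, and c) are introduced here for illustration and are not defined in the paper.

```latex
% Acoustic hearing: cochlear compression followed by central expansion
x = \theta \ln I, \qquad L_{\mathrm{acoustic}} = e^{x} = I^{\theta}

% Electric hearing: the cochlear compression stage is bypassed
L_{\mathrm{electric}} = k\, e^{cE}
```

With both stages in place, the result is the familiar power function of intensity; with the compression bypassed, only the expansion remains, which is consistent with the exponential loudness growth found in cochlear implant users.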
Another example is pitch coding, which has traditionally adopted a duplex theory. For high-frequency tones, the pitch relies on a place code, corresponding to the aforementioned tonotopic organization (Zwislocki, 1991). For low-frequency tones, the pitch relies on a timing code, in which the tone frequency is the reciprocal of the nerve discharge interval (Rose, 1967). These two codes were thought to be independent of each other, with their relative contributions being a long-time hot topic of debate in hearing (Burns and Viemeister, 1976; Cariani and Delgutte, 1996; Oxenham et al., 2004). In cochlear implants, the location and timing of stimulation can be independently manipulated to provide a test of the classic duplex theory. Indeed, electric pitch depends on both stimulation location and timing, suggesting that “modern pitch models of complex sounds should take absolute place information into account” (Zeng, 2002). Oxenham et al. (2004) used “transposed stimuli” in acoustic hearing to reach the same conclusion.
Finally, cochlear implants have been used to illustrate the importance of brain plasticity. The brain is a dynamic device that adapts to changes. Not only does the electric dynamic range expand (Henkin et al., 2006), but electric pitch also changes with cochlear implant experience over time (Reiss et al., 2007). Electric stimulation was used to investigate brain plasticity as a result of both hearing loss and restoration of hearing via a cochlear implant (Ryugo et al., 2005). Comparison of evoked potentials between implanted and hearing individuals revealed a critical period for brain maturation and possibly language development (Ponton and Eggermont, 2001; Sharma et al., 2002; Harrison et al., 2005). Cochlear implant research in both animals and humans discovered cortical re-organization not only in the auditory area but also in the visual and somatosensory regions of the brain (Lee et al., 2001; Land et al., 2016). While it is not clear whether these changes in the brain are good or bad, monitoring plasticity may lead to improvements in the ability to predict and explain cochlear implant outcomes (Feng et al., 2018).
5. Solutions to the bottleneck in cochlear implant performance
Despite these important discoveries as well as technological advances ranging from improved front-end processing (Boyle et al., 2009) to bilateral electric hearing and electro-acoustic stimulation, the baseline performance of an average unilateral implant user has remained unchanged in the last 30 years (Zeng, 2017). Compared to normal-hearing listeners, current cochlear implant users still experience tremendous difficulty in recognizing speech in background noise, localizing sounds, and discriminating melody and timbre, let alone appreciating the subtlety and richness of more complex music such as a symphony (McDermott, 2004; Kerber and Seeber, 2012; Limb and Roy, 2014; Fowler et al., 2021). This 30-year bottleneck is not due to a lack of effort, as many innovative ideas have been tried but have not yet produced significant improvements in performance. These ideas included, for example, increasing stimulation rate to simulate the auditory nerve spontaneous activity (Rubinstein et al., 1999), frequency modulation encoding to restore fine structure (Nie et al., 2005), asynchronous interleaved sampling to simultaneously encode both envelope and phase (Sit et al., 2007), generating normal auditory nerve discharge patterns via a computational model (Chen et al., 2005; Erfanian Saeedi et al., 2017), reducing spread of excitation via bipolar and tripolar or asymmetrical waveform stimulation (Berenstein et al., 2008; Zhu et al., 2012; Carlyon et al., 2014), and creating virtual channels via multiple electrode stimulation (Donaldson et al., 2011).
To address this bottleneck, current cochlear implants need to improve both the encoding and transmission of spectral and temporal fine structure cues (Zeng et al., 2005). So far, most ideas have focused on the encoding of these fine structure cues in the speech processor, with little or no investment being made in the critical electrode-to-neuron interface. In fact, the fundamental design of the 12–24 electrodes in the scala tympani has remained unchanged since its inception, except for improvements and cosmetic changes in the electrode shape, size, and position or the array length, thickness, and curvature. The continued use of this same electrode design is the bottleneck limiting cochlear implant performance. Until the electrode-to-neuron interface is re-designed to transmit the necessary cues, signal processing in the speech processor is unlikely to produce any further improvement in performance.
Several novel ideas have been proposed to change the electrode-to-neuron interface to improve future cochlear implant performance. One idea is to directly implant a penetrating electrode array into the auditory nerve, which has been shown in an animal model to reduce both the stimulation threshold and the spread of excitation compared to conventional cochlear implants (Middlebrooks and Snyder, 2007). There are two NIH-supported studies testing the feasibility of this penetrating nerve implant in humans (5R01-DC017182 by John Middlebrooks and Harrison Lin at University of California Irvine and 5UG3-NS107688 by Hubert Lim at University of Minnesota). A second idea takes the opposite approach, whereby instead of implanting the electrodes into the auditory nerve, neurotrophic factors are applied to attract the auditory nerve to grow into the cochlear electrodes (Pinyon et al., 2014). This combined cochlear implant and drug delivery approach serves as a means of not only improving cochlear implant performance but also delivering drugs to the inner ear. A third idea, which is more aggressive than the previous two, uses light stimulation to restore normal patterns of excitation in the cochlea. Animal research has demonstrated the feasibility and safety, and attempts are being made to reduce the power and size of the light stimulation system to a level appropriate for human use (Keppeler et al., 2020; Littlefield and Richter, 2021).
6. Concluding remarks
The cochlear implant field has been and still is a leader in neural technology, but other fields are catching up and may take over in the near future. For example, spinal cord stimulation is now used to manage pain in ∼50 000 patients annually (Sdrulla et al., 2018), approaching the annual number of cochlear implantations (∼65 000). Technological development in cochlear implants, or the recent lack thereof, has lagged behind that of brain-computer interfaces, which employ thousands of electrodes with both recording and stimulating capability, protected and seamless wireless communication, and extended recharge-free battery life of up to 20 years in the body (De Wachter et al., 2020; Luan et al., 2020). As a result, a totally implantable cochlear implant remains elusive, even though the idea has been proposed for decades and would likely have a tremendous impact on convenience, stigma, and lifestyle. While cochlear implant researchers were innovative in moving the site of stimulation from the cochlea to the auditory brainstem and midbrain (Brackmann et al., 1993; Lim et al., 2008), there has been no recent advance in this methodology. In contrast, visual researchers are actively pursuing a high-density cortical implant in the hope of restoring not only visual acuity but also color vision (Chen et al., 2020; Fernandez et al., 2021; Towle et al., 2021). Driven by the current $4.5-billion NIH Brain Initiative and similar efforts globally, neuroscientists and neuroengineers are rushing to first-in-human applications that hope to treat or prevent a wide range of neurological and psychiatric diseases. While celebrating this remarkable one millionth cochlear implant achievement, we can and should apply our experience and expertise to help the “newcomers” avoid innovation traps while borrowing new ideas from them to restore or even enhance normal hearing.
Acknowledgments
The author thanks Dr. Charles C. Church for inviting this paper, which was based on the Acoustical Society of America Webinar Series on 19 August 2021. The author thanks Dr. Michelle Kapolowicz and Dr. Terrin Tamati for moderating the Webinar and for their comments on the paper. Comments from three anonymous reviewers also improved the final version of the paper. Jason Luo helped generate Fig. 4.
Footnote 1: On April 27, 2022, Cochlear bought Oticon Medical for US$120 M.