The cascade of asymmetric resonators with fast-acting compression (CARFAC) is a cascade filterbank model that performed well in a comparative study of cochlear models, but exhibited two anomalies in its frequency response and excitation pattern. It is shown here that the underlying reason is CARFAC's inclusion of quadratic distortion, which generates DC and low-frequency components. In a real cochlea these components would be canceled by reflections at the helicotrema; because cascade filterbanks lack the reflection mechanism, they instead cause the observed anomalies. The simulations demonstrate that the anomalies disappear when the model's quadratic distortion parameter is zeroed, while other successful features of the model remain intact.
1. Introduction
The CARFAC (cascade of asymmetric resonators with fast-acting compression) is a nonlinear cascade filterbank cochlear model devised as an efficient sound processor, for mono, stereo, or multi-channel sound inputs (Lyon, 2011, 2017). It was developed to serve as a realistic biologically-motivated cochlear audio processor for machine hearing systems such as those that process audio signals to extract desired acoustic features and inform classifiers, recommenders, and copyright decisions (e.g., Walters et al., 2013). A fully-digital hardware implementation of the model has been recently realized by Xu et al. (2018) on an Altera Cyclone V Field Programmable Gate Array platform to be used as an audio pre-processing frontend for complex machine hearing applications such as sound localization, sound segregation, and automatic speech recognition.
Saremi et al. (2016) compared seven cochlear models on several tasks to assess their capabilities in reproducing key aspects of human cochlear processing on the basilar membrane (BM) level such as excitation pattern, compression, frequency selectivity, and group delay. According to the results (Tables III and IV of Saremi et al., 2016), CARFAC demonstrated an outstanding agreement with the experimental data while having a reasonable computation cost. However, further investigation showed two apparent anomalies, in the frequency response of the model to low-level clicks and in the tail of the excitation pattern. We explore the underlying mechanism that is responsible for generating these anomalies on the model's BM output, i.e., CARFAC's intentional inclusion of quadratic distortion effects, and analyse how quadratic distortion can cause the observed anomalies in a cascade filterbank model.
1.1 Generating distortion in CARFAC
Since empirical evidence for the cochlear nonlinearity emerged (Rhode, 1971), nonlinear functions have been incorporated in computational models to simulate this phenomenon and its implications. One of the pioneering efforts in this regard is the BM model of Kim et al. (1973), which includes a symmetric nonlinear component, i.e., an even-order damping term that is insensitive to the direction of the BM displacement and generates third-order distortion.
The CARFAC model combines several nonlinear effects in one digital outer hair cell (OHC) block (Lyon, 2017), which includes the OHC transduction/feedback nonlinearity with efferent modulation of its strength. The nonlinearity in the CARFAC model is closely related to the damping term in the second-order differential equation of Johannesma (1980). Equation (1) defines Johannesma's nonlinear damping function in terms of the resonance frequency, the resonator's displacement y, and its velocity (first derivative) and acceleration (second derivative):
The coefficient of the velocity term in Eq. (1) determines the damping of the harmonic resonance in a nonlinear mass-spring-damper system. The CARFAC nonlinear function is more complicated (Lyon, 2011, 2017), but Eq. (1) can be interpreted as a Taylor-series expansion of its nonlinear function for small displacements or velocities. In the CARFAC design, the nonlinear function controls the damping via the concept of “relative undamping.” The amount of this relative undamping regulates the additional gain achieved by the filters of the model, and thereby simulates the role that the OHCs play in shaping the gain of the “cochlear amplifier” [see Fettiplace and Hackney (2006) for a review].
In Eq. (1), the even-order coefficient determines the amount by which the square of the resonator's displacement (y²) decreases the damping. Since this coefficient acts on the square of the resonator's displacement and is thus insensitive to the direction of the displacement, it is reminiscent of the symmetric nonlinearity proposed by Kim et al. (1973). This symmetric nonlinearity is the source of the cubic distortion tones (CDTs) and compressive behaviour in several cochlear models (e.g., Kim et al., 1973; Johannesma, 1980; Saremi and Stenfelt, 2013).
Eq. (1) also includes an odd-order (i.e., asymmetric) coefficient that directly multiplies the linear velocity. This coefficient has been incorporated in the CARFAC model to generate even-order distortions that are linked to the observed quadratic distortion tones (QDTs). Quadratic distortion has been found to have a significant perceptual effect, for example, in improving the perception of musical pitch for tones with a missing fundamental (Pressnitzer and Patterson, 2001), which is why CARFAC includes this effect by default. According to the CARFAC design, the default value of the asymmetric coefficient, related to the odd-order coefficient in Eq. (1), is 0.04. This value was not calibrated according to specific biophysical data, but it successfully yields QDTs that are similar to those recorded by Cooper and Rhode (1997).
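The link between an asymmetric term and the QDT can be made explicit with a short worked identity. This is an illustrative simplification only: it treats the asymmetry as a generic memoryless quadratic term applied to a two-tone input, not as the CARFAC nonlinearity itself, and the amplitudes A_1, A_2 and angular frequencies ω_1, ω_2 are symbols introduced here for the example.

```latex
\begin{aligned}
\left(A_1\cos\omega_1 t + A_2\cos\omega_2 t\right)^2
  ={}& \tfrac{1}{2}\left(A_1^2 + A_2^2\right)
     + \tfrac{1}{2}A_1^2\cos 2\omega_1 t
     + \tfrac{1}{2}A_2^2\cos 2\omega_2 t \\
     &+ A_1 A_2\cos\!\big((\omega_2-\omega_1)t\big)
     + A_1 A_2\cos\!\big((\omega_1+\omega_2)t\big)
\end{aligned}
```

Under this simplification the squared signal contains a difference tone at F2 – F1, which corresponds to the QDT, together with a constant (DC) term. Setting A_2 = 0 shows that the constant term A_1²/2 remains even for a single tone.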
2. Method
The Matlab implementation of the CARFAC model was retrieved from the project's open-source website (Brandmeyer et al., 2015) and was executed for the default number of 71 channels (i.e., cochlear partitions) and at the default sample rate (Fs = 22.05 kHz). The asymmetric coefficient, which controls the amount of quadratic distortion generated in the model, was altered by changing the value of the parameter “V_offset” defined in the “CARFAC_Design.m” Matlab script; we call this asymmetry coefficient “the offset parameter” in subsequent sections.
2.1 Estimating the QDT
A complex tone at 30 dB sound pressure level (SPL), consisting of two pure sinusoids at F1 = 1600 Hz and F2 = 1800 Hz, was presented to CARFAC. The QDT was assessed by calculating the amplitude of the 200 Hz component (i.e., F2 – F1) at the output of each channel using the discrete Fourier transform (DFT).
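The DFT-based estimate can be illustrated with a minimal Python sketch. It does not run CARFAC: `toy_channel` and its `quad_gain` argument are hypothetical stand-ins (a linear path plus a memoryless quadratic term) used only to produce a 200 Hz difference tone, and unit tone amplitudes replace the calibrated 30 dB SPL stimulus. The 0.1 s analysis window is an assumption, chosen so that every component completes a whole number of cycles.

```python
import cmath
import math

FS = 22050.0             # default CARFAC sample rate quoted in the text (Hz)
F1, F2 = 1600.0, 1800.0  # primaries (Hz)
N = 2205                 # 0.1 s window: whole cycles of 200, 1600 and 1800 Hz


def dft_amplitude(x, freq_hz, fs):
    """Amplitude of the sinusoidal component of x at freq_hz (single DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


def toy_channel(x, quad_gain):
    """Hypothetical channel: linear path plus a quadratic term (not CARFAC)."""
    return [v + quad_gain * v * v for v in x]


two_tone = [math.cos(2 * math.pi * F1 * k / FS) + math.cos(2 * math.pi * F2 * k / FS)
            for k in range(N)]

# Amplitude of the F2 - F1 component with and without the quadratic term.
qdt_with = dft_amplitude(toy_channel(two_tone, 0.04), F2 - F1, FS)
qdt_without = dft_amplitude(toy_channel(two_tone, 0.0), F2 - F1, FS)
print(f"200 Hz amplitude: {qdt_with:.4f} (quadratic term on), "
      f"{qdt_without:.1e} (quadratic term off)")
```

In an actual experiment the list passed to `dft_amplitude` would be the output of one model channel, and the calculation would be repeated per channel.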
2.2 Estimating the excitation pattern
An “excitation pattern” demonstrates the overall vibration of the BM in response to an acoustic stimulus and is obtained in psychoacoustics by plotting the output level of each auditory filter as a function of the filter's characteristic frequency (CF) (Fletcher, 1940; Moore and Glasberg, 1983).
The excitation pattern can be estimated from the root-mean-square (RMS) of each channel output in response to a tone at a given frequency (Fig. 3 of Moore and Glasberg, 1983). We constructed the excitation pattern of the CARFAC by calculating the RMS output of the channels in response to a 30 dB SPL tone at 4 kHz, without restricting the measurement to energy near the stimulus frequency as a tracking amplifier would.
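Because the RMS is taken over the whole channel output, any DC shift contributes to the estimated level. The following Python sketch illustrates this with synthetic data only: `synthetic_channels`, the tone amplitudes, and the DC value are hypothetical and are not CARFAC outputs.

```python
import math

FS = 22050.0     # sample rate quoted in the text (Hz)
F_TONE = 4000.0  # stimulus frequency (Hz)
N = 2205         # 0.1 s window: exactly 400 cycles of 4 kHz


def rms(x):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def excitation_pattern_db(channel_outputs):
    """Broadband RMS level of each channel output, in dB re 1 (arbitrary units)."""
    return [20.0 * math.log10(max(rms(ch), 1e-12)) for ch in channel_outputs]


def synthetic_channels(tone_amps, dc):
    """Hypothetical outputs: a 4 kHz tone of the given amplitude plus a DC shift."""
    return [[a * math.cos(2 * math.pi * F_TONE * k / FS) + dc for k in range(N)]
            for a in tone_amps]


tone_amps = [1.0, 1e-1, 1e-3, 1e-5]  # assumed decay of the tone toward the apex
with_dc = excitation_pattern_db(synthetic_channels(tone_amps, dc=0.01))
without_dc = excitation_pattern_db(synthetic_channels(tone_amps, dc=0.0))

for i, (a, b) in enumerate(zip(with_dc, without_dc)):
    print(f"channel {i}: {a:7.2f} dB with DC, {b:7.2f} dB without DC")
```

With the DC term present, the level of the last channels settles at the DC level instead of following the decaying tone, which is the kind of flat tail a broadband RMS measure produces.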
2.3 Estimating the frequency response
The spectral characteristics of the model were assessed using impulse responses. Impulse responses provide a complete characterization of linear systems (Oppenheim et al., 2014); however, this is generally not true for nonlinear systems. The cochlea is typically approximated as a quasi-linear system in the sense that it behaves locally linearly at any given input level, and thus impulse responses can be used to explore its spectral characteristics around a given intensity. This implicit assumption is commonly made whenever tones (e.g., Cooper, 1998) and clicks (e.g., Recio et al., 1998) are used for characterizing the gain of the cochlea in the frequency domain.
The impulse response of the model to low- and high-intensity clicks was estimated using an 80-μs condensation click at 30 and 70 dB peak equivalent SPL (dB pe SPL), respectively. The frequency response was estimated by calculating the DFT of the impulse response. To assess the effect of the offset parameter on cochlear tuning, the frequency response was calculated at different intensities of the click and then the equivalent rectangular bandwidths (ERBs) were derived from the magnitude of the frequency response using the methods described in Saremi et al. (2016) to generate their Figs. 4 and 5. The cochlear tuning was then represented by the quality factor of the corresponding ERBs, called Q_ERB, as a function of the click intensity.
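As a rough illustration of the last step, the Python sketch below derives an ERB and the corresponding quality factor (written Q_ERB here) from a sampled magnitude response. It assumes one common definition of the ERB, namely the area under the power response divided by its peak value; the exact procedure of Saremi et al. (2016) may differ in detail. The Gaussian test response is synthetic and the function names are hypothetical.

```python
import math


def erb_hz(freqs_hz, magnitude):
    """ERB: area under the power response divided by its peak (trapezoidal rule)."""
    power = [m * m for m in magnitude]
    area = sum(0.5 * (power[i] + power[i + 1]) * (freqs_hz[i + 1] - freqs_hz[i])
               for i in range(len(power) - 1))
    return area / max(power)


def q_erb(freqs_hz, magnitude):
    """Quality factor: frequency of the response peak divided by the ERB."""
    peak_index = max(range(len(magnitude)), key=lambda i: magnitude[i])
    return freqs_hz[peak_index] / erb_hz(freqs_hz, magnitude)


# Synthetic check: a Gaussian power response whose ERB is 200 Hz by construction.
cf, bw = 1000.0, 200.0
freqs = [float(f) for f in range(0, 3001)]
mag = [math.exp(-0.5 * math.pi * ((f - cf) / bw) ** 2) for f in freqs]

q_value = q_erb(freqs, mag)
print(f"ERB = {erb_hz(freqs, mag):.1f} Hz, Q_ERB = {q_value:.2f}")
```

For a model channel, `mag` would be the DFT magnitude of the click response and the calculation would be repeated at each click intensity.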
3. Results
Figure 1(A) shows the maximum of the Fourier-series spectra of the outputs from all 71 channels of the CARFAC model (with default parameters) in multiples of the 200 Hz fundamental, in response to combination tones at F1 = 1.6 and F2 = 1.8 kHz. It shows the primaries at 1.8 and 1.6 kHz, the CDT at 1.4 kHz, and the QDT at 200 Hz. The solid line in Fig. 1(B) illustrates the amplitudes of the QDTs (i.e., 200 Hz components) per cochlear location and CF. The gray lines show the same for the case where the offset value has been reduced from its default value (0.04) to 0.01. In both cases, the amplitudes of the QDT have significantly decreased. Moreover, as the dashed line shows, setting the offset parameter to zero reduces the QDT levels in the middle and apical channels, where the QDT is relevant, to effectively zero.
3.1 Excitation patterns
Figure 2(A) shows the excitation patterns of the model in response to a 30 dB SPL tone at 4 kHz. All curves depict a peak at the CF site of 4 kHz (channel 16); however the tails of the excitation patterns are different for different offset values. As a result of setting the offset value to zero, and thereby removing the quadratic distortion, the tail of the excitation pattern drops sharply.
To investigate the underlying cause of this phenomenon, the dominant frequency (i.e., the frequency at which the largest amplitude in the spectrum is observed) at the output of each channel from base to apex was assessed and is shown in Fig. 2(B). The CARFAC channels on the basal side of the 4 kHz CF site (channels 1 to 16) pass 4 kHz as the dominant frequency in their output, as do several channels after the CF site (up to channel 26, which is the CF site of 2217 Hz). From this point toward the apex the dominant frequency is 0, indicating that the outputs of those channels are dominated by DC. This DC term, which is constant in amplitude from channel 27 onward to the apex (channel 71), creates the anomalous apical tail in Fig. 2(A), solid line. The DC term corresponds to the QDT that has been generated in the model in response to the single tone's steady envelope, a so-called “DC QDT.”
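The dominant-frequency measure can be sketched in Python as the frequency of the largest-amplitude DFT bin, with the DC bin included. The two test signals are synthetic (a 4 kHz tone plus a small DC shift with assumed amplitudes), not CARFAC outputs, and the short 20 ms window is an assumption made to keep the plain-Python DFT fast.

```python
import cmath
import math

FS = 22050.0  # sample rate quoted in the text (Hz)
N = 441       # 20 ms window, 50 Hz bin spacing; 4 kHz falls exactly on bin 80


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest-amplitude DFT bin, DC bin included."""
    n = len(x)
    best_bin, best_amp = 0, -1.0
    for b in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * b * k / n) for k, v in enumerate(x))
        # The DC bin is not doubled; all other bins represent a +/- frequency pair.
        amp = abs(acc) / n if b == 0 else 2.0 * abs(acc) / n
        if amp > best_amp:
            best_bin, best_amp = b, amp
    return best_bin * fs / n


def synthetic_output(tone_amp, dc):
    """Hypothetical channel output: 4 kHz tone plus a constant DC shift."""
    return [tone_amp * math.cos(2 * math.pi * 4000.0 * k / FS) + dc for k in range(N)]


near_cf = dominant_frequency(synthetic_output(1.0, 0.01), FS)   # tone dominates
apical = dominant_frequency(synthetic_output(1e-4, 0.01), FS)   # DC dominates
print(f"near CF: {near_cf:.0f} Hz, apical: {apical:.0f} Hz")
```

When the tone has decayed below the DC shift, the measure returns 0 Hz, which is how a DC-dominated channel shows up in a plot of dominant frequency against channel number.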
DC shifts in BM are not generally seen (Cooper and Rhode, 1992); this might be because in real cochlear wave propagation, any DC pressure component is essentially canceled by the helicotrema boundary reflection (i.e., 180-degree phase shift of pressure reflection) at the apical end of the cochlea where the pressure field across the two scalae needs to equalize. However, some DC effects (baseline shifts) were reported elsewhere in the tectorial membrane (Rhode and Cooper, 1996) and in Hensen's cells in the organ of Corti (Cooper and Dong, 2003), indicating the presence of DC QDTs in the overall cochlear macromechanics beyond the BM level.
CARFAC, being a cascade filterbank model, lacks the reflection (retrograde propagation) mechanism of a transmission line. Therefore, the DC component that is generated by the quadratic distortion is not canceled by reflection and instead appears as a DC displacement in the model's BM output, causing the anomaly observed in Fig. 2(A). As a result, this anomaly could be regarded as a by-product of the CARFAC fundamental modelling principle (i.e., its cascade filter structure).
As Fig. 2(B) depicts, once the quadratic distortion is turned off by setting the offset parameter to zero, the DC QDT is gone and the apical channels (31 to 71) resonate at their intrinsic CF according to the model's location-frequency map, driven only by the tone onset transient and numerical noise in the model. The amplitude of this resonance is seen in the sloping tail of the zero-offset curve in Fig. 2(A), which fades toward the apex, as expected from excitation patterns estimated by psychoacoustic experiments (e.g., Fig. 3 of Moore and Glasberg, 1983).
3.2 Frequency responses
Figure 3(A) shows the frequency response at the CF site of 1 kHz (channel 39) to a low-intensity click at 30 dB pe SPL. As the gray line shows, with the default offset parameter the response has a small low-frequency tail bump from DC to around 200 Hz, corresponding to the quadratic demodulation of the envelope. This sort of low-frequency bump is not apparent in the BM frequency responses experimentally estimated using similar click impulses from chinchilla cochleae (Recio et al., 1998, Fig. 10), although some low-frequency bumps were seen elsewhere in Hensen's cells near the apex in guinea pigs, at least by 50–60 dB SPL (Cooper and Dong, 2003). Figure 3(B) shows the frequency response to a high-intensity click at 70 dB pe SPL. It is similar to Fig. 3(A), but with a relatively larger low-frequency bump.
As the solid lines in Figs. 3(A) and 3(B) show, zeroing the offset parameter resolves the anomalous low-frequency bump, and makes the tails decay sharply. As a result, the depth (i.e., the difference between the peak and the low-frequency tail of the curve) in Fig. 3(A) becomes 59 dB, which is in good agreement with the measurements recorded in guinea pigs [Cooper, 1998, Fig. 1(A)], which showed an overall depth of approximately 55 dB in response to 30 dB tone stimuli.
4. Discussion
The nonzero default offset parameter in CARFAC's digital OHC model is part of realistically reproducing the cochlea's nonlinearity, in which both cubic and quadratic distortions are prominent. That is, the inclusion of the offset parameter provokes low-frequency components that represent QDTs in the cochlea [see the 200 Hz component in Fig. 1(A)]. These low-frequency components that are meant for QDT simulation lead to the observed anomalies that disappear once the offset parameter is set to zero.
Our simulations indicate that the origin of the anomalies in the excitation pattern [i.e., the constant tail, solid line in Fig. 2(A)] is the DC QDT which disappears when the offset parameter is set to zero, or if the excitation pattern is measured through a stimulus-frequency filter or high-pass filter.
Furthermore, we analysed the model's frequency response at the 1 kHz CF site and observed low-frequency bumps that are generated when the quadratic distortion is present. Given that CARFAC as a cascade filterbank is not expected to simulate reflections, the DC and very-low-frequency components caused by quadratic distortion that appear in the outputs of the channels could be legitimately ignored or filtered out at the BM level. These DC and very-low-frequency components will be removed or substantially attenuated by 20 Hz high-pass filters in the later inner hair cell (IHC) stage of the CARFAC model. Our findings suggest that the high-pass filter that is incorporated in the IHC model for this purpose might have been better placed before the BM output, as a crude model of the effect of the pressure reflection from the helicotrema.
From another perspective, it is important to note that the perceptual relevance of very-low-frequency components (e.g., lower than 50 Hz) is generally unknown as frequencies lower than 100 Hz are not included in psychoacoustic studies that aim to assess human auditory filter shapes and excitation patterns (Fletcher, 1940; Moore and Glasberg, 1983). Moreover, BM vibrations are often recorded using laser velocimeters (e.g., Recio et al., 1998), which are generally incapable of capturing very-low-frequency vibrations. Measurements based on heterodyne laser interferometers (e.g., Cooper, 1998; Cooper and Rhode, 1997) do extend down to near DC components and do show QDT responses close to DC near the apex in some other parts of the organ of Corti, if not in the BM displacement itself. While the micromechanical and the perceptual relevance of DC and very-low-frequency components are not well understood, a simple high-pass filter to remove them from the IHC's effective stimulus in the model may not be the most realistic way forward.
4.1 Effects of zeroing the offset parameter on cochlear compression and tuning
One needs to investigate the effect of removing the offset parameter on other features of the model. Figure 4(A) shows the compressive behaviour of the cochlear input/output (I/O) function at the CF site of 1 kHz for input tone intensities from 10 to 100 dB SPL based on the methods used to produce Saremi et al. (2016), Fig. 3(B).
Figure 4(B) shows the cochlear tuning in response to click intensities from 10 to 100 dB pe SPL, indicating a decrease in tuning as the intensity increases. The cochlear tuning in Fig. 4(B) is represented by the ERB-based quality factor (Q_ERB), which decays from approximately 6 to 3 as the click intensity increases from 10 to 100 dB pe SPL. Both Figs. 4(A) and 4(B) demonstrate that removing the offset parameter affects neither the cochlear compression nor the cochlear frequency selectivity (tuning).
5. Conclusion
The CARFAC model includes an offset parameter to generate QDTs such as those that are observed in healthy mammalian cochleae. We showed that the inclusion of this nonzero parameter is the source of some anomalies in the excitation pattern and the frequency response of the model that are due to DC and very-low-frequency components generated by the QDT mechanism. These low-frequency components are expected to be at least partially canceled by reflection in a transmission line but since CARFAC, as a cascade filterbank, lacks a retrograde propagation mechanism, these components appear at the outputs of apical channels of the model and cause the observed anomalies on the BM level.
Our results demonstrate that setting the offset parameter to zero, disabling the QDT generation in the model, removes the anomalies while having no negative effect on the successful features of the model (cochlear compression and frequency selectivity). Therefore, we conclude that the offset parameter should be zeroed for applications that do not intend to simulate QDTs, and also when the CARFAC model is compared to similar cochlear models that lack such a parameter. Alternatively, the apparent anomalies can be resolved if the BM outputs of the channels pass through an external high-pass filter to crudely simulate the reflections at the helicotrema and effectively filter out the DC and very-low-frequency (typically lower than 50 Hz) components. In this case, more realistic and perceptually-relevant QDTs that appear at higher frequencies [e.g., the 200 Hz QDT shown in Fig. 1(A) here] will be maintained and the model can still simulate them successfully.
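The external high-pass alternative can be sketched in Python as follows. The filter is a generic first-order (RC-style) high-pass, not the filter used in the CARFAC IHC stage. The 20 Hz cutoff is an assumption taken from the figure quoted for that stage, and the input is a hypothetical BM output made of a DC shift plus a 200 Hz QDT with arbitrary amplitudes.

```python
import cmath
import math

FS = 22050.0      # sample rate quoted in the text (Hz)
CUTOFF_HZ = 20.0  # assumed cutoff frequency (Hz)


def highpass(x, cutoff_hz, fs):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    a = 1.0 / (1.0 + 2.0 * math.pi * cutoff_hz / fs)
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def dft_amplitude(x, freq_hz, fs):
    """Amplitude of the sinusoidal component of x at freq_hz (single DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


# Hypothetical BM output: DC shift of 0.01 plus a 200 Hz QDT of amplitude 0.04.
n = int(FS)  # 1 s of signal
bm = [0.01 + 0.04 * math.cos(2 * math.pi * 200.0 * k / FS) for k in range(n)]

# Discard the first 0.5 s so the start-up transient has died away;
# the remaining 11025 samples hold exactly 100 cycles of 200 Hz.
steady = highpass(bm, CUTOFF_HZ, FS)[n // 2:]
dc_after = abs(sum(steady) / len(steady))
qdt_after = dft_amplitude(steady, 200.0, FS)
print(f"DC after filtering: {dc_after:.1e}, 200 Hz amplitude: {qdt_after:.4f}")
```

In this sketch the DC shift is removed while the 200 Hz component passes with only slight attenuation. A steeper filter or a different cutoff could be substituted, depending on how much of the very-low-frequency range should be retained.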