Swept-sines provide a tool for fast and high-resolution measurement of evoked otoacoustic emissions. During the measurement, the response to the swept-sine(s) is recorded by a probe placed in the ear canal. Otoacoustic emissions can then be extracted by various techniques, e.g., Fourier analysis, the heterodyne method, and the least-squares-fitting (LSF) technique. This paper employs a technique originally proposed for exponential swept-sines, which allows for direct emission extraction from the measured intermodulation impulse response. It is shown here that the technique can be used to extract distortion-product otoacoustic emissions (DPOAEs) evoked with two simultaneous swept-sines. For proper extraction of the DPOAE phase, the technique employs previously proposed adjusted formulas for exponential swept-sines, which generate so-called synchronized swept-sines (SSSs). Here, the SSS technique is verified using responses derived from a numerical solution of a cochlear model and responses measured in human subjects. Although computationally much less demanding, the technique yields results comparable to those obtained by the LSF technique, which has been shown in the literature to be the most noise-robust of the emission extraction methods.
I. INTRODUCTION
Otoacoustic emissions are acoustical signals generated within the inner ear (cochlea) (Kemp, 1978; Probst et al., 1991). If the ear is stimulated with at least two tones with nearby frequencies f1 and f2, their interaction generates distortion-product otoacoustic emissions (DPOAEs) due to the nonlinear basilar membrane (BM) response (Goldstein, 1967; Johnstone et al., 1986; Rhode, 1978; Robles and Ruggero, 2001). The DPOAE with frequency $2f_1 - f_2$ is called the cubic (low-side) DPOAE or cubic difference tone (CDT) DPOAE. DPOAEs may serve as an effective and frequency-specific tool for the diagnosis of hearing loss because their level and their estimated threshold correlate with hearing sensitivity (e.g., Boege and Janssen, 2002; Gaskill and Brown, 1990; Nelson and Kimberley, 1992). However, the accuracy of hearing threshold prediction based on the DPOAE level or an estimated threshold is affected by the interference between DPOAE components (Dalhoff et al., 2013; Mauermann and Kollmeier, 2004; Zelle et al., 2020; Zelle et al., 2017). The DPOAE is composed of two components that differ in their generation mechanism: a nonlinear-distortion component and a coherent-reflection component (Shaffer et al., 2003; Shera and Guinan, 1999). Interaction between these two components causes a fine structure in the distortion-product (DP) gram amplitude. A DP-gram shows the DPOAE amplitude and phase as a function of frequency (Brown et al., 1996; He and Schmiedt, 1997; Kemp and Brown, 1983).
The nonlinear-distortion and coherent-reflection components of DPOAEs can be separated by various techniques. One way is to present a third tone near the DP frequency, which suppresses the component generated by coherent reflection (Heitmann et al., 1998; Kemp and Brown, 1983). Another way is to measure the onset of the DPOAE signal, which is not yet affected by the long-latency (LL) reflected component (Vetešník et al., 2009; Zelle et al., 2017). If a DP-gram is measured with sufficient frequency resolution, the nonlinear-distortion and coherent-reflection components can be separated by the inverse Fourier transform of the DP-gram (Dhar et al., 2002; Stover et al., 1996) or by time-frequency filtering methods (Moleti et al., 2012).
Swept-sines, also called chirps, allow DP-grams to be measured with sufficient frequency resolution within a relatively short time (Choi et al., 2008; Long et al., 2008). The signal recorded by an otoacoustic emission (OAE) probe during the measurement must be analyzed post hoc to extract the measured emission. Kalluri and Shera (2013) compared three methods used to extract OAEs: a digital heterodyne method (Choi et al., 2008), a method employing Fourier analysis (Kalluri and Shera, 2001), and a modeling technique using least-squares fitting (LSF) (Long et al., 2008). Although more computationally demanding than, e.g., Fourier analysis, the LSF technique was shown to outperform the other techniques owing to its noise robustness (Kalluri and Shera, 2013).
In this paper, we present another method for DPOAE extraction. The method is based on synchronized swept-sines (SSSs), introduced by Novak et al. (2015) for the analysis of nonlinear systems. SSSs are a special type of exponential (sometimes called logarithmic) swept-sine signal (Novak et al., 2015). The SSS technique can be used to analyze nonlinear systems in terms of block-oriented models, e.g., generalized Hammerstein models or diagonal Volterra series (Novak et al., 2010). However, its main advantage lies in separating the frequency-dependent higher harmonics from each other. Here, the technique of Novak et al. (2015) is adapted to the estimation of intermodulation DPs, of which DPOAEs are an example. The SSS technique is easy to implement, computationally inexpensive, and does not depend on the type of nonlinearity in the system. Together with its ability to separate DPOAE components of different latencies, this makes the SSS technique a promising tool for DPOAE measurements. The paper shows that, in terms of noise robustness, the SSS technique is comparable with the LSF technique.
The paper is organized as follows. Section II provides the theoretical background for applying the SSS to DPOAE estimation. The SSS technique is then combined with a windowing method for background noise reduction and DPOAE component separation and verified on simulations (Sec. III). For the experiments (Sec. IV), a method for sound artifact rejection is presented that was designed to be used with the SSS technique and the windowing method. The advantages of the proposed method are discussed in Sec. V. All scripts, including the simulated and experimental responses to SSSs and the implementation of the cochlear model used, can be downloaded from https://gitlab.fel.cvut.cz/vencovac/Prj04_OAEsweptsine_measurement_public.
II. SSS FOR DPOAE
In recent decades, swept-sine signals have proven highly effective for the analysis and identification of nonlinear systems (Farina, 2000). The recently developed SSS signal (Novak et al., 2015) has unique properties that enable quick analysis of the amplitude and phase of frequency-dependent DPs. Although the method is widely used to analyze higher harmonics, it can be easily adapted to intermodulation products. The remainder of this section introduces the theoretical background of adapting the SSS to DPOAE measurement.
A. SSS
B. Frequency-dependent harmonic distortion
When a pure sine wave of frequency f0 is used as the input to a nonlinear system, higher harmonics may appear at the output of the system at multiples of the input frequency f0, as shown in Fig. 1(A). Each of these harmonics has an amplitude $A_m$ and a phase $\varphi_m$, where m is the index of the harmonic. In addition, these harmonics can be frequency-dependent, resulting in Higher Harmonic Frequency Responses (HHFRs) $H_m(f)$. The corresponding Higher Harmonic Impulse Responses (HHIRs) are $h_m(t) = \mathcal{F}^{-1}\{H_m(f)\}$, with $\mathcal{F}^{-1}$ denoting the inverse Fourier transform.1
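The harmonic-separation property can be illustrated numerically. The Python sketch below assumes the SSS definition of Novak et al. (2015), x(t) = sin(2π f_start L e^{t/L}) with L = round(f_start T / ln(f_end/f_start))/f_start, where f_start, f_end, and the requested duration T are our names. A hypothetical memoryless system y = x + 0.3x³ is swept, and the response is deconvolved by spectral division restricted to the swept band, a simple substitute for the analytic inverse filter. With this sweep, the m-th harmonic is the sweep itself advanced by L ln m, so the third HHIR is expected at t = −L ln 3, i.e., near the end of the circular buffer.

```python
import numpy as np


def synchronized_sweep(f_start, f_end, duration, fs):
    """Synchronized exponential swept sine (form of Novak et al., 2015).

    L is rounded so that f_start * L is an integer, which makes the sweep
    start at zero phase. With phi(t) = 2*pi*f_start*L*exp(t/L), the m-th
    harmonic m*phi(t) equals phi(t + L*ln(m)): the sweep advanced by L*ln(m).
    """
    L = round(f_start * duration / np.log(f_end / f_start)) / f_start
    t = np.arange(int(round(L * np.log(f_end / f_start) * fs))) / fs
    return np.sin(2 * np.pi * f_start * L * np.exp(t / L)), L


def deconvolve(y, x, fs, f_lo, f_hi):
    """Spectral division Y/X restricted to the swept band [f_lo, f_hi].

    Zero padding to twice the length keeps the negative-time (harmonic)
    part of the result from overlapping the causal part.
    """
    n = 2 * len(x)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    f = np.fft.rfftfreq(n, 1 / fs)
    band = (f >= f_lo) & (f <= f_hi)
    H = np.divide(Y, X, out=np.zeros_like(Y), where=band)
    return np.fft.irfft(H, n)


fs = 8000
x, L = synchronized_sweep(50.0, 1000.0, 2.0, fs)
y = x + 0.3 * x ** 3      # hypothetical system; 0.3*x**3 = 0.225 sin(phi) - 0.075 sin(3 phi)
h = deconvolve(y, x, fs, 52.5, 950.0)

n = len(h)
expected = n - int(round(L * np.log(3) * fs))   # t = -L ln 3, wrapped to the buffer end
search = np.abs(h[n // 2 : n - 500])            # skip the linear IR near both ends
found = n // 2 + int(np.argmax(search))
print(expected, found)
```

Restricting the division to the swept band avoids dividing by the near-zero sweep spectrum outside [f_start, f_end]; a regularized division would serve the same purpose.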
C. Frequency-dependent intermodulation distortion
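Each of these frequency components (harmonic or intermodulation) has its own amplitude and phase and can be frequency dependent. As in the previous case, we use the analogous notation for the intermodulation frequency responses and the corresponding intermodulation impulse responses (ImIRs).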
Note that the derivation above uses the frequency f1, and therefore the first component of the SSS, as the reference. The derivation could equally be done for the second component f2 or for any other frequency component.
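The time-shift argument used for harmonics carries over to two tones. The derivation below is a sketch under our own assumptions and notation, not a reproduction of the paper's Eqs. (12)–(15): both tones sweep exponentially with the same L, the f2 tone has the phase of the f1 tone scaled by a fixed ratio ρ = f2/f1 (so the ratio holds at every instant), and $f_{\mathrm{start}}$ denotes the start frequency of the f1 sweep.

$$
\varphi_1(t) = 2\pi f_{\mathrm{start}} L\, e^{t/L}, \qquad \varphi_2(t) = \rho\, \varphi_1(t),
$$
$$
k\,\varphi_1(t) = 2\pi f_{\mathrm{start}} L\, e^{(t + L\ln k)/L} = \varphi_1(t + L\ln k), \qquad k > 0 .
$$

An intermodulation product $p\,\varphi_1 + q\,\varphi_2$ corresponds to $k = p + q\rho$. For the CDT, $p = 2$ and $q = -1$, so $k = 2 - \rho$, and the corresponding ImIR appears at

$$
t_{\mathrm{CDT}} = -L \ln(2 - \rho) > 0 \qquad \text{for } 1 < \rho < 2,
$$

i.e., after the impulse response of the f1 tone at $t = 0$, whereas the f2 component ($k = \rho$) appears before it, at $t = -L \ln \rho$.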
D. DPOAE extraction from the swept-sine measurement
The SSS technique described previously and applied to DPOAE measurement can be summarized in the following steps. First, the parameters of the two-component SSS are chosen: the start and stop frequencies and the time duration T of the first component, and the coefficient α for the second component. The input signal, consisting of the sum of both swept-sines, is then generated using Eqs. (12)–(14) and (15).
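The chain from stimulus to DP-gram can be sketched in Python. This is a simplified illustration under our own assumptions, not the authors' implementation: the f2 sweep is generated by scaling the phase of the f1 sweep by the ratio f2/f1 (instead of using Eqs. (12)–(15)), the "ear" is a hypothetical memoryless cubic system, deconvolution is a spectral division by the f1 sweep restricted to its band, and the ImIR is cut out with a plain rectangular window. The CDT term of 0.1(x1 + x2)³ is 0.075 sin(2φ1 − φ2), so the recovered DP-gram magnitude should be close to 0.075.

```python
import numpy as np

fs = 16000
ratio = 1.2                                      # f2/f1 (assumed)
f_start, f_end, duration = 200.0, 2000.0, 2.0    # f1 sweep (assumed values)

# Two sweeps sharing L: f2(t) = ratio * f1(t) at every instant.
L = round(f_start * duration / np.log(f_end / f_start)) / f_start
t = np.arange(int(round(L * np.log(f_end / f_start) * fs))) / fs
phi1 = 2 * np.pi * f_start * L * np.exp(t / L)
x1, x2 = np.sin(phi1), np.sin(ratio * phi1)

x = x1 + x2                  # two-component stimulus
y = x + 0.1 * x ** 3         # hypothetical nonlinear "ear"

# Deconvolve with the f1 sweep (the reference component), in-band only.
n = 2 * len(x1)
f = np.fft.rfftfreq(n, 1 / fs)
X1, Y = np.fft.rfft(x1, n), np.fft.rfft(y, n)
band = (f >= 1.05 * f_start) & (f <= 0.95 * f_end)
h = np.fft.irfft(np.divide(Y, X1, out=np.zeros_like(Y), where=band), n)

# The 2f1 - f2 product is the f1 sweep delayed by t_cdt = -L ln(2 - ratio).
t_cdt = -L * np.log(2 - ratio)
i_cdt = int(round(t_cdt * fs))
peak = 1000 + int(np.argmax(np.abs(h[1000 : n // 2])))

# Cut out the ImIR (+/- 20 ms, rectangular) and return to the frequency domain.
half = int(0.02 * fs)
imir = np.zeros(n)
imir[i_cdt - half : i_cdt + half] = h[i_cdt - half : i_cdt + half]
dpgram = np.fft.rfft(imir) * np.exp(2j * np.pi * f * i_cdt / fs)  # remove the (rounded) delay
f2_axis = ratio * f / (2 - ratio)   # f2 at which each DP frequency f was evoked

mid = (f > 500) & (f < 1000)
print(i_cdt, peak, np.median(np.abs(dpgram[mid])))
```

The frequency axis of the deconvolved spectrum is the DP frequency itself; `f2_axis` converts it to the f2 axis commonly used for DP-grams.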
E. Effect of the sweep rate
III. SIMULATION VERIFICATION
This section extends the SSS technique for DPOAEs with a windowing method that allows for the extraction of DP-gram components and the suppression of background noise. The method is demonstrated on DPOAEs derived from a cochlear model. The cochlear model is used because it allows for an accurate separation of the nonlinear-distortion and coherent-reflection DPOAE components. In addition, the model generates the coherent-reflection component of DPOAEs with latencies roughly similar to those observed experimentally, e.g., in Moleti et al. (2012) (compare their Fig. 7 with our Fig. 4). Section IV applies the same windowing method to DPOAEs measured in normally hearing human subjects.
A. Separation of DPOAE components
Figure 3(A) depicts the DP-gram shown in Fig. 4 (CDT at frequency $2f_1 - f_2$) in the time domain (ImIR). Most of the signal energy is located near zero time delay, but an LL component can also be identified, with most of its energy between 2 and 10 ms. The mechanism of nonlinear distortion generates a DPOAE component whose phase changes slowly with frequency (a short-latency, SL, component), whereas the mechanism of coherent reflection (due to mechanical irregularities) generates a DPOAE component whose phase changes rapidly with frequency (an LL component) (Shera and Guinan, 1999). Because the two components differ in latency, they can be separated from the DP-gram (Dhar et al., 2002; Kalluri and Shera, 2001; Knight and Kemp, 2001; Konrad-Martin et al., 2001; Stover et al., 1996). The SSS technique first calculates the DP-gram in the time domain, i.e., the ImIR calculated with Eq. (20); suitable windows can therefore be applied to separate the SL and LL components of the DP-gram before it is transformed into the frequency domain.
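Why a temporal window separates the two components can be shown with an idealized, hypothetical DP-gram made of two components with fixed latencies, 1 ms for the SL and 8 ms for the LL component. Real LL latencies decrease with frequency, which is why frequency-dependent windows are used in practice. The sum of the two components shows a fine structure; in the time domain they are two separate impulses, and a window passing only −1 to 4 ms recovers the SL component, after which the LL component follows by vector subtraction.

```python
import numpy as np

fs, n = 16000, 4096
f = np.fft.rfftfreq(n, 1 / fs)

# Hypothetical DP-gram: two components with fixed latencies.
a_sl, tau_sl = 1.0, 1e-3     # nonlinear distortion (short latency)
a_ll, tau_ll = 0.5, 8e-3     # coherent reflection (long latency)
dpgram = (a_sl * np.exp(-2j * np.pi * f * tau_sl)
          + a_ll * np.exp(-2j * np.pi * f * tau_ll))

# Fine structure: |dpgram| oscillates between a_sl - a_ll and a_sl + a_ll.
print(np.abs(dpgram).min(), np.abs(dpgram).max())

# Time domain ("ImIR"): two impulses, at 1 ms and 8 ms.
h = np.fft.irfft(dpgram, n)
t = np.arange(n) / fs
t[t >= n / (2 * fs)] -= n / fs        # upper half of the buffer = negative times

# Keep -1 ms ... 4 ms: only the SL impulse survives.
sl = np.fft.rfft(h * ((t >= -1e-3) & (t <= 4e-3)))
ll = dpgram - sl                       # LL component by vector subtraction
print(np.abs(sl).std(), np.abs(ll).mean())
```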
The latency of OAEs evoked by reflection of forward-traveling waves from localized irregularities in the micromechanics of the organ of Corti is frequency dependent: it shortens as frequency increases (Bergevin et al., 2012; Kemp, 1978; Moleti et al., 2012; Shera and Bergevin, 2012; Shera and Guinan, 2003). For separating DP-gram components, it is therefore useful to employ a frequency-dependent window duration, i.e., to shorten the window as frequency increases. An advantage of shorter windows is greater suppression of the background noise that contaminates experimental data. Another advantage of the frequency-dependent window is that it can remove multiple internal reflections (Dhar et al., 2002; Shera and Zweig, 1991).
- For each row in the matrix, set the samples that lie outside the specific interval to ∅, the empty-set symbol denoting outliers or, as Fang and Liu (2022) put it, “a don't care condition”; i.e., do not take these samples into account in the final averaging.
- If then for which
- Else if then for which
- Else for which and
- Calculate the mean across the rows (column-wise mean) of the matrix.
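The column-wise averaging with "don't care" samples can be sketched in Python. The interval rule below (window k is trusted for band_edges[k] ≤ f < band_edges[k+1]) is a hypothetical stand-in for the conditions of the original algorithm, and excluded samples are tracked with a Boolean mask rather than an explicit ∅ symbol. Overlapping intervals would simply be averaged.

```python
import numpy as np


def assemble_dpgram(rows, freqs, band_edges):
    """Column-wise mean of DP-grams over the samples each window is trusted for.

    rows[k] is the complex DP-gram obtained with window k. Hypothetical rule:
    window k is used only for band_edges[k] <= f < band_edges[k + 1]; all
    other samples of that row are "don't care" and excluded from the mean.
    """
    M = np.asarray(rows, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    valid = np.array([(freqs >= band_edges[k]) & (freqs < band_edges[k + 1])
                      for k in range(M.shape[0])])
    counts = valid.sum(axis=0)
    summed = np.where(valid, M, 0).sum(axis=0)
    out = np.full(M.shape[1], np.nan, dtype=complex)
    np.divide(summed, counts, out=out, where=counts > 0)  # NaN where no window applies
    return out


freqs = np.array([500.0, 1500.0, 3000.0, 6000.0, 13000.0])
rows = [np.full(5, 1 + 0j), np.full(5, 2 + 0j), np.full(5, 3 + 0j)]
edges = [0.0, 1000.0, 4000.0, 12000.0]   # long window at low f, shorter ones above
print(assemble_dpgram(rows, freqs, edges))
```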
The entire DP-gram, its nonlinear-distortion (SL) component, and its coherent-reflection (LL) component are depicted in Fig. 4. Figure 4 shows very good agreement between the DP-gram components derived by the presented SSS technique and those derived from steady-state responses of the cochlear model with roughness (impedance irregularities) and of a smooth cochlear model, which allows for a perfect decomposition into the nonlinear-distortion and coherent-reflection DPOAE components.
IV. EXPERIMENTAL VERIFICATION
Section III expanded the SSS technique with a windowing method that allows for extraction of DP-gram components and stronger suppression of background noise. This adapted SSS technique is used in this section to extract DP-grams from swept-sine responses recorded in normally hearing human subjects. Because OAE recordings may be affected by various forms of excessive noise, e.g., due to the subject swallowing, this section also extends the SSS technique with an artifact rejection method adapted from Fang and Liu (2022).
A. Methods
1. Subjects
DPOAEs were measured in four normally hearing subjects. Their pure-tone hearing thresholds did not exceed 20 dB hearing level (HL) for frequencies between 0.125 and 8 kHz. The subjects were between 22 and 24 years old (median 23 years).
2. Stimuli and data acquisition
DPOAEs were measured with SSSs at various stimulus levels (indicated in the figure captions), a fixed frequency ratio between the stimuli, and the f2 tone swept at a rate of 0.5 oct/s between 1 and 12 kHz (the same frequency range as in the simulations in Sec. III). The onset and offset of the swept-sines were shaped with 20-ms raised-cosine ramps.
All measurements were made in an audiological booth using custom software written in MATLAB. Sound signals were generated by a computer and presented via an RME Fireface UCX sound card (RME Audio, Haimhausen, Germany) connected to an Etymotic ER10C probe (Lucid Hearing Holding Company, LLC, Fort Worth, TX). The probe was calibrated inside the ear canal of each subject before each measurement. To reduce the noise floor (NF), the swept-sines were presented repeatedly. The measurement was stopped when the DP-gram and the estimated background noise no longer changed appreciably. This criterion required 23 repetitions for subject s013, 23 for subject s014, 20 for subject s015, and 20 for subject s17. On average, the measurement therefore took about 2 min and 40 s, including pauses of a few hundred milliseconds between signal presentations. The experiment was conducted with the approval of the Ethics Committee of the Czech Technical University in Prague.
3. DPOAE extraction
DPOAEs were extracted by the SSS technique and, as a reference, by the LSF technique (Long et al., 2008).5 The LSF technique was used with the “optimal” parameters suggested by Abdala et al. (2015) for the given frequency range: a 125-ms time window (5512 samples at a 44.1-kHz sampling frequency) for a 0.5-oct/s sweep over the 1–4 kHz range. In addition, because the LSF technique smooths the DPOAE fine structure, i.e., separates the nonlinear-distortion component, when a longer analysis window is used, we also extracted DP-grams with a 500-ms time window (22 050 samples at 44.1 kHz), as recommended by Abdala et al. (2015). For both window lengths (125 and 500 ms), the windows were shifted in steps of 200 samples (4.5 ms at 44.1 kHz).
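For orientation, the core of a frame-wise least-squares fit can be sketched as follows. This is a simplified illustration of the LSF idea (Long et al., 2008), not the OAETOOLBOX implementation used in the paper: within each frame, sine and cosine regressors built from the known instantaneous phases of the primaries and of the DP are fitted by ordinary least squares, and the DP amplitude and phase follow from the two DP coefficients. The 125-ms frame follows the setting above; the synthetic signal, the 1-kHz start frequency, and the larger hop of 2000 samples are assumptions chosen to keep the example small and fast.

```python
import numpy as np


def lsf_fit(y, phases, fs, frame_s=0.125, hop=200):
    """Frame-wise least-squares fit of sinusoids with known instantaneous phases.

    phases: list of arrays (radians, one value per sample), one per modeled
    component; the last one is the DP. Returns frame centers (s) and the
    DP amplitude and phase in each frame.
    """
    win = int(frame_s * fs)                  # 5512 samples at 44.1 kHz
    centers, amps, phis = [], [], []
    for start in range(0, len(y) - win + 1, hop):
        seg = slice(start, start + win)
        cols = []
        for ph in phases:
            cols += [np.sin(ph[seg]), np.cos(ph[seg])]
        coef, *_ = np.linalg.lstsq(np.column_stack(cols), y[seg], rcond=None)
        s, c = coef[-2], coef[-1]            # DP part: s*sin(ph) + c*cos(ph)
        centers.append((start + win / 2) / fs)
        amps.append(np.hypot(s, c))
        phis.append(np.arctan2(c, s))
    return np.array(centers), np.array(amps), np.array(phis)


# Synthetic 1-s segment of a 0.5-oct/s sweep with f2/f1 = 1.2 (assumed values).
fs = 44100
L = 1 / (0.5 * np.log(2))
t = np.arange(fs) / fs
phi1 = 2 * np.pi * 1000.0 * L * np.exp(t / L)
phi2 = 1.2 * phi1
phi_dp = 2 * phi1 - phi2
y = np.sin(phi1) + np.sin(phi2) + 0.01 * np.sin(phi_dp + 0.7)

centers, amp, phase = lsf_fit(y, [phi1, phi2, phi_dp], fs, hop=2000)
print(amp.min(), amp.max(), phase.mean())
```

Each frame requires its own least-squares solve, which is the main source of the computational cost of this approach.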
The SSS technique was used together with the windowing method presented in Sec. III. The SSS technique extracted the entire DP-gram and the nonlinear-distortion component of the DP-gram.
4. Artifact rejection
To reduce the background noise in the experimental data, several (approximately 20) presentations of the swept-sine stimuli were averaged. Some sound artifacts, e.g., those due to swallowing, can be relatively pronounced but affect only a small number of samples. Because the SSS technique processes the entire response, we decided not to reject entire responses contaminated with artifacts. Instead, we adapted the “point-wise artifact rejection method” presented by Fang and Liu (2022) for transient-evoked OAEs, which rejects only those samples of the response that are affected by a pronounced sound artifact. The method had to be adapted slightly because, in addition to the OAE signal and the noise, our recordings also contain the evoking stimulus. The adapted technique is described below. The same artifact rejection method is also used for the DP-grams extracted by the LSF technique and presented in this paper.
The adapted point-wise rejection method can then be described in several steps:
- Initialize , and .
- For all and if then and , where the empty-set symbol ∅ indicates that this sample is not taken into account in the final averaging of the signal and noise matrices; Fang and Liu (2022) state that the symbol “denotes a don't-care condition.”
- The final one-dimensional sets of samples, i.e., the averaged signal response and the averaged noise response (estimated background noise), are calculated by averaging the signal and noise matrices across the rows (column-wise mean).
The averaged signal response is then used to calculate the DP-gram of the experimental data presented in this paper, and the averaged noise response is used to estimate the background noise by applying the SSS or LSF technique to it.
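The following Python sketch shows one possible point-wise rejection rule. The threshold criterion of the adapted method is not reproduced here; the rule below, which flags a sample when it deviates from the across-presentation median by more than k robust standard deviations (estimated from the median absolute deviation, MAD), is our hypothetical stand-in. Subtracting the median first removes the stimulus and the OAE, which are common to all presentations; the same mask could be applied to a noise matrix.

```python
import numpy as np


def pointwise_reject_average(records, k=6.0):
    """Average presentations while ignoring individual outlying samples.

    records: 2-D array, one row per presentation. Returns the column-wise
    mean over kept samples and the Boolean mask of kept samples.
    Hypothetical rule: reject |x - median| > k * 1.4826 * MAD per sample.
    """
    records = np.asarray(records, dtype=float)
    resid = records - np.median(records, axis=0)   # removes stimulus + OAE
    sigma = 1.4826 * np.median(np.abs(resid), axis=0)
    keep = np.abs(resid) <= k * sigma
    counts = keep.sum(axis=0)                      # at least half of the rows
    avg = np.where(keep, records, 0.0).sum(axis=0) / counts
    return avg, keep


rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)
records = clean + 0.01 * rng.standard_normal((20, 1000))
records[3, 100:110] += 5.0       # e.g., a swallowing artifact in one presentation
avg, keep = pointwise_reject_average(records)
print(np.abs(avg - clean).max(), keep[3, 100:110].any())
```

Because at least half of the residuals in each column are no larger than the MAD, every column keeps at least half of the presentations, so the division is always defined.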
B. Results
Figures 5–8 depict DP-grams measured in four subjects using swept-sines (0.5 oct/s) at the levels indicated in the figure captions, with f2 swept between 1 and 12 kHz. The figures compare DPOAEs extracted by the SSS technique with those extracted by the LSF technique. The data were chosen to cover various conditions: DP-grams with a less pronounced fine structure due to interaction between the nonlinear-distortion and coherent-reflection components (Figs. 5 and 6), a DP-gram with a pronounced fine structure (Fig. 7), and a DP-gram with a high background noise level, resulting in a small DPOAE-to-background-noise ratio (Fig. 8). Figures 5(A)–8(A) depict entire DP-grams, including the nonlinear-distortion (short latency, SL) component and the coherent-reflection (LL) component. These DP-grams were extracted by the SSS technique with frequency-dependent windows (see Sec. III) and by the LSF technique with a 125-ms time window (recommended by Abdala et al., 2015, for 0.5-oct/s swept-sines). To demonstrate the ability of the SSS technique to extract the nonlinear-distortion (SL) component of a DP-gram, Figs. 5(B)–8(B) compare DP-grams derived by the SSS technique with short, frequency-dependent windows and by the LSF technique with 500-ms frames. In addition, Figs. 5(B)–8(B) depict the LL component of the DP-grams, which is presumably generated by coherent reflection. This component is obtained by vector subtraction of the SL component depicted in Figs. 5(B)–8(B) from the entire estimated DP-gram depicted in Figs. 5(A)–8(A). The agreement between the LL components estimated by the two techniques is very good. The largest discrepancies are visible in Fig. 8, where the estimated LL component is very close to the estimated NF.
Under the conditions presented here, the agreement between the DP-grams derived by the SSS technique and by the LSF technique is very good. The agreement appears slightly better for the DP-gram phases than for the amplitudes, except in Fig. 7(A)−2, where the pronounced notches in the DP-gram amplitude obtained with the SSS technique cause the unwrapped phase to depart by one cycle from the DPOAE phase derived by the LSF technique. Nevertheless, the general trend of the DP-gram phase is the same for both extraction techniques.
Figures 5–8 show that the LSF technique yields a shallower fine structure of the DP-gram amplitude than the SSS technique. Figure 7 in Abdala et al. (2015) shows that the chosen analysis window (frame) duration affects the fine structure of the DP-gram: frames shorter than the 125 ms suggested by Abdala et al. (2015) would yield a deeper fine structure in the DP-grams derived by the LSF technique. We therefore attribute the discrepancy between the LSF and SSS techniques to the chosen analysis window duration for the LSF technique and to the parameter a = 0.05 s for the SSS technique.
V. DISCUSSION AND CONCLUSION
This paper has presented an approach for extracting DPOAEs from ear canal responses evoked with SSSs (Novak et al., 2015). The paper is composed of three parts. The first part (Sec. II) presents the theory describing how the SSS technique extracts intermodulation DPs from the responses to two simultaneous swept-sines. The technique is not limited to the CDT DPOAE at $2f_1 - f_2$ shown in this paper; it can be easily adapted to any intermodulation DP generated by a system and could therefore be useful in other applications focused on intermodulation DPs. The remaining two parts of the paper (Secs. III and IV) extend the SSS technique with methods that we designed to separate the DP-gram components and suppress the background noise (Sec. III) and to reject sound artifacts during a measurement (Sec. IV). Readers may devise other methods and adapt the SSS technique to their needs.
As designed in this paper, the SSS technique estimates DP-grams with accuracy similar to that of the LSF technique, which has been suggested to be the most noise-robust (Kalluri and Shera, 2013). The SSS technique is computationally inexpensive, so the measured DP-gram can be calculated during the measurement and presented to the experimenter almost instantaneously. In addition, because the DP-gram is also available in the time domain during the calculation, the nonlinear-distortion and coherent-reflection components of the DPOAE can be extracted by temporal windows and provided to the experimenter during the measurement. The experimenter can thus decide how many stimulus repetitions are adequate for a DP-gram measurement. We do not claim that the SSS technique is superior to other DPOAE extraction techniques (e.g., the LSF technique, the heterodyne technique, or a technique based on the Fourier transform), which can still be used post hoc for verification and data analysis. Rather, its time efficiency makes it suitable for presenting data during a measurement and, for example, for clinical equipment used for hearing loss assessment based on a DP-gram.
A. Application of SSS for DPOAE measurement
A DPOAE signal recorded in the ear canal is a weak acoustical signal with a level close to the background NF in quiet. To increase the noise robustness of the SSS technique, it is useful to multiply the calculated ImIR of the DP component by a suitable window that suppresses the noise in time samples outside the required time interval. In addition to suppressing background noise, multiplying the ImIR by a temporal window allows the DPOAE components to be decomposed while they are being measured (see Fig. 4). The DPOAE at $2f_1 - f_2$ (and possibly also other low-side DPs) is generated by at least two sources. The first is intermodulation distortion, which generates DP wavelets that travel backward toward the cochlea-middle-ear boundary and DP wavelets that travel forward toward the DP tonotopic place. At the DP tonotopic place, the forward-traveling wavelets are partly reflected by impedance irregularities, which constitute the second source of DPOAEs (Shera and Guinan, 1999).6 The DPOAE component evoked by the first source (also called the primary or nonlinear-distortion component) has a short latency, whereas the component due to coherent reflection (the secondary source of DPOAEs) has a long latency, which decreases as a function of frequency (Kalluri and Shera, 2001; Moleti et al., 2012; Shera and Guinan, 1999).
We have addressed background noise suppression and component separation by using temporal windows with frequency-dependent parameters (see Sec. III). To apply a frequency-dependent window, one can either split the response into shorter parts or process the entire response with a set of windows and select, from each result, only the frequency samples in the region for which the specific window was constructed. As described in Sec. III, we chose the latter approach because splitting the swept-sine response into shorter time frames caused pronounced ringing in the DP-gram amplitude when the short temporal windows needed to extract the SL DP-gram component were used (data not shown).
The accuracy of the extraction method was verified using a nonlinear cochlear model (Sec. III) and, for experimental data, by comparison with DP-grams extracted by the LSF technique (Long et al., 2008). The LSF technique also allows for DP-gram component extraction and background noise suppression (Abdala et al., 2015). The technique fits an assumed DPOAE signal to the swept-sine response in the time domain within a temporal frame of fixed duration. For sines swept exponentially at a rate of 0.5 oct/s, a 125-ms window was suggested for the entire DP-gram, including the short- and long-latency components, and a 500-ms window for the SL component only (Abdala et al., 2015). Using a fixed-duration window for responses obtained with exponential swept-sines is equivalent to using frequency-dependent windows with the SSS technique: as the temporal window shifts across the swept-sine response, a wider frequency range falls into the windowed response because the frequency of the swept-sine increases exponentially with time. We note, however, that the fixed window duration of the LSF technique is more elegant than the method suggested in Sec. III, which requires post hoc construction of the final DP-gram.
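A small calculation illustrates this equivalence. During a fixed-duration frame of an exponential sweep, the swept range is constant in octaves, so its width in hertz grows in proportion to frequency. Because a fit over that range smooths the DP-gram over roughly this bandwidth, it acts approximately like a latency window of about one over the bandwidth, which shortens as frequency increases. The rate and frame duration below are the 0.5 oct/s and 125 ms used in the paper; the 1/bandwidth latency scale is only a rule of thumb.

```python
def frame_span(f_center, rate_oct_per_s=0.5, frame_s=0.125):
    """Octaves and bandwidth (Hz) swept during one frame centered at f_center."""
    octaves = rate_oct_per_s * frame_s
    bandwidth = f_center * (2 ** (octaves / 2) - 2 ** (-octaves / 2))
    return octaves, bandwidth


for fc in (1000.0, 4000.0, 8000.0):
    octaves, bw = frame_span(fc)
    # ~1/bw: rough latency scale below which structure is smoothed out
    print(f"{fc:6.0f} Hz: {octaves:.4f} oct, {bw:6.1f} Hz, ~{1e3 / bw:5.1f} ms")
```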
The coherent-reflection (LL) component of DP-grams estimated by the SSS or LSF technique can be extracted by vector subtraction of the nonlinear-distortion component from the entire DP-gram, as we did in Figs. 4–8. For the coherent-reflection component, Figs. 5–8 show larger discrepancies between the LSF and SSS techniques than for the nonlinear-distortion component; especially in Fig. 8, this may be due to the high NF relative to the level of the coherent-reflection component. For practical use, we consider the SSS technique suitable for extracting the nonlinear-distortion (SL) component of a DP-gram, for which the short temporal window decreases the NF. For an analysis of LL components, however, we would advise the use of time-frequency filtering techniques based on wavelets (Bergevin et al., 2012; Moleti et al., 2012).
B. Artifact rejection
As designed in Secs. II and III, the SSS technique processes the entire response to a swept-sine, which may be several seconds long depending on the sweep rate (0.5 oct/s in the present paper) and the frequency range (f2 swept from 1 to 12 kHz in the present paper). Relatively common measurement artifacts, caused, e.g., by subject movement or swallowing, usually contaminate only a small number of adjacent samples of the response. An advantage of the LSF technique is that it processes the response in temporal frames, which allows measurement artifacts to be detected within these frames. The affected frames can then be discarded while the rest of the response is kept for processing. The method suggested in Sec. III for the SSS technique, in contrast, processes the entire response to the swept-sine stimuli. To address this, Sec. IV adapted the method of Fang and Liu (2022), designed for transient-evoked OAEs, so that it detects sound artifacts in the swept-sine response and discards the affected samples. This method can be used with any DP-gram extraction technique.
C. Sweep rate limit
The SSS technique calculates impulse responses at the frequencies of the input tones and their intermodulation products [see Fig. 2(C)]. As the sweep rate increases, the time difference between adjacent impulse responses decreases, and for fast sweep rates the impulse responses can overlap. Equations (24) and (23) show that the time difference depends only on the sweep rate and the frequency ratio between the tones (f2/f1). For upward swept-sines, the impulse response for the f1 tone approaches that of the CDT. If we neglect the effect of the probe transducer on the duration of the f1 impulse response and focus only on the delay of OAEs evoked with a single tone, we can suggest, for f2/f1 = 1.2, an upper limit of about 10 oct/s for the sweep rate, which gives a time difference of about 32 ms between the impulse responses. We assume that most of the LL components in OAEs lie within 32 ms (Moleti et al., 2012). However, this value is frequency dependent and can increase if there are significant higher-order reflections. Moreover, preserving the measurement NF at higher sweep rates “requires a compensating increase in the number of sweeps presented and averaged” (Abdala et al., 2015). It is therefore questionable whether rates near 10 oct/s are useful for DPOAE measurement. Lower sweep rates, up to about 5 oct/s, are certainly applicable with the presented SSS technique.
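The timing figures above can be reproduced under two assumptions: an exponential sweep at r oct/s, for which L = 1/(r ln 2), and both tones sweeping at the same rate with a fixed ratio f2/f1. A component at k times the instantaneous f1 frequency then appears at −L ln k relative to the f1 impulse response, so the CDT (k = 2 − f2/f1) lies |log2(2 − f2/f1)|/r seconds after it. The sketch also computes the duration of one sweep from 1 to 12 kHz at 0.5 oct/s; the function names are ours.

```python
import math


def cdt_offset(rate_oct_per_s, ratio):
    """Delay (s) of the 2f1 - f2 impulse response after the f1 response."""
    L = 1.0 / (rate_oct_per_s * math.log(2))
    return -L * math.log(2 - ratio)


def sweep_duration(f_lo, f_hi, rate_oct_per_s):
    """Duration (s) of an exponential sweep from f_lo to f_hi."""
    return math.log2(f_hi / f_lo) / rate_oct_per_s


for rate in (0.5, 5.0, 10.0):
    print(rate, "oct/s:", round(1e3 * cdt_offset(rate, 1.2), 1), "ms")
print(round(sweep_duration(1e3, 12e3, 0.5), 2), "s per sweep")
```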
D. Efficiency
The SSS technique presented in Sec. II applies the fast Fourier transform to the entire swept-sine response, performs frequency-domain deconvolution, and applies the inverse fast Fourier transform [Eq. (22)]. The technique then extracts the ImIR, applies a window, and performs the fast Fourier transform. This is the total required computation. The windowing method added in Sec. III increases the computational time because the last step must be repeated for a set of windows (seven windows times 2 for the frequency range used in the paper, 1 to 12 kHz). Even this additional step does not prevent real-time use of the technique implemented in MATLAB or in another interpreted programming language.
In comparison, the LSF technique processes the response in time frames. For the stimulus parameters used here (0.5-oct/s sweep rate, f2 between 1 and 12 kHz) and a 125-ms window for extraction of the entire DP-gram, 56 fittings would have to be performed even if adjacent time frames did not overlap. In practice, overlapping is often needed to achieve good frequency resolution. The LSF technique is, therefore, much more computationally demanding than the SSS technique.
To summarize, the SSS technique provides an easy-to-implement, fast, accurate, and noise-robust method for DPOAE estimation. Because the method is computationally inexpensive, it can be used during a measurement to provide feedback to the experimenter, and it could be implemented in the OAE measurement systems used in clinics.
ACKNOWLEDGMENTS
We would like to thank two anonymous reviewers for their helpful comments on the manuscript. This work was supported by the project 23-07621J of the Czech Science Foundation (GAČR), internal grant of the Czech Technical University in Prague SGS23/185/OHK3/3T/13, and by European Regional Development Fund-Project “Center for Advanced Applied Science” (Grant No. CZ.02.1.01/0.0/0.0/16_019/0000778). Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the Projects of Large Research, Development, and Innovations Infrastructures programme (CESNET LM2015042) is greatly appreciated.
APPENDIX: COCHLEAR MODEL
The model is nonlinear owing to its sigmoidal function. The model parameters were set to work in the range of levels common for the mammalian cochlea; nonlinearity in the input/output function of the simulated BM displacement is reached for levels above about 30 dB SPL (see Fig. 1 in Vencovský et al., 2019). The gain of the model is about 50 dB at frequencies between 1 and 5.5 kHz, so the model might be assumed to simulate a normal-hearing cochlea at least in that frequency region. The cochlear model is coupled with a middle ear model; therefore, the OAEs can be derived from the model as pressure changes at the eardrum. However, the model was not calibrated to predict BM displacement in physically correct units. Hence, the OAEs are expressed in arbitrary units (a.u.).
The model is implemented in MATLAB. All simulations in this paper were performed numerically with an explicit Runge-Kutta (4,5) integration algorithm at a sampling frequency of 600 kHz. The model consisted of 800 segments.
As the terms impulse response and frequency response are associated with linear system theory, we use the prefix “Higher Harmonic” or “Intermodulation” in front of the terms “impulse response” and “frequency response” to distinguish between the entire impulse response measured using linear system theory (no prefix) and the products that appear in the measured impulse response owing to system nonlinearity (with a prefix).
In contrast to linear system theory, such an impulse response is level dependent. However, because linear system theory is used to obtain it, the term “impulse response” is commonly used with the swept-sine technique for nonlinear systems.
Shera and Zweig (1993) present equations for a recursive exponential window. The window of the nth order is defined through a recursively defined function of time normalized by the window length (cutoff time); the explicit formulas are given in Shera and Zweig (1993) and Kalluri and Shera (2001). With these parameters, the window has a maximum value of 1 at τ = 0 and a prescribed value at the cutoff time (Kalluri and Shera, 2001). The windows used in this paper have the order n = 10, with cutoff latencies given by Eq. (26) for τ > 0 and set to 1 ms for τ < 0; see Fig. 3(A). For positive τ, the function represents one half of a recursive exponential window decreasing in amplitude from 1 to 0. The windows depicted in Fig. 3(A) are constructed by mirroring the half-window samples for the negative-time part and concatenating them with the samples for positive times, where τc is given by Eq. (26). This process creates the entire (asymmetrical) windows used in this paper.
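The mirroring-and-concatenation step can be sketched as follows. For brevity, a raised-cosine half-taper stands in for the recursive exponential half-window, so only the construction of the asymmetric window is illustrated, not its exact shape. The 1-ms cutoff for negative times follows the text, whereas the positive-time cutoff would come from Eq. (26) and is left as a parameter (the 8-ms value in the usage line is hypothetical).

```python
import numpy as np


def half_taper(tau, tau_cut):
    """Stand-in half-window: 1 at tau = 0, falling to 0 at tau = tau_cut.

    A raised-cosine taper, used here only for illustration; it is not the
    10th-order recursive exponential window used in the paper.
    """
    x = np.clip(np.asarray(tau, dtype=float) / tau_cut, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * x))


def asymmetric_window(t, tau_neg, tau_pos):
    """Mirror a half-window for t < 0 (cutoff tau_neg) and join it with a
    half-window for t >= 0 (cutoff tau_pos)."""
    t = np.asarray(t, dtype=float)
    return np.where(t < 0, half_taper(-t, tau_neg), half_taper(t, tau_pos))


t = np.arange(-0.005, 0.02, 1 / 44100)
w = asymmetric_window(t, 1e-3, 8e-3)   # hypothetical 8-ms positive cutoff
print(w.max(), t[w > 0].min(), t[w > 0].max())
```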
In Sec. II, the equations for the SSS technique are presented with the f1 component at time zero, which is convenient because the CDT component at $2f_1 - f_2$ is adjacent to the f1 component. This setting could be changed without any effect on the accuracy of the technique. DP-grams, on the other hand, are often depicted with the f2 frequency on the x axis, because the assumed generation region of DPOAEs is near the f2 best-frequency place.
The MATLAB implementation of the LSF technique available in the OAETOOLBOX is used in this paper; see https://gitlab.com/simonhenin/oaetoolbox/.
A recent paper by Vetešník (2022) presented an additional source of DPOAEs due to perturbation of the nonlinear force in the generation region of DPOAEs. However, this source was shown to have a long latency, comparable to the latency of the coherent-reflection source of DPOAEs. Therefore, the currently presented windowing technique would combine the DP-gram component evoked due to the coherent-reflection source and the DP-gram component evoked due to perturbation of the nonlinear force.
The symbol was chosen because we wanted to keep the notation x for the longitudinal position along the BM, but we wanted to distinguish between and x used as a symbol for the input signal in Sec. II.