Distortion product otoacoustic emissions (DPOAEs) are evoked by two stimulus tones with frequency $f1$ and $f2$ of ratio $f2/f1\u2009$ in the range between approximately 1.05 and 1.4. This study theoretically and experimentally analyzes the cubic $2f1\u2212f2$ DPOAE for different stimulus levels of one of the tones while the other is constant. Simulations for $f2/f1$ of 1.2 and moderate stimulus levels (30–70 dB sound pressure level) indicate that cubic distortion products are generated along a relatively large length of the basilar membrane, the extent of which increases with stimulus level. However, apical from the place of maximum nonlinear force, the wavelets generated by these distributed sources mutually cancel. Therefore, although the spatial extent of the primary DPOAE sources broadens with increasing stimulus level (up to 1.5 oct), the basilar-membrane region contributing to the DPOAE signal is relatively narrow (0.6 oct) and level independent. The observed dependence of DPOAE amplitude on stimulus level can be well-approximated by a point source at the basilar-membrane place where the largest distortion product (maximum of the nonlinear force) is generated. Onset and offset of the DPOAE signal may contain amplitude overshoots (complexities), which are in most cases asymmetrical. Two-tone suppression was identified as the main cause of these onset and offset complexities. DPOAE measurements in two normal-hearing subjects support the level dependence of the steady-state DPOAE amplitude and the asymmetry in the onset and offset responses predicted by the theoretical analysis.

## I. INTRODUCTION

Otoacoustic emissions (OAEs) are acoustic signals generated in the cochlea either spontaneously or evoked by an external stimulus (Shera, 2004). They are a by-product of the electromechanical force produced by the soma of the outer hair cells (OHCs), which in turn is the basis of cochlear amplification (Dong and Olson, 2013). These signals are transmitted backward through the middle ear and are recorded as a pressure signal in the ear canal. Evoked OAEs are generated by two different mechanisms: (1) distortion due to the nonlinearity of the mechanoelectrical transducer in the OHC stereocilia and (2) coherent reflection of mechanical waves traveling along the basilar membrane (BM) (Zweig and Shera, 1995; Shera, 2004; Shera and Guinan, 2008). The level of the OAEs generated by both mechanisms is affected—usually decreasing—if cochlear amplification is reduced (Gorga *et al.*, 1997; Kemp, 1978). Therefore, OAEs are routinely used clinically to estimate the functional status of the cochlear amplifier (Blankenship *et al.*, 2018).

OAEs evoked by two simultaneously presented pure tones, with frequencies usually denoted as $f1\u2009$ and $f2$, are called distortion product OAEs (DPOAEs). The largest DPOAE is the lower sideband, cubic component at $2f1\u2212f2$. The evoked emission, generated due to the nonlinearity of the mechanoelectrical transducers close to the place of maximum overlap of the two traveling waves produced by $f1$ and $f2$, is not only transmitted retrograde to the ear canal, but also stimulates a wave traveling anterograde towards the tonotopic place of $2f1\u2212f2$. In the case of local perturbations of mechanical properties near the tonotopic place of $2f1\u2212f2$, this anterograde traveling wave can be coherently reflected back along the BM and add constructively or destructively to the initial retrograde traveling wave (Zweig and Shera, 1995; Shera and Guinan, 1999). Therefore, the overall DPOAE can be thought of as being composed of both the nonlinear-distortion and the coherent-reflection components. Conventionally, the generators of these two components are called the primary and the secondary sources, respectively.

The nonlinear distortion component of DPOAEs does not originate from a discrete point-like source. Instead, the primary source is spread along a region of the BM. Specifically, the nonlinear-distortion component is a superposition of distributed primary sources whose phase and amplitude may be different (Shera and Guinan, 2008). The primary-source region is located at and basal to the tonotopic place of $f2$, which is where the traveling-wave envelopes evoked by both tones overlap maximally (Kim *et al.*, 1980; Young *et al.*, 2012). Cancelation between these sources due to phase differences may cause a decrease of DPOAE amplitude (Martin *et al.*, 2010, 2013; Mills, 1997; Young *et al.*, 2012). Martin *et al.* (2013) measured DPOAEs in rabbits and presented complexities (amplitude overshoots) in the onset and offset of the time-domain signal. Since the complexities disappeared and the steady-state amplitude of DPOAEs increased after adding a third tone with frequency above $f2$, they proposed that the more basal sources of DPOAEs add in opposite phase to the sources near $f2$. They showed this phenomenon for a small ratio of $f2/f1$ (1.07), and also for a larger ratio of 1.25, but using relatively high stimulus levels [$L1=L2=65\u201375$ dB sound pressure level (SPL)]. It was suggested that mutual cancelation of wavelets leads to a decrease of the amplitude of DPOAEs for $f2/f1$ ratios approaching 1. This suggestion agrees with modeling studies using nonlinear cochlear models (van Hengel, 1996; Talmadge *et al.*, 1998; Shera, 2003; Liu and Neely, 2010; Sisto *et al.*, 2018). On the other hand, the decrease of DPOAE amplitude for frequency ratios less than about 1.2 could be also explained by filtering by the tectorial membrane (TM) (Allen and Fahey, 1993).

Another phenomenon which may play a role in the reduction of DPOAE amplitude is two-tone suppression (Robles and Ruggero, 2001). Kanis and de Boer (1997) suggested that the DPOAE amplitude may decrease because the stimulus tones mutually suppress if their frequency separation is small $(f2/f1\u22481.05)$; also see Fahey *et al.* (2006). Two-tone suppression can partially explain the complexities in the onset and offset of DPOAE signals, as demonstrated with cochlear models (Gummer *et al.*, 2018; Vencovský and Vetešník, 2018b) combined with experimental measurements (Zelle *et al.*, 2018) from human subjects. Kummer *et al.* (2000) suggested that two-tone suppression can cause the non-monotonic growth of DPOAE input/output functions observed if one of the stimulus levels is held constant and the other is increased (Johnson *et al.*, 2006). In contrast, Sisto *et al.* (2018) hypothesized that the non-monotonic DPOAE level growth may be due to mutual cancelation of wavelets from the primary DPOAE source.

The aim of this study is to elucidate mechanisms responsible for the nonlinear-distortion component of DPOAEs up to moderate stimulus levels for $f2/f1=1.2$. Simulations of a two-dimensional nonlinear cochlear box model are compared with experimental results measured in normal-hearing human subjects. Previous modeling studies revealed how the primary and secondary DPOAE sources affect both the steady-state DPOAE amplitude and also the envelope of the DPOAE signal (Talmadge *et al.*, 1998; Talmadge *et al.*, 1999; Tubis *et al.*, 2000). Here, we focus on how the distributed nature of the primary source affects the steady-state DPOAE amplitude as well as the onset and offset of the DPOAE signal.

## II. METHODS

This section describes a hydrodynamical box model of the cochlea (Sec. II A) and an approximate analytical solution of the model at the angular frequency of the lower cubic distortion product (Sec. II B). The analytical solution is proportional to the DPOAE signal in the ear canal. This solution provides a tool for elucidating how DP wavelets generated along the BM might contribute to the nonlinear-distortion component of the DPOAEs. Two simulation techniques are used: (1) numerical solution of the box-model equations (Sec. II A) which provides pressure in the ear canal from which the DPOAE signal (the nonlinear-distortion component in the case of a “smooth” model) can be extracted, and (2) analytical approximation of the solution of the box-model equations (Sec. II B), which allows specification of the primary DPOAE source and hence analysis of the distributed nature of this source. The simulation techniques are described in Sec. II C and the experimental paradigm is presented in Sec. II D. If only interested in the results, the reader may skip most of the modeling details. However, to comprehend the results, the reader is advised to consult Eq. (16), which describes how the nonlinear-distortion component of the DPOAE results from the summation of distributed primary sources of unique phases. In this description, the summation is achieved by a cumulative integration.

### A. 2D cochlea box model

The 2D cochlear model used in this study was formulated in Sondhi (1978), Allen and Sondhi (1979), and further developed in Vetešník and Nobili (2006), Vetešník and Gummer (2012). The model is supplemented with a 1D middle-ear model adapted from Talmadge *et al.* (1998); the equations are given in the Appendix. The cochlear model assumes that the cochlea has the shape of a box. The box is divided by the BM with the top and bottom chambers filled with incompressible fluid. The BM is modeled as a continuous array of harmonic oscillators with mass $m(x)$, damping $h(x)$, and stiffness $k(x)$, per unit BM length, where $x$ is measured from the basal end of the BM. Parameter values of the model are given in Table I and the most commonly used symbols in the model are listed in Table II. The main assumptions underlying the model are described in Mammano and Nobili (1993) and Nobili *et al.* (1998). The BM oscillators are coupled longitudinally by hydrodynamical forces and by shearing resistance between the adjacent BM oscillators. The input signal is fed into the model via vibrations of the stapes, which are also coupled to the BM oscillators by hydrodynamical forces. In the model, cochlear amplification derives from the OHC electromechanical force acting in phase with BM velocity and, therefore, against viscous forces acting on the BM, as found experimentally by Dong and Olson (2013). This phase relation is achieved by a second array of oscillators simulating the shearing displacement between the reticular lamina (RL) and the TM. The model is, therefore, based on the amplification mechanism postulated by Gummer *et al.* (1996).

Parameter . | Value/Definition . | Description . |
---|---|---|

$\rho $ | 1 (g/cm^{3}) | Water density^{a} |

$L$ | 3.5 (cm) | BM length^{a} |

$D$ | 0.1 (cm) | Cochlear-duct height^{a} |

$wBM$ | 1 (cm) | BM width^{a} |

$fBM(x)$ | $A10\alpha (1\u2212x/L)\u2212Akg$ (1/s) $\u2009$ | Greenwood's tonotopic map for human BM^{b} |

$A=165.4$ (1/s), $\alpha =2.1$, $kg=0.88$ | ||

$m$ | $1\xd710\u22124$ (g/cm) | BM mass per unit length |

$k(x)$ | $(\nu D/3+m){2\pi fBM(x)}2$ [g/(cm s^{2})] | BM stiffness per unit length |

$\nu =1$ (g/cm^{2}) | ||

$h(x)$ | $h0(x)+10h0(L)\u2009exp\u2009[\u22123.5(1\u2212x/L)]+880h0(L)\u2009exp\u2009[\u221218.2(1\u2212x/L)]$ [g/(cm s)] | BM damping per unit length^{c} |

$\u2009h0(x)=6\xd7103k(x)k(0)$ [g/(cm s)] | ||

$s(x)$ | $20.9$ [(g cm)/s] | Shearing viscosity coefficient |

$\omega TM(x)$ | $2\pi fTM(x)$ (rad/s), | TM resonance frequency |

$fTM(x)=max{fBM(x+x0),5}$ (1/s) | ||

$x0=0.45$ (cm) | ||

$\gamma TM(x)$ | $2\zeta TM(x)\omega TM(x)$ (rad/s) | TM damping^{d} |

$\zeta TM(x)=21.5/fTM(x)$ | ||

$\Gamma mi$ | 1.4 | Incus lever ratio^{e} |

$Sow$ | $wBMD=0.1\u2009$($cm2$) | Oval-window area |

$Sty$ | 0.49 ($cm2$) | Tympanic-membrane effective area^{e} |

$Pea$ | $\rho aca2=1.42\xd7106$ $(dyn/cm2)$ | Adiabatic bulk modulus of air^{e} |

$\gamma air$ | 1.4 | Ratio of specific heats of air^{e} |

$Ve$ | 0.5 $(cm3)$ | Outer-ear volume^{e} |

$Gme$ | $Sty/Sow\Gamma mi=6.86$ | Mechanical gain of ossicles^{e} |

$mme$ | 0.0531 (g) | Effective mass of oval window and ossicles |

$hme$ | 1298 (g/s) | Effective damping of middle ear |

$kme$ | $6.813\xd7106$ ($g/s2$) | Effective stiffness of middle ear and tympanic cavity |

Parameter . | Value/Definition . | Description . |
---|---|---|

$\rho $ | 1 (g/cm^{3}) | Water density^{a} |

$L$ | 3.5 (cm) | BM length^{a} |

$D$ | 0.1 (cm) | Cochlear-duct height^{a} |

$wBM$ | 1 (cm) | BM width^{a} |

$fBM(x)$ | $A10\alpha (1\u2212x/L)\u2212Akg$ (1/s) $\u2009$ | Greenwood's tonotopic map for human BM^{b} |

$A=165.4$ (1/s), $\alpha =2.1$, $kg=0.88$ | ||

$m$ | $1\xd710\u22124$ (g/cm) | BM mass per unit length |

$k(x)$ | $(\nu D/3+m){2\pi fBM(x)}2$ [g/(cm s^{2})] | BM stiffness per unit length |

$\nu =1$ (g/cm^{2}) | ||

$h(x)$ | $h0(x)+10h0(L)\u2009exp\u2009[\u22123.5(1\u2212x/L)]+880h0(L)\u2009exp\u2009[\u221218.2(1\u2212x/L)]$ [g/(cm s)] | BM damping per unit length^{c} |

$\u2009h0(x)=6\xd7103k(x)k(0)$ [g/(cm s)] | ||

$s(x)$ | $20.9$ [(g cm)/s] | Shearing viscosity coefficient |

$\omega TM(x)$ | $2\pi fTM(x)$ (rad/s), | TM resonance frequency |

$fTM(x)=max{fBM(x+x0),5}$ (1/s) | ||

$x0=0.45$ (cm) | ||

$\gamma TM(x)$ | $2\zeta TM(x)\omega TM(x)$ (rad/s) | TM damping^{d} |

$\zeta TM(x)=21.5/fTM(x)$ | ||

$\Gamma mi$ | 1.4 | Incus lever ratio^{e} |

$Sow$ | $wBMD=0.1\u2009$($cm2$) | Oval-window area |

$Sty$ | 0.49 ($cm2$) | Tympanic-membrane effective area^{e} |

$Pea$ | $\rho aca2=1.42\xd7106$ $(dyn/cm2)$ | Adiabatic bulk modulus of air^{e} |

$\gamma air$ | 1.4 | Ratio of specific heats of air^{e} |

$Ve$ | 0.5 $(cm3)$ | Outer-ear volume^{e} |

$Gme$ | $Sty/Sow\Gamma mi=6.86$ | Mechanical gain of ossicles^{e} |

$mme$ | 0.0531 (g) | Effective mass of oval window and ossicles |

$hme$ | 1298 (g/s) | Effective damping of middle ear |

$kme$ | $6.813\xd7106$ ($g/s2$) | Effective stiffness of middle ear and tympanic cavity |

^{b}

^{c}

In Vetešník and Gummer (2012), the BM damping towards the apex is increased by one exponential term [their Eq. (A10)]. Here, to avoid long-lasting BM oscillations at low frequencies, the damping towards the apex ($x>2.6$ cm, CF < 200 Hz) is increased by two exponential terms: $10h0(L)\u2009exp\u2009[\u22123.5(1\u2212x/L)]+880h0(L)\u2009exp\u2009[\u221218.2(1\u2212x/L)]$.

^{d}

In Vetešník and Gummer (2012), the TM damping ratio $\zeta TM(x)$ (=0.24) is independent of the position $x$. Here, $\zeta TM(x)$ depends on $x$, increasing with distance from base to apex, inversely proportional to the square root of the TM resonant frequency. The damping ratio was made place dependent to broaden the BM region being undamped by the electromechanical force for more apical positions.

$x$ | Longitudinal position along the BM measured from the base |

$t$ | Time |

$\u2202x$ | Partial derivative with respect to $x$ |

$\u2202t$ | Partial derivative with respect to $t$ |

$\xi $ | Transversal BM displacement (positive towards scala vestibuli) |

$\sigma $ | Stapes displacement (positive into scala vestibuli) |

$G$ | Green's function representing longitudinal fluid coupling between BM segments |

$GS$ | Green's function representing longitudinal fluid coupling between the stapes and the BM |

$U$ | Undamping force equal to the OHC electromechanical force |

$S$ | Second-order Boltzmann function (sigmoidal function) representing the receptor current as a function of stereocilia displacement |

$\eta $ | Shearing displacement between RL and TM |

$HTM$ | Transfer function of the TM relating $\eta $ to $\xi $ |

$UNL$ | Nonlinear part of the undamping force produced by the OHC |

$\xi \u0302(1)$ | The first base function: a superposition of a wave traveling backward towards the stapes and a wave reflected by the stapes |

$\xi \u0302(2)$ | The second base function: a wave traveling towards the apex |

$PP$ | Pressure difference between scala vestibuli and scala tympani due to the primary source of DPOAEs |

$PS$ | Pressure difference between scala vestibuli and scala tympani due to the secondary source of DPOAEs |

$R$ | Reflection coefficient at the stapes as seen intracochlearly |

$x$ | Longitudinal position along the BM measured from the base |

$t$ | Time |

$\u2202x$ | Partial derivative with respect to $x$ |

$\u2202t$ | Partial derivative with respect to $t$ |

$\xi $ | Transversal BM displacement (positive towards scala vestibuli) |

$\sigma $ | Stapes displacement (positive into scala vestibuli) |

$G$ | Green's function representing longitudinal fluid coupling between BM segments |

$GS$ | Green's function representing longitudinal fluid coupling between the stapes and the BM |

$U$ | Undamping force equal to the OHC electromechanical force |

$S$ | Second-order Boltzmann function (sigmoidal function) representing the receptor current as a function of stereocilia displacement |

$\eta $ | Shearing displacement between RL and TM |

$HTM$ | Transfer function of the TM relating $\eta $ to $\xi $ |

$UNL$ | Nonlinear part of the undamping force produced by the OHC |

$\xi \u0302(1)$ | The first base function: a superposition of a wave traveling backward towards the stapes and a wave reflected by the stapes |

$\xi \u0302(2)$ | The second base function: a wave traveling towards the apex |

$PP$ | Pressure difference between scala vestibuli and scala tympani due to the primary source of DPOAEs |

$PS$ | Pressure difference between scala vestibuli and scala tympani due to the secondary source of DPOAEs |

$R$ | Reflection coefficient at the stapes as seen intracochlearly |

The transversal displacement $\xi (x,t)$ (positive for displacement towards *scala vestibuli*) of the oscillators is calculated in the model using the integro-differential equation

where $\u2202t$ and $\u2202x$ represent partial derivatives with respect to time, $t$, and $x$, respectively, $GS(x)$ is the Green's function representing the hydrodynamic coupling between the stapes and the BM at $x$, $\sigma (t)$ is the stapes displacement (positive for displacement into the cochlea), $\u2202xs(x)\u2202x$ accounts for the longitudinal coupling of the oscillators due to the shearing viscosity, $s(x)$, and $G(x,x\xaf)$ is the Green's function representing longitudinal coupling between the BM segments caused by hydrodynamic forces from the surrounding fluid. The acceleration of the oscillator at $x\xaf$ with a length $dx\xaf$ produces force per unit BM length at $x$. This force is given by the term $G(x,x\xaf)\u2202t2\xi (x\xaf,t)dx\xaf$. The Green's function $G(x,x\xaf)$ can be expressed as a sum of two terms: the long-range term and the short-range term; analytical expressions for these two terms for the box model can be found in Vetešník and Nobili (2006).

Each oscillator is locally amplified by the undamping force, $U(x,t)$, which represents the electromechanical force produced by the OHC soma (Nobili *et al.*, 1998). Similar to Nobili and Mammano (1996), we assume that $U(x,t)$ depends nonlinearly on the hair-bundle displacement $\eta (x,t)$:

where $a=1\u2009(cm\u22121)$ and $u(x)$ is a suitable function controlling the degree of undamping along the BM and is given by

$g1$, $g2$, and $g3$ were set empirically so that the desired cochlear amplifier gain at CF was achieved; the chosen values are $g1=1.44$ ($g/s2$), $g2=5.88$, and $g3=0.15$. $S(y)$ is a sigmoidal function (second-order Boltzmann function) given in Nobili and Mammano (1996) as

where $y$ is equivalent to $a\eta (x,t)$ in Eq. (2); the parameter values of the function are $y1=0.1139$, $y2=0.3736$, $c1=0.7293,$$c2=1.4974$, $b=0.309\u200991$. This function simulates the mechanoelectrical transducer which is the dominant nonlinearity in the system (Santos-Sacchi, 1993).

The hair-bundle displacement, $\eta (x,t)$, caused by the BM motion, is defined as the shearing displacement of the RL relative to the TM, and is calculated using the differential equation

where $\gamma TM(x)$ is the damping caused by the viscosity of the subtectorial space and viscoelasticity of the TM, and $\omega TM(x)$ is the TM resonance frequency. The sigmoidal shape of the nonlinear function, $S(y)$, limits cochlear amplification at higher intensities and causes the input/output (I/O) functions of BM motion to be compressive.

### B. Analytical approximation for a two-tone input

This section provides an analytical approximation to the solution of the model equations at the radial frequency of the lower cubic DPOAE. The solution, Eq. (16), determines how adjacent parts of the distributed DPOAE primary sources contribute to the resulting DPOAE. The equations presented below are based on the equations in Vetešník and Gummer (2012) and, therefore, the reader is referred to that publication for a comprehensive presentation of the details.

The model is analyzed in the frequency domain. Since the stimulus is composed of two pure tones and the model is nonlinear, the solution becomes tractable if we assume that the input is composed of tones which are multiples of some fundamental angular frequency, $\omega 0$; that is, $\omega 1=n1\omega 0$, $\omega 2=n2\omega 0$. The steady-state solution of Eqs. (1) and (A1), $\xi stat(x,t)$, can then be approximated by a truncated Fourier series (Nobili and Mammano, 1996)

because the amplitudes of the Fourier components, $\xi \u0302n(x)$, are negligible for $|n|>N$, where $N$ is limited by the frequency range of the cochlear model. (The circumflex designates a phasor.) Each term of the Fourier series satisfies the integral equation

where $H\u0302nTM(x)$ is a dimensionless transfer function relating $\eta \u0302n(x)$ to $\xi \u0302n(x)$; namely,

and $U\u0302nNL(x)$ is the *n*th Fourier component of the nonlinear undamping force defined as

Finally, $\Omega \u0302n$ is the *n*th Fourier component of the stapes displacement, $\sigma (t)$, which is calculated from Eq. (A1) as

The approximate analytical solution at the radial frequency, $\omega DP=2n1\omega 0\u2212n2\omega 0$, ($nDP=2n1\u2212n2$), was derived in Vetešník and Gummer (2012). It assumes (1) the DP generation site is restricted to a narrow region which is well-separated from the BM characteristic frequency (CF) site of $\omega DP$ and (2) the longitudinal coupling due to shearing viscosity and the short-range term of the Green's function do not qualitatively affect the steady-state solution. The effects of both couplings can be then included into local properties of the BM oscillators; i.e., into the BM transfer function $H\u0302nDP(x)$ defined by Eq. (22) in Vetešník and Gummer (2012). This transfer function relates the local hydrodynamic force per unit length across the cochlear partition to BM displacement at the DP frequency and includes both passive and active components. Under these assumptions, Eq. (7) is equivalent to a second-order differential equation with boundary conditions at $x=0$ and $x=L$, and allows derivation of the approximate analytical solution.

This solution is the sum of two terms (Vetešník and Gummer, 2012); namely,

where the first term represents the direct response of the BM to $U\u0302nDPNL(x)$ and the second term represents responses due to fluid coupling. $QnDP$ is a constant defined as (Vetešník and Gummer, 2012)

where $\rho $ is the fluid density given together with the remaining parameters in Table I, and $K\u0302nDP(x,x\xaf)$ is the Green's function of the boundary-value problem, and is given by (Vetešník and Gummer, 2012)

where $W\u0302$ is the Wronskian determinant of the homogeneous differential equation and is a constant for this application (Vetešník and Gummer, 2012). The functions $\xi \u0302nDP(1)(x)$ and $\xi \u0302nDP(2)(x)$ are linearly independent homogeneous solutions (base functions) which satisfy the boundary conditions at $x=0$ and $x=L$, respectively. $\xi \u0302nDP(1)(x)$ can be expressed as a superposition of two waves with opposite phase slope (a backward wave and a forward wave due to the reflection at the stapes),

where $R\u0302$ represents the stapes reflection coefficient viewed from within the cochlea, $a\u0302$ is the complex constant determined by the boundary condition at $x=0$, and $AnDPB,F(x)$ and $\phi nDPB,F(x)$ represent the amplitudes and phases of the backward $(B)$ and forward $(F)$ waves, respectively. $\xi \u0302nDP(2)(x)$ is approximated by a single wave traveling from the base to the apex of the cochlea,

where $b\u0302$ is the complex constant determined by the boundary condition at $x=L$.

For studying the generation of the DPOAE nonlinear component, the solution of Eq. (11) at $x=0$ is the relevant BM displacement because it is this spatial value that drives the stapes in reverse. Provided the tonotopic place of $f2$ is located sufficiently distant from the basal end of the BM, the nonlinear force at the DP frequency at the basal end, $U\u0302nDPNL(0)$, is negligibly small. Therefore, $\xi \u0302nDP(0)$ is given (approximately) by the second term in Eq. (11). Furthermore, the integration interval in Eq. (11) can be narrowed by assuming that $U\u0302nDPNL(x)$ due to interaction between the $f1$ and $f2$ traveling waves is negligible except for $x1<x<x2$.^{1} That is, the fluid pressure difference at the BM base, $P\u0302nDPP(0),$ which is proportional to the DPOAE pressure in the ear canal, is given by

In this interval, the nonlinear force, $U\u0302nDPNL(x),\u2009$ produces DPs which may contribute to the DPOAE signal. In Sec. III, we determine this interval by the method described in footnote ^{1} and use Eq. (16) to analyze the dependence of DPOAE amplitude on stimulus intensities and on the spatial extent of the nonlinear force. We emphasize that the intensity dependence of the DPOAE signal does not depend on $\xi \u0302nDP(2)(x)$ because being a solution of a homogeneous equation, $\xi \u0302nDP(2)(x)$ is independent of a stimulus. Because the first three right-hand factors of Eq. (16), $\xi \u0302nDP(1)(0)(QnDP/W\u0302)$, are independent of the stimulus amplitudes, we treat those terms as a constant for a given $nDP$ and focus the analysis only on the integral. The theoretical analysis and the experimental measurements use a frequency ratio of $f2/f1=1.2$, the value commonly used in routine measurements in the laboratory and the clinic. This choice provided a relatively localized primary source up to stimulus levels of at least 50 dB SPL.

According to Vetešník and Gummer (2012), the nonlinear force $U\u0302nDPNL(x)$ can be written as

where $\phi n1F(x)$ and $\phi n2F(x)$ are the spatial phases of the steady-state solution for the stapedial input at the stimulus frequencies, $C(x)$ is a slowly changing function of $x$, and $UANL(x)$ is the spatial amplitude of $U\u0302nDPNL(x)$. If the generation region is restricted to a sufficiently narrow interval, then the phases of $U\u0302nDPNL(x)$ and $\xi \u0302nDP(2)(x)$ do not change significantly and the DPOAE signal depends on the constructive contribution from distributed DPOAE sources.

#### 1. Time-domain extension of the analytical solution

If sufficiently slow onset and offset of the DPOAE eliciting tone is used, we can assume instantaneous coupling between the DP generation sites and the DPOAE signal at the BM base. Numerical simulations suggest that for an elicitor duration longer than about 10 ms (Vencovský and Vetešník, 2018b), Eq. (16) can be extended to the time domain to approximate the time courses of the onset and offset responses of the DPOAE fluid pressure difference at the BM base; that is, $P\u0302nDPP(0)$ is now extended to $PnDPP(0,t)$ by the approximation

where

$x1(t)$ and $x2(t)$ specify the generation region of the primary sources, but in contrast to the foregoing steady-state analysis, this region now changes with time during the onset and offset of the DPOAE-eliciting stimulus tone, as does the nonlinear force, $U\u0302nDPNL(x,t)$. To determine the generation region at a given time instant, we used the same approach as for the steady-state analysis as given in footnote ^{1}. Equation (9) calculates the nonlinear force in the frequency domain and assumes a stationary signal. However, the calculation can be also done numerically in the time domain. The method used to evaluate Eq. (18) is given in Sec. II C 2.

It has been shown in the study by Vencovský and Vetešník (2018b) that the suppression of BM responses, $\xi \u0302n1,2(x)$, at one stimulus frequency during the onset of a pulsed tone causes complexities in the onset and offset of $PnDPP(0,t)$. In the present study, we used Eq. (18) to investigate the temporal development of the complexities within the onset and offset regions of analytically simulated $PnDPP(0,t)$ signals. The results are presented in Sec. III A 2.

### C. Simulations

This section describes the methods used for the simulations. Subsections have the same names as corresponding subsections in Sec. III, to improve orientation and clarity. In the text, we use the terms “numerical simulations” and “analytical simulations.” By “numerical simulations” we mean simulations obtained by numerical solution of the cochlear box-model equations in the time domain [Eqs. (1), (5), and (6)] which, together with the middle-ear transmission model ( Appendix), yield the acoustical pressure of the nonlinear-distortion component of the DPOAE signal in the external auditory canal. The term “analytical simulations” denotes calculation of Eqs. (16) and (18), using $\xi \u0302nDP(2)(x)$ and $U\u0302nDPNL(x)$ derived from the numerical simulations, to yield analytical approximations of the nonlinear-distortion component of the DPOAE fluid pressure across the organ of Corti at the BM base.

The cochlear box model was discretized in the spatial dimension with 800 equal-length segments. We repeated some of the simulations with a larger number of segments (1200) and did not find any significant differences. The time-domain solution of the model was performed with a matlab implementation of the Dormand–Prince method, a member of the Runge-Kutta family of ODE solvers. The sampling frequency was 600 kHz. Source codes of the model will be provided on request. The box model was not calibrated to yield absolute values of BM displacement and DPOAE pressure expected for the human cochlea. In addition, only those parts of Eqs. (16) and (18) are calculated which depend on the examined stimulus parameter (stimulus levels). Therefore, arbitrary units (a.u.) are used to represent simulated responses.

Because of the assumed near-scale invariance of the box model^{2} (Vetešník and Nobili, 2006), we mainly focused our analysis on one setting of the stimulus frequencies: $f1=2$ kHz, $f2=2.4$ kHz $(f2/f1=1.2)$, which gives the cubic distortion product at $fDP=2f1\u2212f2=1.6$ kHz. We kept the level of one of the stimulus tones fixed (50 dB SPL) and changed the other tone level from 30 to 70 dB SPL with a step of 10 dB SPL. In addition, we simulated DPOAEs for the same stimulus tones as in the experiment described below. Those simulations were conducted for two $f2$ values: 5.5 and 5.9 kHz, and a frequency ratio $f2/f1$ of 1.2. Stimulus levels were changed in the same way as described above for $f1=2$ kHz and $f2=2.4$ kHz stimuli.

#### 1. Analysis of the steady-state response of the simulated DPOAE

In the analytical solution, only the integral over the product of the base wave $\xi \u0302nDP(2)(x)$ and the nonlinear force $U\u0302nDPNL(x)$ in Eq. (16) depends on the stimulus levels. Therefore, an estimate of the variation of the steady-state DPOAE signal as a function of $L1$ or $L2$ can be obtained by calculating only this integral. The base wave $\xi \u0302nDP(2)(x)$ was calculated from the model equations solved in the frequency domain [Eqs. (1), (5), and (6)]. The nonlinear force $U\u0302nDPNL(x)$ was calculated numerically in the time domain by first subtracting the linear approximation of $U(x,t)$ from $U(x,t)$; that is, referring to Eq. (3), by calculating the difference $u(x)S[a\eta (x,t)]\u2212u(x)a\eta (x,t)$. To extract the cubic distortion component, this difference function was filtered with a zero-phase bandpass filter centered at $fDP$, which was designed as a window-based finite impulse response (FIR) filter of order 800 with filter coefficients computed using a Hamming window. The 6-dB bandwidth was chosen to be dependent on $f2$ and calculated according to

where the unit of frequency in this equation is Hz. Before filtering, the time-domain signal was resampled to the sampling frequency used in the experiments; namely, 102.4 kHz.

In order to better understand the dependence of the steady-state DPOAE signal on stimulus level for the box model, which contains a longitudinal array of coupled nonlinearities, we investigated the two-tone steady-state response inherent to the sigmoidal nonlinearity used in the box model [Eq. (4)]. A purely analytical approach for quantifying the two-tone response is complicated because of the saturating character of the sigmoidal function. The Taylor series expansion of the sigmoidal function converges too slowly in the saturation regions. Therefore, the two-tone response was simulated in matlab. A systematic analysis of the behavior of the nonlinearity can be found in Duifhuis (1989) and Lukashkin and Russell (1998, 1999, 2001).

The stimulus-level dependence of the DP amplitude inherent to the (isolated) sigmoidal function will be shown to be qualitatively similar to that for a single nonlinear BM oscillator generating a DP in the box model (Sec. III A 1). However, the similarity cannot be anything but qualitative because not being embedded in the cochlear box model, the output and, therefore, the input of the isolated nonlinearity is not coupled to neighbouring nonlinearities. Therefore, DPOAEs generated by a single oscillator within the cochlea are, in this publication, determined analytically by using only one model-segment in the cumulative integral given in Eq. (16). For this purpose, we used the model segment at the place of maximum nonlinear force. This approach allows us to determine how the distributed nature of the primary source contributes to the observed level dependence of the DPOAE in the box model.

#### 2. Analysis of the time course of the simulated DPOAE

DPOAE signals were simulated by numerical solution of the model equations in the time domain. In order to remove the responses at the stimulus frequencies, the stimulus tones were presented with different phases (multiples of 90° and 180°) and the signals averaged in the time domain (Whitehead *et al.*, 1996). For the analytical simulations, the nonlinear force $U\u0302nDPNL(x,t)$ and the base wave $\xi \u0302nDP(2)(x)$ were calculated in the same way as described in Sec. II C 1 for analytical simulation of the steady-state DPOAE response. Then, the DPOAE signals for fluid pressure at the basal end of the BM were derived analytically from Eq. (18). In all simulations, onset/offset of the DPOAE elicitors were shaped with 10 ms raised-cosine ramps.

#### 3. Contribution from the coherent-reflection source to simulated DPOAEs

To investigate how the coherent-reflection source may affect the DPOAE steady-state signal for the stimulus parameters used in this publication, random inhomogeneities (roughness) were introduced into the model used for numerical simulations. For this purpose, the function $u(x)$ given by Eq. (3), which determines the undamping force in the model, was modified to $u\u0303(x)$ by adding Gaussian noise,

where $\u03f5=0.05$ and $N(0,1)$ denotes a Gaussian distribution with zero mean and variance of unity. In order not to affect the nonlinear-distortion component generated near the tonotopic place of the $f2$ traveling wave, we added the inhomogeneities only to the model sections between $x=1.50$ cm and $1.67$ cm (CF of 1.8 kHz and 1.4 kHz) for $f2=2.4$ kHz, and $x=0.90\u2009$ and $1.10$ cm (CF of 4.0 kHz and 3.0 kHz) for $f2=5.5$ kHz. The onset and offset of the inhomogeneities were shaped with a Tukey window defined as

where $r=0.3$ and $z$ ranges between 0 and 1. The window covered all the model sections into which the inhomogeneities were added; e.g., for $f2=2.4$ kHz the model section for $x=1.50$ cm corresponds to $z=0$, and the model section for $x=1.67$ cm corresponds to $z=1$; the positions of the remaining model sections were linearly scaled between 0 and 1.

### D. Experiment

#### 1. DPOAE signal acquisition

DPOAEs were measured in two normal-hearing (pure-tone hearing threshold <20 dB hearing level) men of age 29 and 46 years, denoted as s01 and s02, respectively. The experiments were approved by the Ethics Committee of the University of Tübingen in accordance with the Declaration of Helsinki for human experiments. All measurements were performed in a sound-proof chamber (Industrial Acoustics Company, Niederkrüchten, Germany) using an ER-10C DPOAE probe-microphone system (Etymotic Research, Elk Grove Village, IL) connected to a PC-based setup (Zelle *et al.*, 2015, 2017b).

The DPOAE measurements were conducted for two different $f2$ values: 5.5 and 5.9 kHz. These frequencies were chosen so as to find a frequency region which was free of spontaneous otoacoustic emissions (SOAEs) and which had a relatively small secondary-source amplitude (≤0.27 of the primary-source amplitude) at low stimulus levels (≤45 dB SPL). SOAEs can interfere with small-amplitude DPOAEs (van Dijk and Wit, 1990; Zelle *et al.*, 2017b). In both subjects, the secondary source component was small at high frequencies: in s01 at 5.5 kHz and in s02 at 5.9 kHz. DPOAE measurements were conducted for a frequency ratio $f2/f1=1.2$.

In order to reduce the amplitude responses at $f1$ and $f2$, the stimulus tones were presented with different phases (multiples of 90° and 180°) and the signals averaged in the time domain (Whitehead *et al.*, 1996). Two different acquisition schemes were used. In the first scheme, the focus was on the onset, offset, and steady-state parts of the DPOAE signal. The longer of the two stimulus tones was turned on shortly after the onset of the acquisition block; the other stimulus tone was turned on after a further time delay and turned off before the offset of the first stimulus tone. The shorter stimulus tone is called the “elicitor” since its presentation evokes DPOAEs. Both $f1$ and $f2$ tones were tested as elicitors, similarly to Zelle *et al.* (2018). Each acquisition block had duration of 90 ms. The longer (80-ms duration) stimulus tone was turned on 5 ms after the onset of the acquisition block; the onset and offset envelopes were shaped with 4-ms raised-cosine ramps. The elicitor tone was turned on 20 ms after the onset of the acquisition block and had duration of 30 ms. The onset and offset of the elicitor envelopes were shaped with 10-ms raised-cosine ramps. The measurements were performed for two level paradigms of the stimulus tones: either for $L1$ kept fixed at 50 dB SPL and $L2$ varied from 30 to 70 dB SPL with a 10-dB SPL step, or $L2$ kept fixed at 50 dB SPL and $L1$ varied between 30 and 70 dB SPL with a 10-dB SPL step. The second acquisition scheme used shorter elicitor pulses (duration of 9 ms instead of 30 ms) to allow separation and acquisition of both the nonlinear-distortion and coherent-reflection components (Zelle *et al.*, 2017a). The $f2$ tone was the elicitor and the relative levels of the stimuli were optimized for each $f2$ and $L2$, by choosing $L1$ such that the generated DPOAE amplitude would be as large as possible (Zelle *et al.*, 2015). The duration of the acquisition block was 60 ms. The $f1$ tone was turned on 5 ms after the onset of the block and had duration of 50 ms. The onset and offset envelopes of the $f1$ tone were shaped with 4-ms raised-cosine ramps. The elicitor was turned on 20 ms after the onset of the acquisition block; its onset and offset envelopes were shaped with 2.5-ms raised-cosine ramps. For both acquisition schemes, the frequency ratio was $f2/f1=1.2$.

The same averaging procedure was used for both acquisition schemes. Averaging was performed for a maximum of 100 acquisition blocks; acquisition was terminated if the signal-to-noise ratio (SNR) between the DPOAE amplitude and the noise, estimated in the time domain, was at least 10 dB; see Zelle *et al.* (2017b) for details. The average included only those acquisition blocks which enhanced the SNR.

#### 2. DPOAE source extraction

The short-pulse DPOAEs, acquired with the second scheme described in Sec. II D 1, were used to extract information about the coherent-reflection source. Due to the different latencies of the two sources, the onset of the DPOAE signal is dominated by the nonlinear-distortion source and the offset by the coherent-reflection source (Zelle *et al.*, 2013). Therefore, the DPOAE signal can be described as a superposition of the nonlinear-distortion and coherent-reflection components, each of which is estimated in amplitude and phase by fitting a pulse-basis function (PBF) to the recorded DPOAE signal (Zelle *et al.*, 2013, 2017a). Details of the PBF method are given in Zelle *et al.* (2013). The PBF method was applied to the DPOAE signals averaged in the time domain (sampling frequency 102.4 kHz) and band-pass filtered using a zero-phase finite impulse response (FIR) filter with an order of 800 and filter coefficients computed using a Hamming window.

## III. RESULTS

Section III A presents a theoretical analysis of the steady-state (Sec. III A 1) and time course (Sec. III A 2) of the DPOAE signal. To this end, the DPOAE generated by a single oscillator is compared with the DPOAE generated by the box-model containing an array of oscillators, operating in a smooth cochlea (without mechanical inhomogeneities) and producing mutually interfering wavelets. Section III B compares the simulations (without mechanical inhomogeneities) with experiments (Sec. III B 1) and then examines a possible contribution from the secondary source to the DPOAE onset and offset responses (Sec. III B 2). It is concluded that, for the stimulus intensities and frequencies used in the present experiments, the secondary source had negligible effect on the DPOAE signal and, therefore, that complexities in the onset and offset responses of the DPOAE were due to two-tone suppression between stimuli.

### A. Simulations

#### 1. Analysis of the steady-state amplitude of the simulated DPOAE

This section presents model simulations of DPOAEs, focusing on their steady-state properties and the effect of changing one of the stimulus levels. It combines numerical simulations in the time-domain with analytical simulations in order to reveal how stimulus levels affect the DPOAE primary source.

##### a. Basilar-membrane responses.

Before presenting the DPOAE simulations, we demonstrate the consistency of model BM responses with experimental data reported in the literature for single-frequency and impulse stimulation. To this end, the I/O function of the BM displacement amplitude, obtained from numerical solution of the box model, at the BM position tuned to 2.4 kHz (upper panel) and 5.5 kHz (lower panel) at 0 dB SPL is depicted in Fig. 1. At these BM positions, the gain of the cochlear amplifier^{3} was 54 and 58 dB, respectively. The simulations at 5.5 kHz were performed because this frequency was also used for experimental DPOAE data (Sec. III B 1).

Figure 2(A) shows displacement amplitude and phase of the iso-intensity model responses at the BM position tuned to 2.4 kHz (left panels) and 5.5 kHz (right panels) at 0 dB SPL. Notice the broadening of the amplitude responses and the basal shift of their maxima with increasing stimulus level. The phase responses are relatively independent of level. The level dependence of the iso-intensity responses is in qualitative agreement with experimental data (Robles and Ruggero, 2001). Figure 2(B) depicts impulse responses obtained at the same BM position as for the iso-intensity frequency responses. The zero crossing points of the impulse responses are independent of level, agreeing with experimental data (Recio and Rhode, 2000; Verhulst *et al.*, 2012).

Taken together, this agreement between simulation and experiment provides confidence in the cochlear model.

##### b. Comparison of numerically and analytically derived steady-state DPOAE amplitudes.

To check the validity of the analytical approximation for the DPOAE amplitude given by Eq. (16), steady-state DPOAE amplitudes were compared for the numerical and analytical simulations of the box model. Figure 3 compares the intensity dependence of the steady-state DPOAE amplitude calculated by both methods for $f2=2.4\u2009$ kHz with $f2/f1=1.2$. The analytical and numerical amplitudes well-nigh superimpose, thus supporting the use of Eq. (16) in our analysis of suppression phenomena. For each intensity condition, either constant $L1=50$ dB SPL or constant $L2=50$ dB SPL, the maximum DPOAE amplitude is reached for the optimal combinations of $L1$ and $L2$ given in Kummer *et al.* (1998) and Zelle *et al.* (2015). At the optimal levels, according to definition (Kummer *et al.*, 1998), the amplitudes of both traveling waves at the $f2$ tonotopic place are approximately equal.

##### c. Spatial distribution of the DPOAE primary source.

Now, using the analytical solution [Eq. (16)], we describe the spatial distribution of the DPOAE primary source, which is given by the nonlinear undamping force $U\u0302nDPNL(x)$.

Figures 4(A) and 4(D) depict the steady-state amplitude and phase of the traveling-wave displacement of the BM when stimulated, respectively, with fixed $L1=50$ dB SPL or $L2=50$ dB SPL; the variable stimulus level ranged from 30 to 70 dB SPL with a step of 10 dB SPL. The traveling waves are depicted for the stimuli and the DP component. Figures 4(B), 4(E) show the corresponding nonlinear forces, $U\u0302nDPNL(x)$, which are the source of the nonlinear-distortion component of DPOAEs. Figures 4(C) and 4(F) depict the corresponding cumulative integrals calculated from these forces as an integral of the product of the nonlinear force and the DP base wave $\xi \u0302nDP(2)(x)$ [Eq. (16)]. Evaluated on the interval $[0,\u2009\u2009x]$, the result of the integration gives a value proportional to the DPOAE signal for primary source generators distributed along the BM between the base ($x=0$) and $x.$ In other words, the integral sums all the primary sources (wavelets) from the base to the place $x$, with each generator having a unique phase. Notice that the amplitude of the cumulative integral does not change for $x$ greater than some apical value. This means that there are no sources of DPOAE apically from that place. The region in which the cumulative integral is drawn with a thickened line designates the region which should mainly contribute to the generation of the DP.^{1} That region corresponds to the integration interval $[x1,\u2009x2]$ in Eq. (16); the procedure for estimating this interval is given in footnote ^{1}. The ringing in some of the cumulative integrals [Figs. 4(C) and 4(F)] around the $fDP$ tonotopic place, apical to $x2$, is associated with the rapid roll-off of the phase of the nonlinear force in that region, which leads to interference between wavelets. Since the ringing is approximately symmetrical about the asymptotically constant value of the cumulative integral at large $x$, the wavelets in that ringing region mutually cancel and, therefore, do not contribute to the DPOAE signal.

For constant $L1=50$ dB SPL, the traveling-wave amplitude for $f2$ increases with increasing $L2$ and for $f1$ decreases with increasing $L2$ above 40 dB SPL [Fig. 4(A)]. That is, the $f2$ tone suppresses the response to the $f1$ tone above $L2$ of 40 dB SPL. The maximum amplitude of the nonlinear force $U\u0302nDPNL(x)$ slightly increases for $L2$ up to 40 dB SPL and then decreases for higher levels. In addition, the maximum amplitude shifts apically (0.08 cm, 0.16 oct)^{4} and the portion of the BM involved in the generation of DPOAE broadens as $L2$ increases, as indicated by the spatial extent of the line thickening in Figs. 4(B) and 4(C). The widest generation region is achieved for $L2$ of 70 dB SPL (0.8 cm, 1.5 oct). However, at these high levels, the amplitude of the cumulative integral [Fig. 4(C)] first increases up to $x$ of about 1.25 cm, then starts to decrease up to $x$ of about 1.3 cm and then increases again. The amplitude notch at about 1.3 cm is accompanied by a half-cycle lag in the phase of the cumulative integral [Fig. 4(C)]. This means that the primary DPOAE sources located basally from $x=1.3$ cm create DPs with approximately opposite phase to the sources located apically from $x=1.3$ cm. Therefore, the basal contribution partially cancels the apical contribution. Also notice that for all $L2$, the amplitudes of the cumulative integrals reach a value about which they start to oscillate [e.g., orange arrow in Fig. 4(C) for $L2=$ 40 dB SPL]. The beginning of this oscillatory course coincides with the spatial position of the nonlinear force maximum and the beginning of the rapid phase roll-off. The oscillations of the integral imply that the wavelets from sources located apically from this force maximum mutually cancel and, therefore, do not contribute to the DPOAE signal in the ear canal. This destructive wavelet interference constrains the effective width of the DPOAE generation region.

For constant $L2=50\u2009$dB SPL, the traveling-wave amplitude for $f1$ increases with increasing $L1$ and for $f2$ decreases with increasing $L1$ above 40 dB SPL [Fig. 4(D)]. That is, the $f1$ tone suppresses the response to the $f2$ tone above $L1$ of 40 dB SPL. In contrast to the situation for constant $L1=50\u2009$ dB SPL, the amplitude of the nonlinear force $U\u0302nDPNL(x)$ increases by two orders of magnitude across the whole $L1$ range, and the portion of the BM which generates DPOAEs broadens with increasing $L1$. The force begins to saturate at the $f2$ tonotopic place for $L1\u224850$ dB SPL [Fig. 4(E)], which is within the compressive region of the BM I/O function for $f2$ (Fig. 1). Notice the large basal shift of the maximum of the nonlinear force amplitude as $L1$ increases, amounting to 0.25 cm ($\u22480.5$ oct) for $L2=50$ dB SPL and $L1$ increasing from 30 to 70 dB SPL. As with the case for $L1$ constant [Figs. 4(B) and 4(C)], the amplitude of the cumulative integral [Fig. 4(F)] starts to oscillate at $x$ near the maximum of the nonlinear force amplitude [Fig. 4(E)]. This again means that although the generation region broadens with increasing stimulus level, the wavelets from a large part of this region apical to the place of maximum force mutually cancel.

##### d. Distortion-product properties of the isolated sigmoidal nonlinearity.

As shown in Fig. 3, the simulated DPOAE amplitudes derived both numerically and analytically for the box model depend non-monotonically on stimulus level, the amplitudes first increasing with increasing $L1$ or $L2$ and then decreasing. In order to better understand the origin of this level dependence, we now consider DP properties inherent to the sigmoidal nonlinearity used in the box model. This nonlinearity, which represents the probability of channel opening in the stereocilia, is given by Eq. (4) and depicted in Fig. 5(A). The function is linear for small hair-bundle displacements and nonlinear and asymmetric at larger displacements. The output level of the nonlinearity at $2f1\u2212f2$ is depicted in Fig. 5(B) as a contour map of $A1$ and $A2$, where $A1$ and $A2$ denote the amplitudes of the hair-bundle displacement at the stimulus frequencies $f1$ and $f2$, respectively. We can see from the contour map that if either $A1$ is kept constant and $A2$ increased, or vice versa, the DP level first increases and then decreases. This amplitude course is better visible in Fig. 5(C) for arbitrarily chosen values of $A1=0.03$ and $A2=0.03$. One might assume that this (lower-sideband) cubic DP is proportional to the cubic DPOAE in the ear canal, provided that the DPOAE were generated by a single nonlinear oscillator and not affected by BM filtering. Indeed, the level dependence in Fig. 5(C) qualitatively resembles the simulated DPOAE amplitudes shown in Figs. 3 and 5(D) for the numerical and analytical solutions of the box model. However, this situation, in which the amplitude of one tone ($A1$ or $A2$) is kept fixed and the other is changed at the input of an isolated nonlinearity, is not equivalent to the numerical or analytical situations for the box model, where either $L1$ or $L2$ is constant. Coupling of the BM oscillators and filtering lead to changes in both traveling waves, even if only one of the levels is changed [Figs. 4(A) and 4(D)].

Therefore, in order to determine how the distributed DPOAE primary sources affect the observed DPOAE amplitude dependence (Fig. 3), Fig. 5(D) shows the (relative) DPOAE amplitude obtained by numerical solution of the box model (circles, *Full mod*.) together with approximate analytical solutions of the box model (crosses, *Single osc.*) in which the primary source was localized to a single model segment located at the maximum of the nonlinear force [Eq. (16); Figs. 4(B) and 4(E)]. Despite the large change in the width of the primary-source region as either $L1$ or $L2$ increases [Figs. 4(B) and 4(E)], the level dependence of the DPOAEs for numerical solution of the box model with distributed oscillators is similar to that for the analytical solution with a single nonlinear oscillator located at the place of maximum nonlinear force. Notice that, because of BM oscillator coupling and filtering, the amplitude dependence for both solutions [Fig. 5(D)] is much less than that for the (isolated) sigmoidal nonlinearity [Fig. 5(C)]. Given the results shown in Fig. 4, the similarity between the numerical and analytical solutions [Fig. 5(D)] is mainly due to mutual cancelation of the wavelets in the DP generation region. In other words, the effect of the distributed primary sources on the DPOAE steady-state amplitude is not very pronounced, at least up to stimulus levels of 70 dB SPL.

#### 2. Analysis of the time course of the simulated DPOAE

Where Sec. III A 1 focused on the steady-state DPOAE response, this section presents numerically simulated time-domain DPOAE signals and identifies the cause of complexities, defined as non-monotonic onsets and offsets of the DPOAE envelope, which are found in experimental data. In the model, the complexities usually present as amplitude overshoots relative to the steady-state DPOAE envelope.

##### a. Temporal analysis.

Figure 6 shows envelopes of a normalized DPOAE and normalized BM displacement obtained by filtering the BM displacement with a zero-phase bandpass FIR filter centered at $f1$ or $f2$ (Sec. II C 2).^{5} The BM displacements were obtained numerically in the model section at which the nonlinear force was maximal [Figs. 4(B) and 4(E)]—the presumed DPOAE generation region; the same BM sections were used to calculate the single oscillator DPOAEs in Fig. 5(D). All envelopes are normalized to their steady-state parts in order to visualize the DPOAE and BM envelopes on the same ordinate. Panels in each column depict results for increasing level of one of the stimulus tones: from 30 to 70 dB SPL with a step of 10 dB SPL. The bold text above each column indicates parameters of the stimulus tones and which of the tones was the elicitor. The pronounced complexities (overshoots) during the onset and offset of the DPOAE envelopes are associated with suppression of the DPOAE response, as seen, for example, in Fig. 6 for $L1=50$ dB SPL and $L2=70$ dB SPL with $f2$ as the elicitor. The tone with the larger BM response at the $f2$ place suppresses the BM response to the other tone. This effect is also visible in the steady-state BM traveling waves depicted in Figs. 4(A) and 4(D). Notice that the most pronounced complexities occur at levels which are relatively far from the optimal level for DPOAE amplitudes given in Kummer *et al.* (1998) or Zelle *et al.* (2015).

##### b. Spatialtemporal analysis.

Figure 7 analyzes the spatial properties of the complexities in the onset (A)–(D) and offset (E)–(H) of the DPOAE signal for $L1=50$ dB SPL, $L2=70$ dB SPL, $f2=2.4$ kHz, $f2/f1=1.2$, and $f2$ as the elicitor. We chose these stimulus levels and $f2$ as elicitor because these numerically simulated DPOAEs have asymmetrical onset and offset overshoots with very large amplitudes relative to the steady-state response (Fig. 6, left-most column, bottom panel). Since the phase ensembles of the stimulus affect the onset and offset responses of the nonlinear force and the DPOAE, we arbitrarily chose the phase of the first of the four phase ensembles to be zero degrees; that is, $\phi 1=\phi 2=0\xb0$, where the subscripts refer to the first and second stimulus tones. The eliciting $f2$ tone was turned on 30 ms after the beginning of the $f1$ tone, has duration of 80 ms, and onset/offset times of 10 ms. Relative to the beginning of the $f1$ tone, the DPOAE signal was sampled in the onset region of the eliciting $f2$ tone at 31, 32, 33.5, 38, and 50 ms and then in the offset region at 90, 106, 109, 111, and 113 ms. The sampling instants at 50 and 90 ms are within the steady-state region of the eliciting $f2$ tone.

Figures 7(A) and 7(E) depict the amplitude and phase of the displacement traveling waves for the $f1$ (solid lines), $f2$ (dashed lines), and $fDP$ (dash-dotted lines) tones calculated in the frequency domain from the numerically simulated BM displacement in the time domain by filtering with a zero-phase FIR filter^{5} tuned to $f1$, $f2$, or $fDP$. For the onset region [Fig. 7(A)], the $f2$ displacement amplitude increases with sampling instant and is accompanied by decrease of the $f1$ displacement amplitude. This dependence of amplitudes on sampling instant is due to the larger amplitude $f2$ traveling wave suppressing the $f1$ traveling wave. The offset region [Fig. 7(E)] presents the opposite behavior (or conversely the same behaviour when measured backward from the end of the $f1$ envelope). With increasing sampling instant, the position of the maximum displacement amplitude of the $f2$ traveling wave shifts basally in the onset region and apically in the offset region; the shift is larger for the offset region, being 0.12 cm (0.24 oct) as opposed to 0.04 cm (0.08 oct). The different amounts of shift in the $f2$ traveling waves in these two sampling regions together with change in the $f1$ traveling wave due to two-tone suppression lead to differences in the nonlinear force and DPOAE amplitude in these regions. Notice that for the offset time instant giving the largest DPOAE amplitude (111 ms), the $f2$ traveling wave (violet color) peak has approximately the same amplitude as the tail of the $f1\u2009$ traveling wave (violet color) at $x\u22481.22$ cm. This is not the case for the onset time instant that gives the largest DPOAE amplitude (33.5 ms, yellow curve). Figures 7(B) and 7(F) depict the amplitude and phase of the nonlinear force, $U\u0302nDPNL(x)$. In the onset region [Fig. 7(B)], the amplitude of the force increases for sample instants up to about 32 ms and the amplitude maximum shifts apically (0.08 cm, 0.16 oct). In the offset region [Fig. 7(F)], the amplitude maximum of the force increases up to about 111 ms and shifts basally (0.05 cm, 0.1 oct).

Figures 7(C) and 7(G) depict the amplitude and phase of the cumulative integral of the product of the nonlinear force, $U\u0302nDPNL(x)$, and the base function, $\xi \u0302nDP(2)(x)$, as given by Eq. (16) with integration on the interval $[0,\u2009\u2009x]$. Normalized DPOAE amplitudes calculated by analytical simulation from the cumulative integral in Eq. (18) are shown in Figs. 7(D) and 7(H) using the colored crosses. The normalized envelope of the numerically simulated DPOAE signal, extracted as the fluid pressure difference across the BM in the segment nearest to the stapes, is depicted by the black solid line in Figs. 7(D) and 7(H). The pressure was filtered with the same filter that was used for the BM responses and forces.^{5} Ripple in the steady-state portion of the DPOAE envelope is the remnant of the stimuli which was not completely filtered. The analytically simulated DPOAE amplitudes and the numerically simulated DPOAE envelopes are normalized such that their values in the steady-state part of the envelopes are equal to unity. We compare these analytically simulated DPOAE amplitudes with the numerically simulated DPOAE envelope of the pressure difference at the BM base rather than of the sound pressure in the ear canal because Eq. (18) used to calculate the analytically simulated DPOAE does not contain time delays of the middle ear. DPOAE amplitudes from the analytical and numerical simulations agree—in both the onset and offset responses. This observation supports the extension of the analytical approximation of DPOAE amplitude given by Eq. (16) to the time-domain as given by Eq. (18).

Notice that the amplitude of $U\u0302nDPNL(x)$ is qualitatively very similar for DPOAE onset [Fig. 7(B)] and offset measured backward from the end of the $f2$ eliciting tone [Fig. 7(F)]. Large differences are seen in the phase of $U\u0302nDPNL(x)$ [Figs. 7(B) and 7(F)]. These differences in the forces $U\u0302nDPNL(x)$ for the DPOAE onset and offset lead to very different cumulative integrals for the DPOAE onset and offset [Figs. 7(C) and 7(G)]. For the sampling instant at the maximum of the DPOAE onset response [at 33.5 ms; yellow line in Fig. 7(C)], the amplitude of the cumulative integral starts to oscillate at $x=1.2$ cm. This position corresponds approximately to the place of the maximum amplitude of the nonlinear force [Fig. 7(B)]. The segments located apically from this point produce wavelets that mutually cancel. In contrast, at the maximum in the DPOAE offset response [at 111 ms; violet line in Fig. 7(G)], the oscillating part of the amplitude of the cumulative integral is reached near $x=1.35$ cm, which is more apically located than for the onset and is slightly more apically located than the place of the amplitude maximum of the nonlinear force [Fig. 7(F)]. These differences between the cumulative integrals for the DPOAE onset and offset lead to larger DPOAE amplitude in the offset than in the onset.

### B. Experimental results and numerical simulations

To test the modeling results presented in Sec. III A, we conducted experiments with two normal-hearing subjects. In this section, we compare the experimental results with simulations (Sec. III B 1) and then consider possible contributions to the experimental data from coherent-reflection sources not included in the model (Sec. III B 2).

#### 1. Nonlinear-distortion sources of DPOAEs

##### a. Dependence of DPOAE envelope on stimulus intensity and elicitor frequency.

Figure 8 depicts the time course of the envelope of DPOAE pressure measured in two subjects (s01 and s02) and also derived from numerical solution of the box model; $L1=50$ dB SPL and $L2$ is changed from 30 to 70 dB SPL in steps of 10 dB SPL. The data for $f2=5.5$ kHz were measured in s01 and for $f2=5.9$ kHz in s02. Only the simulations at 5.5 kHz are depicted because the simulations for both frequencies yield essentially the same results. The model gain at 5.5 kHz is 58 dB (Fig. 1). Figure 9 depicts the time course of the envelope of DPOAE pressure measured in the same subjects and simulated with the same model for the other level condition: $L2=50$ dB SPL and $L1$ changed. Otherwise, the stimulus parameters were the same as in Fig. 8.

Ripples in the experimental data, which are most pronounced in Fig. 8 where DPOAE envelopes are at least three times smaller than in Fig. 9, are most probably due to noise. Additional reflections may also cause ripples (Talmadge *et al.*, 1999). However, such reflections seem not to be the case for our experimental data since the amplitude of the coherent-reflection component in these two subjects was relatively small compared with the amplitude of the nonlinear-distortion component (see Sec. II D 1 and Sec. III B 2).

The complexities in the onset and offset of the DPOAE envelopes of the experimental data and simulations show similar trends. If $L1$ is held constant, the largest complexities (overshoots at the onset or offset of DPOAE pressure) occur if the $f2$ tone is the elicitor. On the other hand, if $L2$ is held constant, the complexities are more pronounced when $f1$ is the elicitor.

##### b. Dependence of DPOAE steady-state amplitude on stimulus intensity.

Figure 10 depicts steady-state DPOAE amplitudes derived^{6} from the experimental data (Figs. 8 and 9) and numerical simulations. Figure 10(A) shows the experimental data for either $L1$ or $L2$ constant (50 dB SPL). For both frequencies, the I/O function is approximately S-shaped for $L2$ constant, but non-monotonic for $L1$ constant. The most salient difference for the two frequencies was the larger “saturating” levels (approximately 30%) for $f2=5.9$ kHz, which is probably subject specific. For comparison between the experimental and simulated I/O functions, Fig. 10(B) shows DPOAE steady-state amplitudes normalized to the corresponding amplitudes at $L1$ or $L2$ of 50 dB SPL. There is good correspondence between the shapes of the I/O functions for experiment and simulation, with the S-shape being captured for $L2$ constant [Fig. 10(B), right] and non-monotonicity for $L1$ constant [Fig. 10(B), left]. In the case of non-monotonicity, notice that the “band”-shaped I/O functions for experiment and model practically superimpose for $f2=5.5$ kHz and $L1$ constant [Fig. 10(B), left].

#### 2. Contribution from the coherent-reflection source of DPOAEs

The complexities (overshoots) in the onset and offset of the numerically simulated DPOAE signal envelope shown in Figs. 6, 8, and 9 derive from a model producing the nonlinear-distortion component by the primary DPOAE source. A coherent-reflection component generated by the secondary DPOAE source is absent in the simulation because the model does not possess inhomogeneities (“roughness” in the impedance or gain functions). However, in experimental data, similar overshoots have been proposed to originate from destructive interference between the nonlinear-distortion and coherent-reflection components (Talmadge *et al.*, 1999; Vetešník *et al.*, 2009). Both destructive and constructive interference may, therefore, have influenced the onset and offset responses. Therefore, we introduced roughness into the gain function of the model and numerically calculated DPOAEs with nonlinear-distortion and coherent-reflection components (Sec. II C 3). In addition, we sought the presence of the coherent-reflection source in the experimental data by means of the onset-decomposition method and also by the PBF method (Zelle *et al.*, 2013, 2017b).

##### a. Numerical simulations.

The simulations presented in Figs. 6, 7, 8, and 9, derived from the cochlear model *without* roughness, have suggested that the onset and offset complexities are produced by suppression of the DPOAE response caused by the tone of larger stimulus amplitude suppressing the BM traveling wave for the tone of smaller stimulus amplitude. Therefore, here, we concentrate on the steady-state responses of the DPOAE components. Specifically, we examine the dependence of the relative amplitudes and phases of the steady-state nonlinear-distortion and coherent-reflection components on stimulus levels. If the amplitude and phase of the coherent-reflection component relative to the amplitude and phase of the nonlinear-distortion component are approximately independent of stimulus level, the coherent-reflection source should not affect the shape of the steady-state DPOAE I/O function. Figures 11(A) and 11(C) show the relative amplitudes and phases for either $L1$ constant at 50 dB SPL and $L2$ varied between 30 and 70 dB SPL (circles connected with solid lines), or $L2$ constant at 50 dB SPL and $L1$ varied between 30 and 70 dB SPL (crosses connected with dashed lines). The simulations are for $f2=2.4$ kHz [Fig. 11(A)] and $f2=5.5$ kHz [Fig. 11(C)]. For $L1=50$ dB SPL, the amplitude ratio changes by about a factor of 2 and the phase difference increases (slightly) from about −0.25 to 0 cycles for $f2=2.4$ kHz, and is almost constant for $f2=5.5$ kHz. The situation for constant $L2=50$ dB SPL and varying $L1$ is more complicated. The amplitude ratio decreases by about a factor of 60 for $L1$ increasing from 30 to 70 dB SPL. In addition, the phase difference changes by more than one half of a cycle—the amount needed to change between destructive and constructive interference. Figures 11(B) and 11(D) then show the amplitude and phase of the simulated DPOAE signal composed of the nonlinear-distortion and coherent-reflection components (black lines). In addition, the same data are shown for the nonlinear-distortion component only (gray lines). As can be expected, a larger effect of the coherent reflection component on the DPOAE amplitude is visible for $f2=5.5$ kHz where the coherent-reflection component has comparable amplitude to the nonlinear-distortion component (about two times larger coherent-reflection component than nonlinear-distortion component for $L1=L2=50$ dB SPL).

##### b. Experimental data.

Based on these simulations, the experimental data (Figs. 8, 9, and 10) were measured for frequencies at which the amplitude ratio of the coherent-reflection component to the nonlinear-distortion component was expected to be less than about 0.25 (at moderate stimulus levels). Such a ratio limit means that amplitude overshoots larger than 25% of the DPOAE steady-state amplitude are unlikely to be due to the secondary source. This expectation is borne out in the following analysis of the experimental data using PBFs (Sec. II D 2) to estimate the steady-state amplitudes and phases of the DPOAE components.

Table III shows the amplitudes and phases of the estimated nonlinear-distortion (amplitude $PP$ and phase $\theta P$) and coherent-reflection (amplitude $PS$ and phase $\theta S$) components derived from the experimental data. The phase difference $\theta S\u2212\theta P$ can be used to determine whether the coherent-reflection component adds destructively (180° ± 45°) or constructively (0° ± 45°) to the nonlinear-distortion component. The data clearly indicate that for $f2=5.5$ kHz and $f2=5.9$ kHz, $PS/PP$ is (slightly) larger than 0.25 only for $L2=30$ dB SPL (one-sided z-test). Importantly, the largest overshoots (up to 100% of the steady-state amplitude) were observed at intensities greater than 40 dB SPL (Figs. 8 and 9), where $PS/PP$ was much less than 0.25 (Table III). As expected, there was no evidence of complexities in the responses to these short-pulse stimuli (data not illustrated).

$f2$ (Hz) . | 5500 . | 5900 . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

$f1$ (Hz) . | 4583 . | 4917 . | ||||||||

$L2$ (dB SPL) | 30 | 40 | 50 | 60 | 70 | 30 | 40 | 50 | 60 | 70 |

$L1$ (dB SPL) | 62 | 66 | 69 | 73 | 77 | 62 | 66 | 69 | 73 | 77 |

$r2$ | 0.98 | 0.98 | 0.99 | 0.99 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |

$PP$(μPa) | 45 ± 0.1 | 70 ± 0.2 | 130 ± 0.2 | 202 ± 0.4 | 295 ± 0.4 | 98 ± 0.2 | 168 ± 0.2 | 275 ± 0.3 | 467 ± 0.5 | 656 ± 0.7 |

$PS$(μPa) | 12 ± 0.1 | 9 ± 0.2 | 19 ± 0.3 | 26 ± 0.4 | 52 ± 0.4 | 25 ± 0.2 | 42 ± 0.2 | 62 ± 0.3 | 56 ± 0.5 | 66 ± 0.7 |

$PS/PP$ | 0.27 ± 0.00^{a} | 0.13 ± 0.00 | 0.15 ± 0.00 | 0.13 ± 0.00 | 0.18 ± 0.00 | 0.26 ± 0.00^{a} | 0.25 ± 0.00 | 0.23 ± 0.00 | 0.12 ± 0.00 | 0.10 ± 0.00 |

$\theta P$ (deg.) | 111 ± 0.2 | 117 ± 0.2 | 122 ± 0.1 | 122 ± 0.1 | 122 ± 0.1 | 298 ± 0.1 | 287 ± 0.1 | 282 ± 0.1 | 278 ± 0.1 | 270 ± 0.1 |

$\theta S$(deg.) | 351 ± 0.6 | 244 ± 1.2 | 360 ± 1.4 | 0 ± 1.2 | 360 ± 0.6 | 220 ± 0.5 | 180 ± 0.3 | 155 ± 0.3 | 148 ± 0.5 | 69 ± 0.6 |

$\theta S\u2212\theta P$ (deg.) | 240 ± 0.7 | 127 ± 1.8 | 238 ± 0.9 | −122 ± 1.0 | 238 ± 0.6 | −78 ± 0.7 | −107 ± 0.7 | −127 ± 0.4 | −130 ± 0.7 | −201 ± 0.7 |

$f2$ (Hz) . | 5500 . | 5900 . | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

$f1$ (Hz) . | 4583 . | 4917 . | ||||||||

$L2$ (dB SPL) | 30 | 40 | 50 | 60 | 70 | 30 | 40 | 50 | 60 | 70 |

$L1$ (dB SPL) | 62 | 66 | 69 | 73 | 77 | 62 | 66 | 69 | 73 | 77 |

$r2$ | 0.98 | 0.98 | 0.99 | 0.99 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |

$PP$(μPa) | 45 ± 0.1 | 70 ± 0.2 | 130 ± 0.2 | 202 ± 0.4 | 295 ± 0.4 | 98 ± 0.2 | 168 ± 0.2 | 275 ± 0.3 | 467 ± 0.5 | 656 ± 0.7 |

$PS$(μPa) | 12 ± 0.1 | 9 ± 0.2 | 19 ± 0.3 | 26 ± 0.4 | 52 ± 0.4 | 25 ± 0.2 | 42 ± 0.2 | 62 ± 0.3 | 56 ± 0.5 | 66 ± 0.7 |

$PS/PP$ | 0.27 ± 0.00^{a} | 0.13 ± 0.00 | 0.15 ± 0.00 | 0.13 ± 0.00 | 0.18 ± 0.00 | 0.26 ± 0.00^{a} | 0.25 ± 0.00 | 0.23 ± 0.00 | 0.12 ± 0.00 | 0.10 ± 0.00 |

$\theta P$ (deg.) | 111 ± 0.2 | 117 ± 0.2 | 122 ± 0.1 | 122 ± 0.1 | 122 ± 0.1 | 298 ± 0.1 | 287 ± 0.1 | 282 ± 0.1 | 278 ± 0.1 | 270 ± 0.1 |

$\theta S$(deg.) | 351 ± 0.6 | 244 ± 1.2 | 360 ± 1.4 | 0 ± 1.2 | 360 ± 0.6 | 220 ± 0.5 | 180 ± 0.3 | 155 ± 0.3 | 148 ± 0.5 | 69 ± 0.6 |

$\theta S\u2212\theta P$ (deg.) | 240 ± 0.7 | 127 ± 1.8 | 238 ± 0.9 | −122 ± 1.0 | 238 ± 0.6 | −78 ± 0.7 | −107 ± 0.7 | −127 ± 0.4 | −130 ± 0.7 | −201 ± 0.7 |

^{a}

Data for which the ratio $PS/PP$ is significantly larger (95% confidence level) than 0.25, as estimated by the one-sided z-test; there are two such examples and both are for $L2=30$ dB SPL.

Taken together, we conclude that the complexities in the measured DPOAEs were not caused by the coherent-reflection source for intensities above 40 dB SPL.

## IV. DISCUSSION

This study focused on the time-dependent suppression of DPOAE signals which may occur at certain stimulus levels. The analysis was performed with a two-dimensional nonlinear cochlear model. Numerical simulations were supplemented with approximate analytical solutions of the model equations. The simulations were compared with experimental data measured in two normal-hearing subjects. Experiments and simulations were conducted for one frequency ratio $f2/f1$ of 1.2. Due to the near-scaling invariance of the box model, we focused the theoretical analysis on only a single value of $f2$ (2.4 kHz). Focusing on the steady-state amplitude of the nonlinear-distortion component of the DPOAE, the model-based analysis shows that although the spatial extent of the primary DPOAE sources broadens with increasing stimulus level, the DPOAE (for $f2/f1\u2009$ = 1.2) can be well-approximated by a point source located at the maximum of the nonlinear force. The spatial distribution of the nonlinear force, which acts as the primary source of DPOAEs, is responsible for the complex time courses (amplitude overshoots) of the onset and offset responses of the DPOAE signals. According to the simulations and the experimental data, these complexities are due to two-tone suppression of the DPOAE responses, which in turn is associated with suppression of the tone of smaller stimulus intensity by the tone of larger stimulus intensity.

### A. Nonlinearity and the spatial extent of the primary-source DPOAE generators

Nonlinearity was built into the model by means of nonlinearity in the OHC mechanoelectrical transducers, and was defined by a second-order Boltzmann function [Fig. 5(A)], referred to here as a sigmoidal function. The same function was used in simulations by Lukashkin and Russell (1998, 1999, 2001), Nobili and Mammano (1996), Nobili *et al*. (2003), and Vetešník and Gummer (2012). Keeping the amplitude of one of the tones constant and changing the amplitude of the other tone, the sigmoidal nonlinearity (in isolation from the cochlea) generates DPs at $2f1\u2212f2$ with non-monotonic I/O functions [Fig. 5(C)]—first increasing to a peak and then decreasing. The position of the peak and its amplitude depend on which stimulus amplitude is held constant. Similar non-monotonic dependencies have been reported by Lukashkin and Russell (1998, 1999, 2001) and Duifhuis (1989) for sigmoidal and power-law nonlinearities, respectively. The shapes of the I/O functions for the isolated sigmoidal nonlinearity are in qualitative agreement with experimental (Fig. 10) and cochlear model [Fig. 5(D)] data presented here, and with the literature for experimental data from humans (Kummer *et al.*, 2000; Johnson *et al.*, 2006; Zelle *et al.*, 2015) and rodents (Brown, 1987; Whitehead *et al.*, 1992).

Clearly, the intensity response properties of the isolated nonlinearity [Figs. 5(A)–5(C)] provide only a very rough approximation to intensity responses in the cochlea because no account is taken of the filtering properties of the BM and the possibility that the nonlinear-distortion component might be generated by a set of spatially distributed sources (Martin *et al.*, 2013). Therefore, using the box model to simulate DPOAEs, we examined the spatial extent of distributed sources and their influence on the temporal properties of the DPOAE signal. The distributed sources may mutually cancel, as has already been demonstrated by earlier cochlear modeling (Talmadge *et al.*, 1998; Young *et al.*, 2012). Building on the results of Vetešník and Gummer (2012), we investigated the spatial distribution of nonlinear OHC somatic force which is the generator of the DPOAE. We showed that DPOAE amplitudes derived by numerical simulation of the cochlear box model can be accurately approximated by the analytical simulation given by Eq. (16) (Figs. 3 and 7). In addition, despite the spatial distribution of primary DPOAE sources over a long length of the BM (up to 0.8 cm, 1.5 oct), the steady-state DPOAE signals can be well-approximated by a DP generated from a single BM segment (single oscillator) located at the place of maximum nonlinear force [Fig. 5(D)]. For the stimuli used in this study, this spatial localization to effectively a single DPOAE source is due to mutual cancelation of the wavelets along a large proportion of the source generation region. This cancelation effect is particularly evident for $L2$ constant (50 dB SPL), where the amplitude maximum of the nonlinear force shifts basally as $L1$ increases [Fig. 4(E)] and the cumulative integral [Fig. 4(F)] indicates that mainly the sources distributed in a narrow region basal to the amplitude maximum of the nonlinear force contribute to the measured DPOAE signal. Most of the other wavelets generated in the region^{1} between $x1$ and $x2$, where the nonlinear force $U\u0302nDPNL(x)$ makes a significant contribution to the cumulative integral, mutually cancel. Using the spatial extent for achieving 2.5% to 97.5% of the amplitude of the cumulative integral in its constant apical region as a metric for the effective generation region of the measured DPOAE signal, we find a basal extent of 0.27 cm (0.53 oct) to 0.32 cm (0.62 oct) from the place of maximum force for $L1$ increasing from 30 dB SPL to 70 dB SPL, respectively; that is, the basal extent is approximately 0.3 cm (0.6 oct) independent of stimulus intensity. Therefore, despite the large spatial broadening of the nonlinear force with increasing stimulus level, the BM region contributing to the DPOAE signal in the model is relatively narrow and level independent.

### B. Effect of DPOAE spatial extent on assessing cochlear-amplifier status

The use of DPOAEs as a diagnostic tool is motivated by their providing an objective estimate of the functional state of the cochlear amplifier. As suggested originally by Boege and Janssen (2002), perhaps the potentially most reliable method to extract this information is to construct DPOAE I/O functions semi-logarithmically, plotting the DPOAE amplitude as a function of the level $L2$ ($f2>f1)$ and using optimal relative values of $L1$ and $L2$. Using linear regression, such an I/O function allows estimation of the so-called DPOAE threshold which has been shown to correlate with auditory threshold (Boege and Janssen, 2002) and *accurately* correlate provided the coherent-reflection component is removed from the recorded DPOAE signal (Zelle *et al.*, 2017b).

Larger stimulus levels lead to widening of the BM region generating the nonlinear-distortion component of DPOAEs. This was shown in the present study (Fig. 4) and, for example, in Young *et al.* (2012). Despite the broadening of the generation region, for the stimulus levels presented in this study ($L1=50$ dB SPL and $L2=30$ to $70$ dB SPL, and vice versa), the resulting DPOAE steady-state amplitude can still be well-approximated by generation with a point-like source at the place of the maximum nonlinear force. Therefore, the real DPOAE source can still be assumed to be spatially narrow, which is important for assessing the function of the cochlear amplifier at a given place along the BM. However, the place of maximum steady-state force shifts with increasing stimulus level—apically for constant $L1$ [Fig. 4(B)] and basally for constant $L2$ [Fig. 4(E)]—which could result in erroneous conclusions in the case of a localized hearing loss in a narrow BM region.

The model provided evidence that for most of the stimulus-level combinations used in this study, DPOAE sources apical to the place of the maximum amplitude of the nonlinear force mutually cancel. For example, for $L1=70$ and $L2=50$ dB SPL [Fig. 4(F)], the amplitude (and phase) of the cumulative integral began to oscillate beginning at about $x=1.1$ cm, which coincides with the place of maximum amplitude of the nonlinear force. Such oscillations are a hallmark of destructive interference. For the stimulus levels used in this theoretical and experimental study, mutual cancelation of adjacent sources leads to almost the same dependence of DPOAE amplitude on stimulus levels as would be generated by a point source located at the amplitude maximum of the nonlinear force [Fig. 5(D)]. Contrary to this point-source interpretation, Martin *et al.* (2013) reported experimental evidence of sharp notches for DPOAE amplitude plotted as a function of $f2$ for equal level (75 dB SPL) stimuli ($f2/f1=1.25$). These notches were explained by proposing basal DPOAE sources that cancel the sources near the $f2$ tonotopic place. Non-monotonic and saturating DPOAE I/O functions measured for optimal stimulus levels; i.e., for $L1$ at given $L2$ which leads to maximal DPOAE amplitude, were also observed in humans (Zelle *et al.*, 2017b, their Fig. 5; Johannesen and Lopez-Poveda, 2010, their Fig. 1). In both studies, the authors took precautions against notches being formed by interference between the nonlinear-distortion and coherent-reflection components. With the box model, we also observed sharp notches in DPOAE amplitude for $L1$ and $L2$ larger than about 60 dB SPL (Vencovský and Vetešník, 2018a). We plan to study these notches in the future.

### C. Onset and offset complexities

The measured DPOAE signals contained complexities (overshoots) in their onset and offset envelopes, dependent on stimulus intensities and whether the lower or higher stimulus frequency served as the DPOAE elicitor (Figs. 8 and 9). The simulations indicated that the complexities were due to suppression of one stimulus tone by the other (Figs. 6 and 7). The complexities were more pronounced when the tone with the larger level was the elicitor. There were distinct differences between the onset and offset complexities in the experimental and simulated data. To elucidate the causes of the differences, in the model we analyzed the nonlinear force generating the DPOAE (Fig. 7). For the purpose of illustration, we chose the stimulus intensities and the elicitor to achieve the largest asymmetry between the onset and offset overshoots. For this purpose, $\u2009f2$ was the elicitor with $L1=50$ dB SPL and $L2=70$ dB SPL ($f2=2.4$ kHz with $f2/f1=1.2$); see Fig. 6 (left-most column, bottom row). The principle cause of the asymmetry between the onset and offset responses is differences between the positions of the maxima of the $f2$ traveling wave within the onset and offset response regions to the $f2$ elicitor [Figs. 7(A) and 7(E)]. For increasing time within the onset of the $f2$ eliciting tone, the peak of the envelope of the BM traveling wave shifts basally [Fig. 7(A)]; the shift is approximately 0.4 mm ($\u2248$ 0.08 oct). For increasing time within the offset of the $f2$ eliciting tone, the envelope shifts apically [Fig. 7(E)] and the shift is larger than for the onset responses, being approximately 1.2 mm ($\u2248$ 0.24 oct). Moreover, differences between the onset and offset regions are clearly demonstrated with the cumulative integrals [Figs. 7(C) and 7(G)]. These simulations demonstrate that the onset kinetics are different to the offset kinetics and, therefore, explain the asymmetry in the transient responses of the DPOAE signals.

Based on their experiments, Martin *et al.* (2013) suggested that onset and offset complexities were due to basally located primary sources having opposite phase to the primary source near the $f2$ tonotopic place. Accordingly, since the basal components build up faster than the $f2$-place component, they argued that the DPOAE onset would have larger amplitude than the steady-state response. In the simulations, we also found evidence of such basal components with opposite phase; namely, in phase jumps in the cumulative integrals in Figs. 4(C) and 7(C) at $L2=60$ dB SPL and $70$ dB SPL. For the steady-state time instant [at 50 ms, Fig. 7(B)], basal components of the nonlinear force are visible as a broad local maximum at $x=1.14$ cm. However, for the onset time instants [up to 33.5 ms, Fig. 7(B)], the amplitude of the nonlinear force presents a global maximum near the $f2$ tonotopic place ($x=1.25$ cm), and there is no evidence of a local force maximum basal to this place. Therefore, for the stimulus intensities of our study, in the onset, there is no evidence for a preponderance of basal force generators required for the explanation of the onset transient proposed by Martin *et al.* (2013). Nevertheless, the distribution of the nonlinear force [Figs. 7(B) and 7(F)]—in amplitude and phase—determines the shapes of the onset [Fig. 7(D)] and offset [Fig. 7(H)] complexities and the differences between them. A difference between this study and the Martin *et al.* (2013) study is in the stimulus levels. With both stimulus levels at 75 dB SPL, they observed complexities to be produced by a sudden decrease of the steady-state response. Although this high-level condition is not studied here, the underlying principle that the temporal properties of suppression are determined by the spatial distribution of the nonlinear undamping forces is applicable.

In principle, onset and offset complexities may also arise due to the coherent-reflection component of DPOAEs (Talmadge *et al.*, 1999; Tubis *et al.*, 2000; Vetešník *et al.*, 2009). The simulations in the present study derive from a “smooth” cochlear model which cannot generate coherent-reflection components. For the experimental data, we estimated the amplitude and phase of the coherent-reflection component and presented only data for which the ratio of the coherent-reflection amplitude to the nonlinear-distortion amplitude was no more than 0.27. Table III shows that relative amplitudes $\u2265$ 0.25 occur at the lower levels of $L2=30$ and $40$ dB SPL, whereas the greatest overshoots are found for higher levels of $L2\u226550$ dB SPL (Figs. 8 and 9). In other words, the prominent overshoots appear at level combinations where the coherent-reflection component was found to have a relative amplitude much less than 0.25, whereas the overshoots are often 25% and more above the steady state. The greatest overshoot was found for $L2=60$ dB SPL and $L1=50$ dB SPL with $f2$ (5.9 kHz) as elicitor (Fig. 8), where there was almost a 100% overshoot in both the onset and the offset. These observations mean that it is most unlikely that the complexities observed in the experimental data were caused by coherent reflection.

This assertion is further supported by the observed differences between the DPOAE signals for $f1$ and $f2$ elicitors, where we found that the largest overshoots occur if the elicitor level is larger than that of the other tone (Figs. 8 and 9). We would have expected the coherent-reflection component—specifically, in the steady-state part of the DPOAE envelope—to have the same influence for both elicitors. Theoretical considerations that include coherent-reflection suggest that only an onset overshoot would be present when the coherent-reflection component is smaller than the nonlinear-distortion component [see Figs. 2(a), 2(c) in Tubis *et al*., 2000]. In contrast, we found both onset and offset overshoots for relatively small coherent-reflection amplitudes. Finally, it should be mentioned that, according to our analysis, the more prominent onset and offset responses induced by two-tone suppression appear to arise for non-optimum stimulus levels; that is, for relative stimulus intensities that do not produce maximum steady-state DPOAE amplitudes for a given level of one of the stimulus tones.

## V. CONCLUSIONS

This paper studied possible sources of DPOAE amplitude suppression for specific generation situations in which either the level of the $f1$ stimulus tone was fixed at 50 dB SPL and the level of the $f2$ stimulus tone was changed from 30 to 70 dB SPL, or vice versa. In both situations, the dependence of the DPOAE amplitude on the level of one of the stimulus tones was non-monotonic, first increasing to attain a maximum and then decreasing. It is shown by means of a two-dimensional nonlinear box model that the non-monotonic behavior can be approximately represented by a single point-like source localized at the BM place where the nonlinear electromechanical force is maximal. The BM position of this source changes with stimulus levels; for constant $L1$, the source shifts apically with increasing $L2$, and for constant $L2$ it shifts basally with increasing $L1$. This shift is due to the dependence of the peak of the traveling-wave envelope on stimulus level and also on two-tone suppression. Two-tone suppression is also responsible for the complexities in the onset and offset of the DPOAE signal observed in the simulations and experiments. The numerical solution of the box model allowed examination of the temporal properties of the wavelets which form the nonlinear-distortion component of the DPOAE. Although the wavelets were generated over a BM length spanning as much as 0.8 cm (1.5 oct), most of the wavelets mutually canceled, and those wavelets which contributed to 95% of the resulting amplitude of the cumulative integral spanned a relatively narrow length of the BM (0.3 cm, 0.6 oct) which was independent of stimulus intensity. This narrow region of the cumulative integral explains why the simulated DPOAE steady-state amplitudes could be approximated by a point-like primary source.

## ACKNOWLEDGMENTS

We would like to thank Dr. Petr Honzík, Dr. Christopher A. Shera as the handling Associate Editor, and three anonymous reviewers for their comments and constructive criticism of earlier versions of this paper. Supported by the Grant Agency of the Czech Technical University in Prague, Grant No. SGS17/190/OHK3/3 T/13, by the German Research Council, Grant No. DFG Da 487/3-1,2 and Grant No. Gu 194/12-1, and by European Regional Development Fund-Project “Center for Advanced Applied Science” (Grant No. CZ.02.1.01/0.0/0.0/16_019/0000778). V.V. is supported by International Mobility of Researchers in CTU (CZ.02.2.69/0.0/0.0/16_027/0008465). Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.

### APPENDIX: MIDDLE-EAR MODEL AND PARAMETERS

The middle-ear model was adapted from the model given in Talmadge *et al.* (1998), their Eq. (14). The stapes displacement, $\sigma (t)$, positive for inward displacement into the cochlea, obeys the equation

where $Pdr(t)$ is the ear-canal pressure in the case of a rigid eardrum, $mme$ is the total mass of the middle-ear ossicles, $hme$ is the damping, and $kme$ is the stiffness of the middle-ear ligaments and the tympanic cavities. The values of the mass, damping, and stiffness were set to achieve the stapes reflectance measured by Puria (2003); Fig. 12. Stapes reflectance is given by

where $Z\u0302c$ is the cochlear input impedance, $Z\u0302c*$ is its complex conjugate, and $M3\u0302$ is the reverse middle-ear impedance. The cochlear input impedance is defined as

and the reverse middle-ear impedance as

where $P\u0302v$ is the vestibule pressure, and $V\u0302st$ is the stapes velocity. The right arrow ($\u2192$) indicates the situation when the system is driven by the pressure in the auditory canal, while the left arrow $(\u2190$) indicates the reverse drive; i.e., drive from a pressure source in the cochlea. Figure 12 shows the amplitude of the stapes reflectance. The solid line depicts the simulated reflectance amplitude obtained by selecting the model parameters to approximate the experimental data (dashed line) from Puria (2003). Parameters are defined and their values given in Table I.

The stapes displacement $\sigma (t)$ affects the pressure in the ear canal $Pe(t)$, which, under the assumption of adiabatic compression/expansion, is related to the ear-canal pressure $Pdr(t)$ for a rigid eardrum as

where $Pea=\gamma airPe0$, $Pe0$ is the ambient pressure of the air in the ear canal, and $Ve$ is the volume of the ear canal. The tip of the OAE probe closes the ear canal; hence, the acoustical volume of the closed ear canal determines the impedance for reverse propagating signals (Talmadge *et al.*, 1998).

^{1}

The interval $[x1,\u2009\u2009x2]$ over which that part of the nonlinear force $U\u0302nDPNL(x)$ due to the overlap between the $f1$ and $f2$ traveling waves is not negligible is arbitrarily defined as starting at the point $x1$ where the cumulative integral in Eq. (16) is larger than 1% of its value at the apical-most place. It is more difficult to find or define $x2$ because the nonlinear force, $U\u0302nDPNL(x)$, also contains a contribution from the place where the DP reaches its tonotopic place. However, this region does not contribute to the overall DPOAE signal because the phases of $U\u0302nDPNL(x)$ and $\xi \u0302nDP(2)(x)$ in this region quickly rotates, which causes individual wavelets in that region to mutually cancel. Such destructive interference is visible as several ripples in the right-most portions of the individual cumulative integrals [Figs. 4(C) and 4(F)]; e.g., apical to the orange arrow in Fig. 4(C) at $L2=40\u2009$ dB SPL. Therefore, we defined $x2$ as the place at which the amplitude of the (apical flank of the) $f1$ traveling wave decreased below the amplitude for 20 dB SPL pure-tone stimulation at the $f1$ tonotopic place. At this level and below, the traveling wave is linear, as evidenced by the BM I/O function in Fig. 1 and the iso-intensity functions in Fig. 2.

^{2}

The model would be scale invariant if the equation defining the transverse dispalcement of the BM oscillators [Eq. (1)] were without the term $U(x,t)$, which defines the undamping force produced by the outer hair cells. We set the model parameters defining the magnitude of the undamping force [Eq. (3)] in order to achieve approximately the same gain at frequencies between about 1.5 and 6 kHz.

^{3}

The gain was estimated as the level difference between the I/O function at low levels and the extrapolated linear region of the high-intensity portion of the I/O function.

^{4}

Characteristic frequencies as a function of the BM place $x$ were estimated by a 0 dB SPL pure tone in order to simulate traveling waves in the linear part of the model I/O functions (see Fig. 1).

^{5}

The bandwidth of the filter was set according to Eq. (20). The time-domain signal was resampled into the sampling frequency used in the experiments; namely, 102.4 kHz.

^{6}

For the experimental data, the steady-state amplitudes were extracted by calculating the mean value of the DPOAE envelope between 15 and 22 ms after the beginning of the 30-ms elicitor, and averaging over the responses for the $f1$ and $f2$ elicitors. For these (numerical) simulations, the elicitor had a duration of 80 ms and the longer tone had a duration of 140 ms; the amplitude at 40 ms after the beginning of the elicitor was taken as the steady-state amplitude. For the purpose of estimating steady-state amplitudes in the model, the longer duration stimuli were used to ensure that there were no transients in the samples used for averaging.