Hearing aids use dynamic range compression (DRC), a form of automatic gain control, to make quiet sounds louder and loud sounds quieter. Compression can improve listening comfort, but it can also cause unwanted distortion in noisy environments. It has been widely reported that DRC performs poorly in noise, but there has been little mathematical analysis of these noise-induced distortion effects. This work introduces a mathematical model to study the behavior of DRC in noise. By making simplifying assumptions about the signal envelopes, we define an effective compression function that models the compression applied to one signal in the presence of another. Using the properties of concave functions, we prove results about DRC that have been previously observed experimentally: that the effective compression applied to each sound in a mixture is weaker than it would have been for the signal alone; that uncorrelated signal envelopes become negatively correlated when compressed as a mixture; and that compression can reduce the long-term signal-to-noise ratio in certain conditions. These theoretical results are supported by software experiments using recorded speech signals.

## I. INTRODUCTION

Hearing aids often perform poorly in noisy environments, where people with hearing loss need help most. One challenge for hearing aids in noise is a nonlinear processing technique known as dynamic range compression (DRC), which improves audibility and comfort by making quiet sounds louder and loud sounds quieter (Allen, 2003; Kates, 2005; Souza, 2002; Villchur, 1973). Compression is used in all modern hearing aids, but it can cause unwanted distortion when applied to multiple overlapping sounds. For example, a sudden noise can reduce the gain applied to speech sounds. This effect is well documented empirically but has been little studied mathematically. To better understand DRC in noisy environments, this work applies tools from signal processing theory to model the effects of DRC on sound mixtures.

The auditory systems of people with hearing loss often have reduced dynamic range: Quiet sounds need to be amplified in order to be audible, but loud sounds can cause discomfort. Hearing aids with DRC apply level-dependent amplification so that the output signal has a smaller dynamic range than the input signal. A typical DRC system is shown in Fig. 1. An envelope detector tracks the level of the input signal over time in one or more frequency bands while a compression function adjusts the amplification to keep the output level within a comfortable range. Both the envelope detector and the compression function are nonlinear processes, so when the input contains sounds from multiple sources, changes in one component signal can affect the processing applied to the others.

This interaction between signals can be difficult to measure, but hearing researchers have found three quantifiable effects. First, noise can reduce the effect of a compressor, especially at low signal-to-noise ratios (SNR) (Souza *et al.*, 2006). The DRC system applies gain based on the stronger signal and has little effect on the dynamic range of the weaker signal. This effect can be measured by comparing the overall dynamic ranges of the input and output signals (Braida *et al.*, 1982; Stone and Moore, 1992). Second, fluctuations in the input level of one component signal vary the output levels of other components. This interaction has been called across-source modulation (Stone and Moore, 2007) and can be measured using the correlation coefficient between output envelopes. Finally, at high SNR, compressors tend to amplify low-level noise more strongly than the higher-level signal of interest, which can reduce the long-term average SNR (Alexander and Masterson, 2015; Hagerman and Olofsson, 2004; Rhebergen *et al.*, 2009; Souza *et al.*, 2006).

The adverse effects of noise on DRC systems have been well documented empirically, but the problem has received little formal mathematical analysis. While experimental work is useful for studying the consequences of these effects, especially on human listeners, theoretical results can help to understand their causes. This work applies signal processing research methods to the DRC distortion problem: First, we make simplifying assumptions to develop a tractable mathematical model of a complex system. Next, we use that model to prove theorems that explain the behavior of the system. Finally, we validate those assumption-based theoretical results using experiments with a realistic system.

Compression systems are difficult to analyze because of the complex interactions between the envelope detector and compression function, both of which are nonlinear. Using the simplifying assumption that envelopes are additive in signal mixtures, we can separate the effects of the envelope detector from those of the compression function. To characterize the interaction between signals in a mixture, we introduce the effective compression function (ECF), which relates the input and output levels of one signal in the presence of another. The ECF is used to explain the three effects described above: that noise reduces the effect of compression (Sec. IV), that compression induces negative correlation between signal envelopes (Sec. V), and that compression can reduce long-term average SNR in certain conditions (Sec. VI).

Each section includes a theorem about the effect and simulation experiments that illustrate it. The theorems rely on the concavity, or downward curvature, of the compression function and, like many results in signal processing theory, take the form of inequalities. The experiments illustrate the predictions of each theorem and show how the results change when the assumptions are violated.

## II. DRC

Because most modern hearing aids are digital, we formulate the DRC system in discrete time. Let the sequence $\tilde{x}[t]$ be a sampled audio signal at the input of the DRC system, where $t$ is the sample index. Let $\tilde{y}[t]$ be the output of the system.

### A. Filterbank and envelope detector

Compression is often performed separately in several frequency bands. A filterbank splits the signal into $B$ channels corresponding to different bands, which may be linearly or nonlinearly spaced and may or may not overlap. Let $x[t,b]$ and $y[t,b]$ be the filterbank representations of $\tilde{x}[t]$ and $\tilde{y}[t]$, respectively, in channels $b=1,\ldots,B$.

The gain applied by DRC is calculated from the signal envelope, which tracks the signal level over time. Level is typically defined in terms of either magnitude ($|x|$) or power ($x^2$); this work uses power. Let the non-negative signal $v_x[t,b]$ be the envelope of the input signal $x[t,b]$ at time index $t$ and channel $b$. In the theoretical analysis presented here, the envelope is an abstract property of a signal, such as a statistical variance. In real DRC systems, the envelope is estimated from the observed signal, typically using a moving average.

Most DRC systems respond faster to increases in signal level (attack mode) than to decreases in signal level (release mode) in order to suppress sudden loud sounds. There are many ways of implementing an envelope detector (Giannoulis *et al.*, 2012). A representative detector is the nonlinear recursive filter (Kates, 2008),

$$
v_x[t,b] = \begin{cases} \beta_a\, v_x[t-1,b] + (1-\beta_a)\, x^2[t,b], & x^2[t,b] \ge v_x[t-1,b] \\ \beta_r\, v_x[t-1,b] + (1-\beta_r)\, x^2[t,b], & \text{otherwise,} \end{cases} \tag{1}
$$

for $b=1,\ldots,B$, where $\beta_a$ and $\beta_r$ are constants that determine the attack and release times.
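The attack/release behavior of such a detector can be sketched in a few lines of code. The following is an illustrative one-channel sketch in the common recursive form, not the paper's implementation; the smoothing constants `beta_a` and `beta_r` are arbitrary placeholders rather than the ANSI-derived values used in the experiments.

```python
import numpy as np

def envelope_detector(x, beta_a=0.2, beta_r=0.9):
    """One-channel attack/release power envelope detector (illustrative).

    Fast smoothing (beta_a) is used when the instantaneous power rises
    above the tracked envelope (attack mode), and slow smoothing (beta_r)
    when it falls below (release mode).
    """
    v = np.empty(len(x))
    env = 0.0
    for t, sample in enumerate(x):
        p = sample ** 2                        # instantaneous power
        beta = beta_a if p >= env else beta_r  # attack vs. release mode
        env = beta * env + (1.0 - beta) * p
        v[t] = env
    return v
```

For a unit-amplitude burst followed by silence, the envelope rises quickly toward the signal power and then decays slowly after the signal stops, reflecting the asymmetric time constants.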

Because envelope detection is a nonlinear process, it contributes to the distortion effects of DRC systems. The theorems in this work do not depend on the filterbank structure or the choice of attack and release time, but these parameters do affect the rate of fluctuation of the measured envelopes and therefore the distribution of envelope samples. Many nonlinear interaction effects are more severe for fast-acting and many-channel compression than for slow-acting and few-channel compression (Alexander and Masterson, 2015; Alexander and Rallapalli, 2017; Naylor and Johannesson, 2009; Plomp, 1988; Rallapalli and Alexander, 2019; Reinhart *et al.*, 2017), though these parameters do not necessarily impact speech intelligibility (Salorio-Corbetto *et al.*, 2020).

### B. Compression function

A compression function $C_b$ determines the instantaneous mapping between input level and target output level in each channel,

$$v_y[t,b] = C_b(v_x[t,b]),$$

where $v_y[t,b]$ is the target output level. The amplification applied in each channel is then

$$g[t,b] = \frac{C_b(v_x[t,b])}{v_x[t,b]},$$

so that the output is the product

$$y[t,b] = g[t,b]\, x[t,b].$$
Note that the target output level $v_y[t,b]$ is not necessarily equal to the measured envelope of $y[t,b]$ because the envelope is a moving average. Longer release times cause gains to lag behind short-term signal levels, especially for dynamic signals such as speech (Braida *et al.*, 1982; Stone and Moore, 1992).

Although compression functions are defined here in terms of input and output level (i.e., power), they are often visualized and described on a logarithmic scale, such as in decibels (dB). A typical “knee-shaped” compression function is shown in Fig. 2: It features a linear region in which the gain is constant, a compressive region in which the output level increases by less than the input level, and a limiting region that prevents the output from exceeding a maximum safe level.

The strength of compression can be characterized by the compression ratio (CR), which is the inverse of the slope of the compression function on a log-log scale, as shown in Fig. 2. For example, in a 3:1 compressor, the output increases by 1 dB for every 3 dB increase in the input. For a constant CR, the compression function is given by the power-law relationship

$$C_b(v) = G_0[b]\, v^{1/\mathrm{CR}},$$

where $G_0[b]$ is a constant power gain factor. Thus, for a 3:1 compressor, the output level is proportional to the cube root of the input level. In limiters, $C_b(v)$ is constant and so the CR is infinite.
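As a quick numerical check of the power-law relationship (a sketch with an arbitrary unity gain factor), a 3:1 compressor maps a 30 dB increase in input power to a 10 dB increase in output level:

```python
import numpy as np

def compress(v, cr=3.0, g0=1.0):
    """Power-law compression function C(v) = g0 * v**(1/cr), in the power domain."""
    return g0 * np.power(v, 1.0 / cr)

# Two input levels 30 dB apart (a factor of 1000 in power)...
v_lo, v_hi = 1.0, 1000.0
# ...come out 10 dB apart after 3:1 compression.
delta_out_db = 10.0 * np.log10(compress(v_hi) / compress(v_lo))
```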

While most compressors reported in the literature use some combination of linear, power-law, and limiting compression functions, many others are possible. To make our analysis as general as possible, we allow the compression function to be any mapping between non-negative numbers such that the output level grows no faster than the input level. More precisely, we require it to be a concave function.

**Definition 1.** A function $C_b(v)$ is a *compression function* if it is concave, non-negative, and nondecreasing for all $v > 0$.

In mathematics, a function $f(x)$ is said to be concave if for any $\lambda \in [0,1]$ and any $x_1$ and $x_2$,

$$f\big(\lambda x_1 + (1-\lambda)\, x_2\big) \ge \lambda f(x_1) + (1-\lambda)\, f(x_2). \tag{6}$$

Note that Definition 1 includes non-differentiable functions such as knee-shaped compression curves. It excludes dynamic range expanders, which some hearing aids apply at low signal levels to reduce noise. The proofs in this work will also involve convex functions that satisfy Eq. (6) with the inequality reversed. Convex and concave functions are widely used to prove inequalities in signal processing and information theory (Cover and Thomas, 2006).

To describe how much a compression function reduces the dynamic range of a signal, we could compute its CR. Because the CR can be infinite, however, it is more convenient to work with its inverse, the compression slope.

**Definition 2.** For all points $v$ at which a compression function $C_b(v)$ is differentiable, the *compression slope* $\mathrm{CS}_b(v)$ is the slope of $C_b(v)$ on a log-log scale,

$$\mathrm{CS}_b(v) = \frac{d \log C_b(v)}{d \log v} = \frac{v\, C_b'(v)}{C_b(v)}.$$

For example, if $C_b(v)=G_0[b]\,v^{\alpha}$, then $\mathrm{CS}_b(v)=\alpha$ for all $v$. The smaller the compression slope, the more the dynamic range of the signal is reduced.
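The compression slope can also be estimated numerically. The finite-difference sketch below (an illustrative helper, not from the paper) recovers $\alpha$ for a power-law compression function:

```python
import numpy as np

def compression_slope(C, v, rel=1e-6):
    """Numerical compression slope: d log C(v) / d log v at level v."""
    dv = rel * v
    return (np.log(C(v + dv)) - np.log(C(v))) / (np.log(v + dv) - np.log(v))

C = lambda v: 2.0 * v ** (1.0 / 3.0)   # power-law compressor, alpha = 1/3
slope = compression_slope(C, 5.0)
```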

### C. Experimental methods

To validate the predictions of the mathematical model, each section of this work includes experiments using speech recordings and a software DRC system. The theoretical results in this work rely on simplifying assumptions, but the simulation experiments are more realistic and therefore illustrate the limitations of the model. Wherever possible, the experiments use methods and performance metrics from prior work in the literature.

Although the compression function varies with each experiment, all simulations in this work use the same envelope detector. The input is first processed by a short-time Fourier transform with 8 ms windows and 50% overlap. A frequency-domain filterbank splits the signals into 6 Mel-spaced bands from 0 to 8 kHz, which are roughly linearly spaced at lower frequencies and exponentially spaced at higher frequencies. Within each band, the envelopes are computed using the nonlinear recursive filter (1) with an attack time of 10 ms and a release time of 50 ms as defined by ANSI S3.22–1996 (ANSI, 1996). All speech signals are 60-s clips derived from the Voice Cloning Toolkit (VCTK) dataset of quasi-anechoic read speech (Veaux *et al.*, 2017). The figures in this work use logarithmic scales for envelope level. These levels are given in dB relative to the mean wideband signal level. That is, each speech signal has a mean level of 0 dB across channels.

## III. MODELING COMPRESSION OF SOUND MIXTURES

Hearing aids are often used in noisy environments with several simultaneous sound sources. The interactions between multiple signals are difficult to analyze because DRC involves two nonlinear operations: envelope detection and level-dependent amplification. To create a tractable model for sound mixtures, we make a simplifying assumption about the signal envelopes that allows us to separate the effects of these two nonlinearities, as shown in Fig. 3. Under this model, the filterbank and envelope detector determine the relationship between input signals and envelope values; they act independently on each component signal. Meanwhile, the compression function determines the output levels from these envelopes; it acts independently at each time index and within each channel. In this work, we focus on the compression function.

### A. Envelope model

Suppose that the input to the system is $\tilde{x}[t]=\tilde{s}_1[t]+\tilde{s}_2[t]$, where $\tilde{s}_1[t]$ and $\tilde{s}_2[t]$ are two discrete-time signals. For example, $\tilde{s}_1$ and $\tilde{s}_2$ could be two speech signals as captured at the listening device microphone, including any reverberation effects. Because a filterbank is a linear system, the filterbank representation of the input is

$$x[t,b] = s_1[t,b] + s_2[t,b], \tag{9}$$

where $s_1[t,b]$ and $s_2[t,b]$ are the filterbank representations of $\tilde{s}_1[t]$ and $\tilde{s}_2[t]$, respectively.

Because envelope detection is a nonlinear process, the additivity property of Eq. (9) does not hold in general for the signal envelopes measured by practical envelope detectors. However, to simplify our analysis, the signal envelopes can be *modeled* as obeying additivity.

**Assumption 1.** *The envelopes* $v_{s_1}[t,b]$, $v_{s_2}[t,b]$, *and* $v_x[t,b]$ *of* $s_1[t,b]$, $s_2[t,b]$, *and* $x[t,b]$, *respectively, satisfy*

$$v_x[t,b] = v_{s_1}[t,b] + v_{s_2}[t,b].$$

This assumption is justified if we think of the envelopes as abstract properties of signals, such as parameters of a process that generates them, rather than as measurements. For example, suppose that $s_1[t,b]$ and $s_2[t,b]$ are sample functions of random processes that are uncorrelated with each other (Hajek, 2015). Then the variance of the mixture is given by $\mathrm{Var}(x[t,b])=\mathrm{Var}(s_1[t,b])+\mathrm{Var}(s_2[t,b])$. If $v_x[t,b]$ were any linear transformation of the sequence $\mathrm{Var}(x[t,b])$, then the envelopes would satisfy Assumption 1. Because the variance is an ensemble average, not a time average, the processes need not be stationary or ergodic.

Of course, real compression systems cannot observe the underlying variance of a random process; they derive envelopes from recorded samples. The accuracy of the additive envelope model depends on the signals: Two sinusoids would strongly violate the assumption (Ludvigsen, 1993), while two signals that are disjoint across time and channels would satisfy it exactly. To test the accuracy of the assumption for a realistic envelope detector, we applied the software envelope detector described in Sec. II C to a mixture of two speech signals and compared the envelope of the mixture, $v_x[t,b]$, to the sum of the envelopes of the component signals, $v_{s_1}[t,b]+v_{s_2}[t,b]$. Figure 4 shows a set of envelope samples drawn from different time frames and frequency channels plotted on a decibel scale. In this experiment, the assumption is accurate to within 1 dB for 93% of samples.
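The intuition behind Assumption 1 can be illustrated without the full DRC system: for two uncorrelated white-noise stand-ins (with arbitrary levels chosen here), the long-term power of the mixture matches the sum of the component powers almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
s1 = rng.normal(0.0, 1.0, 100_000)   # two uncorrelated "sources"
s2 = rng.normal(0.0, 0.5, 100_000)
x = s1 + s2

# Power of the mixture vs. sum of component powers (additive envelope model).
v_x = np.mean(x ** 2)
v_sum = np.mean(s1 ** 2) + np.mean(s2 ** 2)
error_db = 10.0 * np.log10(v_x / v_sum)
```

For correlated signals (e.g., two in-phase sinusoids), the cross term does not vanish and this error grows, which is why the assumption must be tested against realistic signals.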

### B. Output model

Care is also required in analyzing the components of the output of a nonlinear system. Let $\tilde{y}[t]=\tilde{r}_1[t]+\tilde{r}_2[t]$, where $\tilde{r}_1[t]$ is the component of the output corresponding to $\tilde{s}_1[t]$ and $\tilde{r}_2[t]$ is the component corresponding to $\tilde{s}_2[t]$. For systems with the additivity property, like linear filters, these components can be calculated by applying the same system to $\tilde{s}_1$ and $\tilde{s}_2$. For nonlinear systems like DRC, each component of the output depends on all components of the input. In general, nonlinear distortion artifacts cannot be clearly attributed to one input signal or the other, and they cannot be easily classified as helpful or harmful to intelligibility (Ludvigsen, 1993). For the relatively mild compression used in hearing aids, compared to aggressive compression-based effects in electronic music, for example, a reasonable approach is to treat the nonlinear system as a time-varying linear system.

In this work, the output components are determined by calculating the level-dependent amplification sequence $g[t,b]$ based on the mixture $x[t,b]$, then applying it to each component,

$$r_1[t,b] = g[t,b]\, s_1[t,b], \qquad r_2[t,b] = g[t,b]\, s_2[t,b],$$

for all time indices $t$ and channels $b=1,\ldots,B$. This definition of the output components is used in the mathematical analysis below. Similarly, in the software simulations, the two input signals are stored in memory alongside their mixture and the amplification sequence is applied separately to each, allowing the output components to be computed exactly. This time-varying linear approach to computing the output components is conceptually related to the phase inversion technique of Hagerman and Olofsson (2004), which is often used in laboratory experiments with real hearing aids where the time-varying amplification sequence cannot be observed directly.

### C. Effective compression function

The additive models for the input envelopes and output signal components, while imperfect, allow us to study the dominant source of nonlinearity in a DRC system: the compression function. Although the signals $s_1[t,b]$ and $s_2[t,b]$ may have different levels, the amplification $g[t,b]$ applied to both of them is the same and is computed from the overall level of the input signal,

$$g[t,b] = \frac{C_b(v_x[t,b])}{v_x[t,b]}.$$

Under Assumption 1, the amplification is

$$g[t,b] = \frac{C_b\big(v_{s_1}[t,b]+v_{s_2}[t,b]\big)}{v_{s_1}[t,b]+v_{s_2}[t,b]},$$

resulting in the output levels

$$v_{r_1}[t,b] = g[t,b]\, v_{s_1}[t,b], \qquad v_{r_2}[t,b] = g[t,b]\, v_{s_2}[t,b],$$

for channels $b=1,\ldots,B$. The gain and therefore the output levels are functions of both input signal levels, as illustrated in Fig. 5. The gain applied to $s_1[t,b]$ in the presence of $s_2[t,b]$ is weaker than it would have been for $s_1[t,b]$ alone. To characterize this effect, we can define an effective compression function (ECF) that relates the input and output levels of one signal in the presence of another.

**Definition 3.** The ECF $\hat{C}_b(v_1|v_2)$ applied to a signal with level $v_1>0$ in the presence of a signal with level $v_2\ge 0$ is given by

$$\hat{C}_b(v_1|v_2) = C_b(v_1+v_2)\,\frac{v_1}{v_1+v_2}$$

for $b=1,\ldots,B$, where $C_b(v)$ is the compression function applied to the mixture level $v_1+v_2$. The ECF expresses the dependence between the levels of the two signal components. It can be used to mathematically characterize the nonlinear interactions between signals in DRC systems, including the effective CR, the across-source modulation effect, and the SNR.
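The ECF follows from applying the mixture-derived gain to a single component. In the sketch below (using an arbitrary 3:1 power-law compressor), the ECF reduces to the ordinary compression function when the second signal is absent and yields a lower output level when it is present:

```python
def ecf(C, v1, v2):
    """Effective compression function: output level of a component with input
    level v1 when the gain is computed from the mixture level v1 + v2."""
    vx = v1 + v2
    gain = C(vx) / vx   # power gain determined by the mixture level
    return gain * v1

C = lambda v: v ** (1.0 / 3.0)    # 3:1 power-law compressor

alone = C(4.0)                    # component compressed on its own
in_mixture = ecf(C, 4.0, 4.0)     # same component alongside an equal-level signal
```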

## IV. EFFECTIVE COMPRESSION PERFORMANCE

When DRC is applied to a mixture of multiple signals, it has a weaker effect on the dynamic range of each component signal than it would if they were processed independently. Intuitively, if a signal of interest is weaker than a noise source, then the noise level will determine the gain applied to both signals and the target signal will not be compressed. Even when the target signal has a higher level, the noise will cause the gain to decrease less than it should with respect to the target level.

To quantify the effect of noise on compression performance, we can measure the change in the output level of the target signal in response to a change in its input level and compare that relationship to the nominal CR. Even without noise, the long-term effective compression ratio (ECR) is generally lower than the nominal ratio because of the time-averaging effects of the envelope detector (Braida *et al.*, 1982; Stone and Moore, 1992). However, it has been observed that the ECR of a DRC system is further reduced in the presence of noise (Souza *et al.*, 2006). While this noise-induced reduction in CR has been previously measured as a long-term average for particular signals, here we show that it is a short-term effect caused by the concavity of the compression function. Although the magnitude of the reduction depends on the compression function and the signal characteristics, the effect occurs for every compression function and at every SNR.

Because the instantaneous CR can be infinite, we will instead use the effective compression slope, defined as the log-log slope of the ECF.

**Definition 4.** If $\hat{C}_b(v_1|v_2)$ is differentiable with respect to $v_1$, then the *effective compression slope* $\hat{\mathrm{CS}}_b(v_1|v_2)$ is given by

$$\hat{\mathrm{CS}}_b(v_1|v_2) = \frac{\partial \log \hat{C}_b(v_1|v_2)}{\partial \log v_1}.$$

Note that the effective compression slope defined here is a function of the two signal levels, so it provides a more complete description of the compression system than the long-term ECR. Furthermore, it only measures the instantaneous effects of interaction between signals, not the time-averaging effects of the envelope detector. The simplified envelope model allows us to analyze these two compression-weakening mechanisms separately.

### A. Noise reduces compression performance

Using the properties of the ECF, it can be shown that the effective compression slope from Definition 4 is always larger than the nominal compression slope from Definition 2 (equivalently, the instantaneous ECR is always smaller than the nominal CR), meaning that, when applied to a mixture, the system is less compressive on each component signal than it would be if applied to the components separately,

$$\hat{\mathrm{CS}}_b(v_{s_1}|v_{s_2}) \ge \mathrm{CS}_b(v_{s_1}+v_{s_2}). \tag{23}$$

Notably, this result applies to any pair of signal levels $v_{s_1}$ and $v_{s_2}$. Whereas the across-source modulation result of Sec. V relies on probabilistic averaging and the SNR result of Sec. VI uses time averaging, Eq. (23) holds for each individual envelope sample.

The proof relies on concavity. Because the lemmas and theorems in this work follow from the properties of compression functions, which act independently across time and frequency, the time and channel indices $[t,b]$ are omitted in their statements and proofs.

**Theorem 1**. *If a compression function* $C(v)$ *is differentiable at* $v_x=v_1+v_2$, *then its effective compression slope satisfies*

$$\hat{\mathrm{CS}}(v_1|v_2) \ge \mathrm{CS}(v_x),$$

*with equality if* $C(v)$ *is linear or if* $v_2=0$.

*Proof*. Because $C(v)$ is defined to be concave and non-negative for $v > 0$, its slope at any point cannot exceed the slope of the chord from the origin,

$$C'(v) \le \frac{C(v)}{v}, \qquad \text{so that} \qquad \mathrm{CS}(v) = \frac{v\,C'(v)}{C(v)} \le 1,$$

for all $v$ at which $C$ is differentiable, with equality if $C$ is linear. The effective compression slope is given by

$$\hat{\mathrm{CS}}(v_1|v_2) = \frac{\partial \log \hat{C}(v_1|v_2)}{\partial \log v_1} = \frac{v_1}{v_x}\,\mathrm{CS}(v_x) + \frac{v_2}{v_x} \ge \mathrm{CS}(v_x), \tag{28}$$

with equality if $C$ is linear or if $v_2=0$. ◻

Suppose that $v_1$ corresponds to a sound source of interest and $v_2$ is the level of unwanted noise. The proof illustrates that the effective compression slope for the target increases with the level of the interfering signal. For example, in the limit as $v_1/v_x$ approaches 0, the slope from Eq. (28) approaches 1, so that the system applies linear gain to the target signal. At low SNR, the gain applied to both signals is determined by the noise. The theorem shows, however, that even at high SNR, the compression effect is slightly weaker.
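Theorem 1 can be checked numerically. In this sketch (a 3:1 power-law compressor and a fixed interferer level, both arbitrary choices), the effective compression slope lies between the nominal slope of 1/3 and 1, approaching 1 when the component is far below the interferer:

```python
import numpy as np

C = lambda v: v ** (1.0 / 3.0)   # nominal compression slope 1/3
v2 = 10.0                        # fixed interfering signal level

def ecf_of_v1(v1):
    """ECF as a function of the component's own level, interferer held fixed."""
    vx = v1 + v2
    return C(vx) * v1 / vx

def log_slope(f, v, rel=1e-6):
    """Finite-difference log-log slope of f at v."""
    dv = rel * v
    return (np.log(f(v + dv)) - np.log(f(v))) / (np.log(v + dv) - np.log(v))

slope_low_snr = log_slope(ecf_of_v1, 0.01)     # v1 << v2: nearly linear gain
slope_high_snr = log_slope(ecf_of_v1, 1000.0)  # v1 >> v2: nearly nominal slope
```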

### B. Experiments

Theorem 1 shows that under the simplified envelope model, noise always reduces the effect of compression on a signal of interest. To verify this result experimentally in a realistic system, the software DRC system described in Sec. II C, configured with a nominal CR of 3:1, was applied to a mixture of speech at a wideband level of 0 dB and varying levels of white Gaussian noise.

Figure 6 shows the effective compression performance of the system for three wideband SNRs. The dashed line shows the nominal compression function $C_b(v)=v^{1/3}$ for all $b$. The solid curves are the ECFs $\hat{C}_b(v_{s_1}|v_{s_2})$ predicted by the model for constant noise power $v_{s_2}$ equal to the variance of the Gaussian noise. The plotted points show speech input envelope samples and their corresponding output levels computed using the time-varying gain of the software DRC system. The curves align closely with the nominal compression function when the speech has a higher level than the noise, but they are nearly linear when the noise has the higher level.

The long-term ECR depends on the distribution of envelope samples. For target signals whose envelopes are usually above the noise level, the long-term ECR will be close to the nominal ratio. When the noise is usually more intense, as in the rightmost curve of Fig. 6, the long-term ECR will be close to unity. Using the method of Souza *et al.* (2006), which measures dynamic range between the 5th and 95th percentiles of input and output envelope samples, and averaging across signal bands, the long-term ECRs from the experiments here were 1.01 at –30 dB SNR, 1.17 at 0 dB, and 1.75 at +30 dB.
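A percentile-based ECR measurement in that style can be sketched as follows (an illustrative implementation, not the authors' code); for noiseless 3:1 power-law compression of synthetic envelope samples it recovers a ratio of 3:

```python
import numpy as np

def long_term_ecr(v_in, v_out):
    """Long-term effective compression ratio: ratio of the 5th-to-95th
    percentile dynamic ranges (in dB) of input and output envelope samples."""
    def dyn_range_db(v):
        return 10.0 * (np.log10(np.percentile(v, 95)) - np.log10(np.percentile(v, 5)))
    return dyn_range_db(v_in) / dyn_range_db(v_out)

rng = np.random.default_rng(3)
v_in = rng.lognormal(0.0, 1.0, 10_000)   # fluctuating input envelope samples
v_out = v_in ** (1.0 / 3.0)              # noiseless 3:1 power-law compression
ecr = long_term_ecr(v_in, v_out)
```

With added noise, the output dynamic range shrinks less than the nominal ratio predicts, and the measured ECR falls toward unity.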

## V. ACROSS-SOURCE MODULATION DISTORTION

DRC creates distortion in mixtures because the presence of one signal alters the gain applied to another signal. It has been observed experimentally (Alexander and Masterson, 2015; Stone and Moore, 2004, 2007, 2008) that when two signals are mixed together and passed through a compressor, their output envelopes become negatively correlated: As one sound becomes louder, the other sound becomes quieter. The across-source modulation coefficient, a measure of this negative correlation, was found to be correlated with reduced speech intelligibility (Stone and Moore, 2007, 2008).

### A. Output levels are anticorrelated

The ECF can be used to show that if the input envelopes $v_{s_1}[t,b]$ and $v_{s_2}[t,b]$ are independent random processes, then the covariance between the output levels in each channel is negative,

$$\mathrm{Cov}\big(v_{r_1}[t,b],\, v_{r_2}[t,b]\big) \le 0.$$

The covariance is an ensemble mean over the distributions of the envelope samples $v_{r_1}[t,b]$ and $v_{r_2}[t,b]$. Although the covariance is often measured empirically using a time average, our mathematical analysis applies to each time index and channel independently.

We first show that the ECF is nondecreasing in one envelope and nonincreasing in the other.

**Lemma 1**. Any ECF $\hat{C}(v_1|v_2)$ is nondecreasing in $v_1$ and nonincreasing in $v_2$ for $v_1, v_2 \ge 0$.

*Proof*. Because $C(v)$ is nondecreasing and $v_2$ is non-negative, $\hat{C}(v_1|v_2)=C(v_1+v_2)\,\big(v_1/(v_1+v_2)\big)$ is the product of two nondecreasing functions of $v_1$ and is therefore nondecreasing. Because $C(v)$ is concave and non-negative, $C(v)/v$ is nonincreasing for $v > 0$. Then $C(v_1+v_2)/(v_1+v_2)$ is nonincreasing in $v_2$. ◻
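Lemma 1 can be spot-checked numerically with an arbitrary concave compression function, here $C(v)=\log(1+v)$:

```python
import numpy as np

C = lambda v: np.log1p(v)   # concave, non-negative, nondecreasing for v > 0

def ecf(v1, v2):
    """Effective compression function for compression function C."""
    return C(v1 + v2) * v1 / (v1 + v2)

v = np.linspace(0.1, 50.0, 500)
sweep_v1 = ecf(v, 5.0)   # vary the component's own level, interferer fixed
sweep_v2 = ecf(5.0, v)   # vary the interfering level, component fixed
```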

Next, we will need the following result about functions of random variables. Let $E$ denote the expectation of a random variable, that is, its probabilistic mean.

**Lemma 2**. If $f(x)$ is nondecreasing, $g(x)$ is nonincreasing, $X$ is a random variable, and $E[f(X)]$, $E[g(X)]$, and $E[f(X)g(X)]$ exist, then

$$E[f(X)\,g(X)] \le E[f(X)]\, E[g(X)].$$

*Proof*. See Appendix A. ◻

We can now prove that independent envelopes become negatively correlated when compressed.

**Theorem 2**. *If* $\hat{C}(v_1|v_2)$ *is an ECF and* $V_1$ *and* $V_2$ *are independent random variables, then*

$$\mathrm{Cov}\big(\hat{C}(V_1|V_2),\, \hat{C}(V_2|V_1)\big) \le 0.$$

*Proof*. Because $\mathrm{Cov}\big(\hat{C}(V_1|V_2),\hat{C}(V_2|V_1)\big)=E\big[\hat{C}(V_1|V_2)\,\hat{C}(V_2|V_1)\big]-E\big[\hat{C}(V_1|V_2)\big]\,E\big[\hat{C}(V_2|V_1)\big]$, it is sufficient to show that

$$E\big[\hat{C}(V_1|V_2)\,\hat{C}(V_2|V_1)\big] \le E\big[\hat{C}(V_1|V_2)\big]\, E\big[\hat{C}(V_2|V_1)\big].$$

From Lemma 1, $\hat{C}(v_1|v_2)$ is a nondecreasing function of $v_1$ and a nonincreasing function of $v_2$. Let $E[X|Y]$ denote the conditional expectation of $X$ given $Y$. From iterated expectation and application of Lemma 2, we have

$$E\big[\hat{C}(V_1|V_2)\,\hat{C}(V_2|V_1)\big] = E_{V_2}\Big[E_{V_1}\big[\hat{C}(V_1|V_2)\,\hat{C}(V_2|V_1)\,\big|\,V_2\big]\Big] \le E_{V_2}\Big[E_{V_1}\big[\hat{C}(V_1|V_2)\,\big|\,V_2\big]\, E_{V_1}\big[\hat{C}(V_2|V_1)\,\big|\,V_2\big]\Big].$$

Now, because $V_1$ and $V_2$ are independent, $E_{V_1}[\hat{C}(V_1|V_2)\,|\,V_2]$ is a nonincreasing function of $V_2$ and $E_{V_1}[\hat{C}(V_2|V_1)\,|\,V_2]$ is a nondecreasing function of $V_2$. Applying Lemma 2 once more,

$$E_{V_2}\Big[E_{V_1}\big[\hat{C}(V_1|V_2)\,\big|\,V_2\big]\, E_{V_1}\big[\hat{C}(V_2|V_1)\,\big|\,V_2\big]\Big] \le E\big[\hat{C}(V_1|V_2)\big]\, E\big[\hat{C}(V_2|V_1)\big].$$
◻

For linear gain, the theorem holds with equality because $C\u0302(v1|v2)$ does not depend on *v*_{2}. The magnitude of the negative correlation depends on the compression function: Stronger compression causes the ECFs and the conditional expectations to increase or decrease more quickly, resulting in a stronger negative correlation. The channel structure and time constants of the envelope detector affect the correlation indirectly by altering the distributions of *V*_{1} and *V*_{2}.
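Theorem 2 can be illustrated with a Monte Carlo sketch. The lognormal envelope distribution below is an arbitrary choice; the input samples are independent, yet the compressed output levels show a negative covariance:

```python
import numpy as np

C = lambda v: v ** (1.0 / 3.0)   # 3:1 power-law compressor

def ecf(v1, v2):
    """Effective compression function for compression function C."""
    vx = v1 + v2
    return C(vx) * v1 / vx

rng = np.random.default_rng(1)
v1 = rng.lognormal(0.0, 1.0, 200_000)   # independent input envelope samples
v2 = rng.lognormal(0.0, 1.0, 200_000)
r1, r2 = ecf(v1, v2), ecf(v2, v1)       # output levels of the two components

cov_in = np.cov(v1, v2)[0, 1]    # approximately zero for independent inputs
cov_out = np.cov(r1, r2)[0, 1]   # negative, consistent with Theorem 2
```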

### B. Experiments

To illustrate the negative correlation effect with a realistic envelope detector, the software DRC system from Sec. II C was applied to mixtures of speech and white Gaussian noise with a 3:1 CR. Each plot in Fig. 7 shows pairs of measured envelope samples for two signals: the input mixtures $(v_{s_1}[t,b], v_{s_2}[t,b])$ on the left and the output mixtures $(v_{r_1}[t,b], v_{r_2}[t,b])$ on the right. The top plots are for speech and white noise and the bottom plots are for two speech signals. The correlation coefficient $\rho$ is computed on a linear scale and averaged across channels. The dashed curve shows the equilibrium level $v_{r_1}+v_{r_2}=1$, which indicates perfect negative correlation ($\rho=-1$).

The input levels are mostly uncorrelated between the two component signals, but DRC shifts the levels according to a vector field like that in Fig. 5, producing correlated output levels. Because the white noise has nearly constant envelope, the effect of DRC is most visible at high speech levels: When the speech signal is strong, both speech and noise are attenuated, bending the distribution of level pairs downward and producing a negative correlation. When the interfering signal is speech, which has a wide dynamic range, the signal components interact at all levels. At low instantaneous SNR, the weaker speech signal of interest is modulated according to the level of the stronger interfering speech.

## VI. OUTPUT SNR

Of the three nonlinear interaction effects discussed in this work, the impact of DRC on long-term SNR is both the most studied empirically and the most challenging to analyze mathematically. Hagerman and Olofsson (2004) showed that fast-acting compression improved the average output SNR of speech in babble noise at negative average input SNR but made it worse for positive input SNR. Souza *et al.* (2006) found that DRC reduced the SNR of speech in speech-shaped noise. Naylor and Johannesson (2009) showed that this effect depends on the type of noise, filterbank structure, and envelope time constants. Brons *et al.* (2015) and Miller *et al.* (2017) demonstrated the effect in hearing aids that include nonlinear noise reduction, which can interact with DRC in complex ways (Kortlang *et al.*, 2018). SNR changes appear to be more severe for fast compression (Alexander and Masterson, 2015; May *et al.*, 2018) and less severe with reverberation (Reinhart *et al.*, 2017).

It is important to remember that long-term SNR is not the same as intelligibility. Listening tests suggest that DRC can improve intelligibility with some types of noise but not others (Kowalewski *et al.*, 2018; Rhebergen *et al.*, 2017; Rhebergen *et al.*, 2009; Yund and Buckles, 1995).

### A. Output SNR for constant-envelope noise

While it is difficult to say much in general about the effect of compression on output SNR, we can prove a result for an important special case: a target signal with a time-varying envelope and a noise signal with constant envelope. Stationary white noise, for example, has constant variance, and its measured envelope fluctuates only slightly over time. Meanwhile, information-rich signals such as speech tend to vary rapidly. Many classic speech enhancement algorithms, such as spectral subtraction, assume that the noise spectrum is constant while the speech level varies (Loizou, 2013). When the mixture level is high, it is assumed that speech is present and the gain is increased, while at lower levels the output is attenuated to remove noise. Because these speech enhancement systems amplify high-level signals and attenuate low-level signals, they act as dynamic range *expanders*.

If a dynamic range expander can improve long-term SNR, it stands to reason that a compressor might make it worse. To see why, let us analyze the effect of compression on the average SNR over time. Because the envelope is proportional to the power of a signal component, the average SNR at the input is given by

$$\mathrm{SNR}_{\text{in}} = \frac{\operatorname{mean}_t v_1[t]}{\operatorname{mean}_t v_2[t]},$$

and the average SNR at the output is

$$\mathrm{SNR}_{\text{out}} = \frac{\operatorname{mean}_t \hat{C}(v_1[t] \mid v_2[t])}{\operatorname{mean}_t \hat{C}(v_2[t] \mid v_1[t])}.$$

If the compression function were linear, then the input and output SNRs would be identical. For a concave compression function with convex gain, it can be shown that, if the noise envelope is constant, then the average output SNR is no greater than the average input SNR,

$$\mathrm{SNR}_{\text{out}} \le \mathrm{SNR}_{\text{in}}.$$
Unlike in Sec. V, here the envelopes are not modeled as random processes and the quantities of interest are time averages, not ensemble averages.

The proofs in this section rely on an additional technical condition on the compression function. Not only must $C(v)$ be non-negative and concave, the gain function $C(v)/v$ must also be convex. This condition is satisfied by many smooth compression functions, including linear, power-law, and logarithmic functions, but not by some functions with corners like that in Fig. 2. It ensures that the ECF is concave in its first argument and convex in its second.

**Lemma 3**. If $C(v)$ is a compression function and $C(v)/v$ is convex for all $v > 0$, then the ECF $\hat{C}(v_1 \mid v_2)$ is concave in $v_1$ and convex in $v_2$.

*Proof*. See Appendix B. ◻
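As an illustrative numerical sanity check (not a proof), the sketch below tests midpoint concavity and convexity of the ECF for a power-law compressor. It assumes the additive-envelope ECF $\hat{C}(v_1 \mid v_2) = \frac{v_1}{v_1+v_2}\,C(v_1+v_2)$ from Definition 3; the 3:1 power law $C(v) = v^{1/3}$ and the test points are arbitrary choices.

```python
import numpy as np

def C(v, ratio=3.0):
    # Power-law compression on the linear envelope scale; concave for ratio >= 1,
    # with convex gain C(v)/v = v**(1/ratio - 1).
    return v ** (1.0 / ratio)

def ecf(v1, v2, ratio=3.0):
    # Effective compression function under the additive envelope model:
    # both components receive the mixture gain C(v1 + v2) / (v1 + v2).
    return v1 / (v1 + v2) * C(v1 + v2, ratio)

grid = np.linspace(0.1, 10.0, 200)
p, q = 0.5, 8.0  # arbitrary endpoints for the midpoint test

# Midpoint concavity in the first argument:
# ECF at the midpoint dominates the average of the endpoint values.
concave_ok = all(
    ecf(0.5 * (p + q), v2) >= 0.5 * (ecf(p, v2) + ecf(q, v2)) - 1e-12
    for v2 in grid
)

# Midpoint convexity in the second argument.
convex_ok = all(
    ecf(v1, 0.5 * (p + q)) <= 0.5 * (ecf(v1, p) + ecf(v1, q)) + 1e-12
    for v1 in grid
)
```

A midpoint test over a grid cannot replace the proof in Appendix B, but it catches sign errors quickly when experimenting with other compression functions.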

This property lets us take advantage of Jensen's inequality (Cover and Thomas, 2006), one form of which states that for any convex function $f(x)$,

$$\operatorname{mean}_t f(x[t]) \ge f\!\left(\operatorname{mean}_t x[t]\right),$$

with equality if $f(x)$ is linear or $x[t]$ is constant. The same property holds with the inequality reversed if $f(x)$ is a concave function. Jensen's inequality allows us to prove that the average output SNR is no larger than the average input SNR.
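For instance, with the convex function $f(x) = x^2$ and a fluctuating sequence $x[t]$, the time average of $f(x[t])$ dominates $f$ of the time average. A two-line check with arbitrary illustrative values:

```python
import numpy as np

x = np.array([0.2, 1.0, 0.5, 3.0, 0.1])  # arbitrary positive "envelope" samples
f = lambda v: v ** 2                      # a convex function

lhs = np.mean(f(x))   # mean of the convex function over time...
rhs = f(np.mean(x))   # ...is at least the function of the time average
```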

**Theorem 3**. *If $C(v)$ is a compression function, $C(v)/v$ is convex for all $v > 0$, $v_1[t] > 0$ for all $t$, and $v_2[t] = \bar{v}_2 > 0$ for all $t$, then*

$$\frac{\operatorname{mean}_t \hat{C}(v_1[t] \mid \bar{v}_2)}{\operatorname{mean}_t \hat{C}(\bar{v}_2 \mid v_1[t])} \le \frac{\operatorname{mean}_t v_1[t]}{\bar{v}_2},$$

*with equality if $v_1[t]$ is constant or $C$ is linear.*

*Proof*. Since $v_2[t] = \bar{v}_2$ is fixed, the output SNR can be written

$$\mathrm{SNR}_{\text{out}} = \frac{\operatorname{mean}_t \hat{C}(v_1[t] \mid \bar{v}_2)}{\operatorname{mean}_t \hat{C}(\bar{v}_2 \mid v_1[t])}. \tag{46}$$

The numerator is the mean over $t$ of a concave function of $v_1[t]$. By Jensen's inequality,

$$\operatorname{mean}_t \hat{C}(v_1[t] \mid \bar{v}_2) \le \hat{C}\!\left(\operatorname{mean}_t v_1[t] \,\middle|\, \bar{v}_2\right),$$

with equality when $C$ is linear or $v_1[t]$ is constant. Similarly, the denominator is the mean over $t$ of a convex function of $v_1[t]$. Again applying Jensen's inequality,

$$\operatorname{mean}_t \hat{C}(\bar{v}_2 \mid v_1[t]) \ge \hat{C}\!\left(\bar{v}_2 \,\middle|\, \operatorname{mean}_t v_1[t]\right),$$

with equality when $C$ is linear or $v_1[t]$ is constant. Let $\bar{v}_1 = \operatorname{mean}_t v_1[t]$. Since the numerator and denominator of Eq. (46) are both positive, we have

$$\mathrm{SNR}_{\text{out}} \le \frac{\hat{C}(\bar{v}_1 \mid \bar{v}_2)}{\hat{C}(\bar{v}_2 \mid \bar{v}_1)} = \frac{\bar{v}_1}{\bar{v}_2} = \mathrm{SNR}_{\text{in}},$$

with equality when $C$ is linear or $v_1[t]$ is constant; the final equality holds because the ECF applies the same mixture gain to both components, so the gain cancels in the ratio. ◻
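Theorem 3 is easy to probe numerically. The sketch below uses a log-normal target envelope and a constant noise envelope (illustrative choices, not the paper's experimental signals), together with the additive-envelope ECF and a 3:1 power-law compressor, and checks that the time-averaged output SNR does not exceed the input SNR:

```python
import numpy as np

rng = np.random.default_rng(0)

def C(v, ratio=3.0):
    # Concave power-law compression with convex gain C(v)/v,
    # satisfying the conditions of Theorem 3.
    return v ** (1.0 / ratio)

def ecf(v1, v2, ratio=3.0):
    # Additive-envelope ECF: both components share the mixture gain.
    return v1 / (v1 + v2) * C(v1 + v2, ratio)

v1 = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # fluctuating target envelope
v2 = np.full_like(v1, 1.0)                            # constant noise envelope

snr_in = np.mean(v1) / np.mean(v2)
snr_out = np.mean(ecf(v1, v2)) / np.mean(ecf(v2, v1))
```

Because the target envelope fluctuates and the compressor is strictly concave, the inequality is strict in this simulation.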

### B. Experiments

Because Theorem 3 requires stronger assumptions than the other theorems in this work, it is especially important to validate its predictions experimentally. Figure 8 compares the input and output SNRs for speech in white Gaussian noise—which has a relatively steady but not constant envelope—at different CRs. For this section, the software simulations use a knee-shaped compression function like that in Fig. 2, which is commonly used in hearing aids but violates the technical condition required for Lemma 3. The knee point is 40 dB below the wideband average speech level. Although the assumptions are violated, the results still show the behavior predicted by the theorem. The SNR-reducing effect is greatest at high input SNRs; at low input SNRs, the noise level determines the gain and the ECF is linear, so the SNR is not affected.
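A knee-shaped compression function of the kind used in these simulations can be sketched as follows. The dB-domain form with unit gain below the knee is an assumed simplification, and the knee and ratio values are placeholders rather than the exact experimental settings:

```python
import numpy as np

def knee_compressor_gain_db(level_db, knee_db=-40.0, ratio=3.0):
    # Below the knee: linear processing (0 dB gain change).
    # Above the knee: each dB of input above the knee yields 1/ratio dB
    # of output above the knee.
    level_db = np.asarray(level_db, dtype=float)
    over = np.maximum(level_db - knee_db, 0.0)
    out_db = level_db - over * (1.0 - 1.0 / ratio)
    return out_db - level_db  # gain in dB (non-positive above the knee)

# Example: input levels relative to the wideband average speech level.
levels = np.array([-60.0, -40.0, -20.0, 0.0])
gains = knee_compressor_gain_db(levels)
```

Note the corner at the knee point: in the linear envelope domain, this is exactly the kind of compression function that violates the gain-convexity condition required by Lemma 3, even though the experiments show it still behaves as the theorem predicts.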

As an inequality, Theorem 3 does not predict the magnitude of SNR reduction, but the equality condition suggests that the effects are smaller for more linear compression functions. Indeed, the experiments show that higher CRs have stronger effects on SNR, which is consistent with results in the literature (Naylor and Johannesson, 2009; Rhebergen *et al.*, 2009).

Theorem 3 applies only to constant-envelope noise. When the target and noise signals both vary strongly with time, the weaker signal will be amplified more and the stronger signal less, pushing their average output levels closer together. Figure 9 shows the results of the SNR experiment with 3:1 knee-shaped compression and different noise types. With white noise, the long-term SNR is always reduced, as predicted by Theorem 3. With speech babble, generated by mixing 14 VCTK speech clips, the SNR is slightly increased at low input SNRs. When the target and interference signals are both single-talker speech signals, the long-term SNR is improved when it is negative but made worse when it is positive.

These results align well with those in the literature. Hagerman and Olofsson (2004) showed that fast-acting compression can improve negative SNRs but worsen SNRs near zero for speech in babble noise. Naylor and Johannesson (2009) found that output SNR is always reduced for speech in unmodulated noise, greatly reduced at positive input SNR and slightly reduced at negative input SNR for speech in modulated noise, and symmetrically increased at negative input SNR and decreased at positive input SNR for a mixture of two speech signals. Reinhart *et al.* (2017) performed experiments with different numbers of talkers and found that the SNR improvement at negative input SNRs declined with each additional interfering talker, consistent with the results for speech babble here.

## VII. DISCUSSION

The mathematical analysis presented here confirms the empirical evidence from the hearing literature that DRC causes unintended distortion in noise. The effects of this distortion depend on the characteristics of the signals, especially their relative levels. At low SNR, the ECF for the target signal becomes nearly linear and the dynamic range of that signal is not changed. At high SNR, the signal of interest receives less gain than the noise, reducing the average SNR. At all SNRs, the signal components modulate each other, compressing the weaker signal according to the level of the stronger component.

The theorems in this work apply to ideal envelopes that obey the additivity assumption. In that sense, they are optimistic predictions. Real DRC systems that use measured envelopes would exhibit even stronger interactions between signals. Further theoretical work is required to model distortion within the envelope detector, predict the effects of filterbank structure and envelope time constants, and show how these nonlinearities interact with those of the compression function.

Can anything be done to improve the performance of DRC systems in noise? The analysis shows that all these effects are caused by the concave curvature of the compression function, which is also what makes the system compressive. The results for effective compression performance and across-source modulation hold instantaneously, not just over time, and apply to any compression function and any combination of signals, even hypothetical ideal signals that have independent envelopes. It seems, then, that nonlinear interactions are inevitable whenever signals are compressed as a mixture.

A possible solution is to compress the component signals of a mixture independently, as music producers do when mixing instrumental and vocal recordings. Listening tests have shown improved intelligibility when signals are compressed before rather than after mixing (Rhebergen *et al.*, 2009; Stone and Moore, 2008). Of course, real hearing aids do not have access to the unmixed source signals, so a practical multisource compression system must perform source separation. Hassager *et al.* (2017) used a single-microphone classification method to separate direct from reverberant signal components, helping to preserve spatial cues that can be distorted by DRC. May *et al.* (2018) proposed a single-microphone separation system that applies fast-acting compression to speech components and slow-acting compression to noise components; listening experiments with an ideal separation algorithm improved both quality and intelligibility (Kowalewski *et al.*, 2020). Corey and Singer (2017) used a multimicrophone separation method to apply separate compression functions to each of several competing speech signals. The output exhibited better measures of across-source modulation distortion, effective compression performance, and SNR compared to a conventional system. The modeling framework described here could be applied to analyze the performance of these multisource compression systems and to devise new ones.
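The difference between compressing before and after mixing can be illustrated in the envelope model: independently compressed envelopes stay uncorrelated, while mixture compression induces the negative correlation described earlier. A sketch under the additive-envelope ECF assumption, with arbitrary log-normal source envelopes and a 3:1 power-law compressor:

```python
import numpy as np

rng = np.random.default_rng(1)

def C(v, ratio=3.0):
    return v ** (1.0 / ratio)  # concave power-law compressor

def ecf(v1, v2, ratio=3.0):
    # Additive-envelope ECF: both components share the mixture gain.
    return v1 / (v1 + v2) * C(v1 + v2, ratio)

# Independent (uncorrelated) source envelopes.
v1 = rng.lognormal(sigma=1.0, size=20_000)
v2 = rng.lognormal(sigma=1.0, size=20_000)

# Compress-then-mix: each source gets its own gain, so the compressed
# envelopes remain independent (sample correlation near zero).
rho_pre = np.corrcoef(C(v1), C(v2))[0, 1]

# Mix-then-compress: both components share the mixture gain, which
# couples them and induces negative correlation.
rho_post = np.corrcoef(ecf(v1, v2), ecf(v2, v1))[0, 1]
```

This mirrors the motivation for multisource compression: separating sources before compression avoids the shared-gain coupling entirely.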

## VIII. CONCLUSIONS

The mathematical tools introduced in this work can help researchers to understand the distortion effects of conventional DRC systems in noise and to devise new approaches to nonlinear processing for mixtures of multiple signals. The additive envelope model allows the envelope detector and compression function to be analyzed independently, greatly reducing the complexity of the system. The ECF models interactions between signal envelopes at the input and output of any compression function, characterizing system behavior across all signal levels. It can be used to analyze instantaneous interactions or integrated into long-term or probabilistic models to study average effects.

Like the human auditory system itself, DRC is a complex nonlinear system that defies simple analysis. By modeling how DRC systems behave in the presence of noise, we can develop and analyze new strategies for nonlinear signal processing in the most challenging environments.

## ACKNOWLEDGMENTS

This research was supported by the National Science Foundation under Grant No. 1919257 and by an appointment to the Intelligence Community Postdoctoral Research Fellowship Program at the University of Illinois Urbana-Champaign, administered by Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the Office of the Director of National Intelligence.

### APPENDIX A: PROOF OF LEMMA 2

**Lemma 2**. If $f(x)$ is nondecreasing, $g(x)$ is nonincreasing, $X$ is a random variable, and $E[f(X)]$, $E[g(X)]$, and $E[f(X)g(X)]$ exist, then

$$E[f(X)g(X)] \le E[f(X)]\,E[g(X)].$$
*Proof*. Because $f(x)$ is nondecreasing and $g(x)$ is nonincreasing, for every $x$ and $y$ we have

$$\left[f(x) - f(y)\right]\left[g(x) - g(y)\right] \le 0.$$

It is sufficient to show that $E[f(X)g(X)] - E[f(X)]\,E[g(X)] \le 0$. If $X$ has cumulative distribution function $P(x)$, then

$$E[f(X)g(X)] - E[f(X)]\,E[g(X)] = \int f(x)g(x)\,dP(x) - \int\!\!\int f(x)\,g(y)\,dP(x)\,dP(y)$$

$$= \int\!\!\int f(x)\left[g(x) - g(y)\right]dP(y)\,dP(x) \tag{A4}$$

$$= \int\!\!\int f(x)\left[g(x) - g(y)\right]dP(x)\,dP(y) \tag{A5}$$

$$= \int\!\!\int f(y)\left[g(y) - g(x)\right]dP(x)\,dP(y). \tag{A6}$$

Equation (A5) swaps the order of integration using Fubini's theorem (Knapp, 2005) and Eq. (A6) exchanges the integration variables $x$ and $y$. Averaging Eqs. (A5) and (A6) gives

$$E[f(X)g(X)] - E[f(X)]\,E[g(X)] = \frac{1}{2}\int\!\!\int \left[f(x) - f(y)\right]\left[g(x) - g(y)\right]dP(x)\,dP(y) \le 0,$$

where the inequality follows from the pointwise bound above. ◻
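Lemma 2 can be checked directly on a small discrete distribution. The sketch below takes $X$ uniform over a few arbitrary points, with $f$ nondecreasing and $g$ nonincreasing (both illustrative choices), and computes the two sides of the inequality exactly:

```python
import numpy as np

x = np.array([0.1, 0.5, 1.0, 2.0, 4.0])  # support of X, uniform weights
f = np.sqrt(x)                            # nondecreasing function of x
g = 1.0 / (1.0 + x)                       # nonincreasing function of x

E_fg = np.mean(f * g)              # E[f(X) g(X)]
E_f_E_g = np.mean(f) * np.mean(g)  # E[f(X)] E[g(X)]
```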

### APPENDIX B: PROOF OF LEMMA 3

**Lemma 3**. If $C(v)$ is a compression function and $C(v)/v$ is convex for all $v > 0$, then the ECF $\hat{C}(v_1 \mid v_2)$ is concave in $v_1$ and convex in $v_2$.

*Proof*. Starting with Definition 3 and letting $v_1 = \lambda p + (1-\lambda)q$ for $p, q > 0$ and $\lambda \in [0, 1]$,

$$\hat{C}(v_1 \mid v_2) = \frac{v_1}{v_1 + v_2}\,C(v_1 + v_2) = C(v_1 + v_2) - v_2\,\frac{C(v_1 + v_2)}{v_1 + v_2}.$$

Because $C(v)$ is concave and $C(v)/v$ is convex,

$$C(v_1 + v_2) \ge \lambda\,C(p + v_2) + (1-\lambda)\,C(q + v_2)$$

and

$$\frac{C(v_1 + v_2)}{v_1 + v_2} \le \lambda\,\frac{C(p + v_2)}{p + v_2} + (1-\lambda)\,\frac{C(q + v_2)}{q + v_2},$$

so that

$$\hat{C}\!\left(\lambda p + (1-\lambda)q \mid v_2\right) \ge \lambda\,\hat{C}(p \mid v_2) + (1-\lambda)\,\hat{C}(q \mid v_2).$$

Therefore, $\hat{C}(v_1 \mid v_2)$ is concave in $v_1$.

Similarly, letting $v_2 = \lambda p + (1-\lambda)q$ and applying the convexity of $C(v)/v$,

$$\hat{C}(v_1 \mid v_2) = v_1\,\frac{C(v_1 + v_2)}{v_1 + v_2} \le \lambda\,v_1\,\frac{C(v_1 + p)}{v_1 + p} + (1-\lambda)\,v_1\,\frac{C(v_1 + q)}{v_1 + q} = \lambda\,\hat{C}(v_1 \mid p) + (1-\lambda)\,\hat{C}(v_1 \mid q).$$

Therefore, $\hat{C}(v_1 \mid v_2)$ is convex in $v_2$. ◻