A wide variety of research and clinical assessments involve presenting speech stimuli in the presence of some kind of noise. Here, I selectively review two theoretical perspectives and discuss ways in which these perspectives may help researchers understand the consequences for listeners of adding noise to a speech signal. I argue that adding noise changes more about the listening task than merely making the signal more difficult to perceive. To fully understand the effects of an added noise on speech perception, we must consider not just how much the noise affects task difficulty, but also how it affects all of the systems involved in understanding speech: increasing message uncertainty, modifying attentional demand, altering affective response, and changing motivation to perform the task.

In the study of speech perception, when there is a need to make a listening task more difficult, a nearly ubiquitous method among researchers is simply to add noise. The added sound may be white noise (Sommers, 1996), speech-shaped noise (Banks et al., 2015; Holt et al., 2011; Surprenant and Watson, 2001), environmental noise (Klatte et al., 2007; Meyer et al., 2013), or “cafeteria” noise (Tajima et al., 1997), but it could also be multitalker babble of various sorts (Imai et al., 2005; Sommers et al., 2005) or even the speech of a single competing talker (Freyman et al., 2001). In these examples, we already see some potential problems with the use of the term “noise,” as it seems to cover a huge variety of acoustically distinct sounds. From a casual, mostly vernacular perspective, we might expect noise to have certain specific acoustic properties: perhaps high intensity, a broad frequency distribution, and little or no harmonicity. It might also have some affective qualities: It is probably annoying or aversive, and it certainly interferes with whatever it is we're trying to do. In speech perception research, there might be an even more general understanding of noise as any added sound that distorts or obscures or physically alters a target signal. The ubiquitous use of the term “masker” in speech perception research to refer to the added signal when discussing experimental methods reflects this perspective very clearly: a noise is then any signal that masks or obscures properties of a target sound. In addition, however, we can also include added sounds that interfere with speech perception even when they do not mask a target sound (Freyman et al., 2001; Ihlefeld and Shinn-Cunningham, 2008a, 2008b), as well as cases in which two sounds that provide the same amount of masking nevertheless have different effects on listeners (Helfer et al., 2010; Tun et al., 2002). Thus, there are many things that might constitute noise. Similarly, there are many reasons to add noise to a signal.

In many cases, the goal of studies involving added noise is to investigate how different properties of the noise itself may make speech perception or other behaviors more difficult. Such studies include research investigating informational masking (Dai et al., 2017; Francis et al., 2016; Freyman et al., 2001, 2007; Helfer and Freyman, 2014; Kidd et al., 2016; Kidd and Colburn, 2017; Koelewijn et al., 2015) and the role of glimpsing in perception of speech in noise (Culling and Stone, 2017; Stone and Canavan, 2016; Summers and Molis, 2004), studies of listening effort (Francis et al., 2016, 2021; Krueger et al., 2017; Ohlenforst et al., 2018; Sarampalis et al., 2009), simulations of hearing loss in typically hearing listeners (Bernarding et al., 2011; Humes et al., 1987; Richie et al., 2005; Sommers and Humes, 1993), and studies of noise sensitivity and annoyance, especially in realistic auditory environments (Lee et al., 2017; Lindvall and Västfjäll, 2013; Love et al., 2021).

In general, however, the primary consideration when adding noise seems to be that doing so makes the task more difficult, even if the research question is focused on determining how such a noise makes the task more difficult. Often, noise level or signal-to-noise ratio (SNR) is used as a proxy for difficulty, on the assumption that the main determiner of difficulty is how much the noise masks the target sound. Here, I argue that adding noise means more than masking and much more than just increasing task difficulty for listeners. I do not mean to suggest that adding sound to a target speech signal does not make it more difficult to understand. Nor do I mean to suggest that one should not add noise to target speech or that the nature of the noise should not matter. Rather, I wish to highlight some of the ways in which adding sound to a target signal can fundamentally change how listeners process what they are hearing, in ways that are not fully captured by considering only changes to the difficulty of the listening task itself. These factors, I suggest, require us to think more deeply about how we interpret the results of studies on the perception of speech in noise.
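
To make concrete why SNR alone underdetermines the listening situation, the following minimal Python sketch mixes one target with two different maskers at the same nominal SNR. Everything specific in it is an illustrative assumption rather than material from any study cited here: the sine-wave stand-in for speech, the 4 Hz amplitude-modulated masker, the -3 dB value, and the hypothetical helper names `mix_at_snr` and `measured_snr_db`.

```python
import numpy as np

def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so the long-term target-to-noise power ratio is `snr_db`, then add it."""
    target = np.asarray(target, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(target):
        raise ValueError("noise must be at least as long as the target")
    noise = noise[: len(target)]
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2)
    # From SNR_dB = 10 * log10(p_target / p_scaled_noise), solve for the noise gain.
    gain = np.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return target + scaled_noise, scaled_noise

def measured_snr_db(target, noise):
    """Long-term SNR in dB computed from mean power."""
    return 10 * np.log10(np.mean(target ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
fs = 16000                                   # assumed sampling rate (Hz)
t = np.arange(fs) / fs                       # one second of samples
target = np.sin(2 * np.pi * 220 * t)         # placeholder for a speech signal
steady = rng.standard_normal(fs)             # steady-state white noise
modulated = rng.standard_normal(fs) * (1 + np.sin(2 * np.pi * 4 * t))  # 4 Hz modulated noise

mix_steady, n_steady = mix_at_snr(target, steady, -3.0)
mix_modulated, n_modulated = mix_at_snr(target, modulated, -3.0)
print(measured_snr_db(target, n_steady), measured_snr_db(target, n_modulated))
```

Both mixtures have the same long-term SNR, yet the second masker has temporal structure that the first lacks. That is exactly the kind of difference that an SNR value by itself does not capture.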

Historically speaking, the use of the term noise in studies of speech perception seems to build on two potentially distinct concepts: noise as a source of uncertainty and noise as an unwanted sound. The more vernacular conceptualization of noise as a masking sound draws on both of these traditions to some degree, or, perhaps more properly, both of these traditions emphasize different aspects of the idea of noise as a masker (maskers obscure or distort the signal, making it less certain, but they are also annoying and unwanted). In any case, it is likely that modern researchers generally do not make these distinctions nearly as strongly as I will here for illustrative purposes, but making this distinction is a useful starting point for considering how our current thinking about noise might be worth reconsidering.

Within the study of speech perception, the historically dominant conceptualization of noise is found in the information-theoretic sense of a source of increased uncertainty regarding the identity of a transmitted signal (Fano, 1950; Shannon, 1949). Researchers initially working in this area were largely concerned with the problem of determining how reliable an information transmission channel could be, with particular focus on radio and telephony (Pierce, 1973). Thus, in the most basic sense, the “noise” that Shannon and colleagues were concerned with was not even necessarily acoustic. Indeed, given that the transmission channel was initially thought of as perhaps “… a pair of wires, a coaxial cable, a band of radio frequencies, etc.” (Shannon, 1949), the location of noise in the transmission channel as shown in Fig. 1 [after Shannon (1949), Fig. 1] suggests that acoustic noise was not necessarily a major consideration at the time. The addition of noise in this sense, then, refers simply to an increase in uncertainty of signal reception of some sort and could in principle include any source of uncertainty, even occurring elsewhere in the transmission chain, from source to receiver (inclusive).

FIG. 1.

Noise as a source of uncertainty in message transmission. After Fig. 1 of Shannon (1949).
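
The information-theoretic sense of noise as added uncertainty can be illustrated with a textbook binary symmetric channel, in which each transmitted bit is flipped with some probability. The sketch below is a toy under stated assumptions (equiprobable binary messages, hypothetical function names `binary_entropy` and `bsc_information`), not a model of speech transmission. It shows only that as channel noise grows, the receiver's remaining uncertainty about what was sent rises and the information received falls.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_information(flip_prob):
    """Return (equivocation, mutual information) in bits per symbol.

    Assumes a binary symmetric channel with equiprobable inputs, so the
    source entropy is 1 bit and the equivocation H(X|Y) equals H_b(flip_prob).
    """
    equivocation = binary_entropy(flip_prob)
    return equivocation, 1.0 - equivocation

for p in (0.0, 0.05, 0.2, 0.5):
    uncertainty, information = bsc_information(p)
    print(f"flip probability {p:.2f}: "
          f"remaining uncertainty {uncertainty:.3f} bits, "
          f"information received {information:.3f} bits")
```

Nothing in this calculation depends on where the bit flips occur, which is consistent with the point that uncertainty could in principle arise anywhere from source to receiver.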

1. Additional uncertainty elsewhere in the speech chain

Perhaps due to the focus on transmission problems, there does not seem to have been strong consideration of what it might mean for uncertainty to arise at the transmitter (talker) or receiver level (i.e., entirely within the ear and brain of the listener). Although Shannon (1949) does mention the possibility of destination noise, early theorists seem not to have been as concerned as modern researchers with how perceptual uncertainty might arise from properties of the talker (information source) or within the mind or brain of a listener (receiver). For example, although Peterson (1952) considers the effects of transmission channel noise on the perception of the most basic acoustic properties of speech, his characterization of the process for perception of higher-level properties (phonetic features, phonemes, words, etc.) makes no mention of the possibility of increased uncertainty in these receiver-based domains (Peterson, 1952). Nevertheless, there is nothing in the information-theoretic conceptualization of noise that requires that uncertainty (noise) arise in the transmission channel alone. I highlight this point because a considerable variety of recent studies have begun to investigate challenges to speech perception that arise either prior to or later than the transmission channel, and there seems to be some trend toward treating these seemingly disparate challenges as representative of a single overarching problem (i.e., adverse conditions) (Mattys et al., 2012). It seems to me that this broader focus makes very good sense, as these conditions could quite plausibly be all considered simply sources of increased uncertainty, also known as noise.

For example, properties of a particular talker's speech may increase the uncertainty of message reception, as in the case of a talker with dysarthria or other motor-speech difficulties (Bent et al., 2016; Borrie et al., 2017) or a talker who is using an unfamiliar (to the listener) dialect (Clopper, 2021) or accent (Baese-Berk et al., 2020; Van Engen and Peelle, 2014). All of these serve to increase the listener's uncertainty about what speech sounds the talker is intending to produce.

Similarly, much of what makes speech perception difficult for individuals with hearing impairment could be considered in terms of increased receiver-located uncertainty. Changes in peripheral auditory function due to age (presbycusis), such as attenuation of higher frequencies, loudness compression, broadening of critical bands, and temporal jitter (among other consequences), all increase uncertainty in speech perception, though probably to different degrees at different stages of auditory/cognitive processing or with respect to different aspects of speech understanding (Gordon-Salant et al., 2006; Humes and Dubno, 2010; Pichora-Fuller and Souza, 2003). Thus, we might also consider uncertainty arising at either the talker or listener stages as a kind of “added noise” in the information-theoretic sense, even though it does not typically involve the addition of acoustic noise as such.

Taking this perspective would enable us to shift the emphasis of the investigation from the origin of the challenge (i.e., source, transmission channel, or receiver) to the cognitive/linguistic/neural mechanisms that are affected by the challenge. For example, a number of recent studies have attempted to explicitly distinguish mechanisms of speech perception that might be differently engaged in response to different sources of interference, especially talker-related vs transmission channel-related (acoustic) uncertainty (Adank et al., 2012; Alain et al., 2018; Francis et al., 2021; McLaughlin et al., 2018). Such studies have found significant and potentially meaningful differences in patterns of listeners' responses, but, as I have argued previously (Francis et al., 2021) and will argue in more detail later in this paper, although these findings suggest that listeners engage different processing mechanisms to cope with each challenge (Calandruccio et al., 2010; McLaughlin et al., 2018; Viswanathan, 2021), these differences may depend more on what stage(s) of mental processing is affected by the increased uncertainty than on the location or origin of the interference in the transmission channel.

In summary, there are significant benefits to considering within a single theoretical framework such seemingly disparate challenges to speech understanding as those arising at the source, transmission channel, and receiver of the spoken message, and the information-theoretic concept of noise as increased uncertainty in message recognition serves that purpose well. Researchers investigating the mechanisms that listeners use to cope with uncertainty have identified similarities as well as differences in the cognitive and neural processes involved in dealing with various sorts of “adverse conditions” (Borrie et al., 2017; Francis et al., 2021; Guediche et al., 2014; Mattys et al., 2012; McLaughlin et al., 2018), suggesting that listeners process the consequences of some sources of increased uncertainty independently of the stage of transmission at which they arise. In Sec. IV, I will argue that what likely matters most is not the nature of the source of uncertainty itself, but rather how any given source of uncertainty interferes with the mental processes that must be engaged to understand a spoken message. First, however, we must consider the other commonly understood meaning of the term “noise.”

In addition to the information-theoretic concept of noise as uncertainty, one can also find definitions of noise as unwanted sound (Basner et al., 2014; Fink, 2019; Richards, 1935). This characterization becomes particularly useful and important when relating the study of speech perception in noise to other considerations of the effects of noise on humans (Babisch et al., 2013; Basner et al., 2014; Kryter, 1972, 2013; Pretzsch et al., 2021). Here, I will argue that adopting this conceptualization of noise ultimately highlights the importance of a wide range of factors that must be considered carefully when thinking about how listeners accomplish speech perception in noise.

To begin with, though, it is important to note that the characterization of noise as unwanted sound is fully compatible with the information-theoretic characterization precisely (though perhaps only) in the idealized experimental case in which a single listener is instructed to attend to and repeat as accurately as possible the speech of a single talker and in which the listener is assumed to be trying to do so to the best of their ability. In such a context, the goal of listening is solely to accurately receive and repeat a specific message. Thus, in this default case, any sound whose presence increases the uncertainty of that reception is, by definition, unwanted, simply because it interferes with achieving the goal (i.e., correctly repeating the target speech). However, as I will argue in Sec. II B 1, “unwantedness” and “interference” can be considered separately, and doing so sets the stage for developing a much richer understanding of what is involved in listening to speech in the presence of noise.

1. Interference is unwanted

I start with separating the question of how a signal might interfere with reception of a target signal from the question of why a signal might be unwanted. The first is, obviously, one of the primary concerns of research involving speech perception in the presence of other sounds, and considerable work has already been done to understand the different ways in which properties of non-target sounds interfere with the perception of target speech. Beyond the obvious effects of acoustic interference (“energetic masking”), there is also “informational masking,” which, if anything, is an even more complex and less completely understood phenomenon (Durlach et al., 2003; Kidd et al., 2008; Kidd and Colburn, 2017; Shinn-Cunningham, 2008; Watson, 2005; Yost, 2007). Rather than going into further detail on this very large and sophisticated body of work, I will simply adopt the premise that there are many ways for one sound to interfere with the reception of another, and, as we shall see, some of these different manners of interference may have different consequences for listeners' response to the presence of interfering sounds.

With respect to why a signal might be unwanted, the general assumption seems to be that, if a sound interferes with accomplishing a listening task, then it will be unwanted. There are two reasonably common versions of this: one in which the sound actually reduces the likelihood of accomplishing the task (i.e., interference affects performance) and one in which the sound simply forces the listener to apply more mental computation to accomplish the goal (i.e., interference affects effort). This is an important distinction in recent research on listening effort, in which researchers have determined that even when two listening conditions do not differentially affect performance (i.e., target signal reception), they can nevertheless still strongly differ in how they affect listeners' assessment of how hard it is to accomplish the listening task. Such different conditions can also induce different patterns of physiological markers associated with attention and effort (Alhanbali et al., 2019; Brown and Strand, 2019; Francis and Love, 2020; McGarrigle et al., 2014; Peelle, 2018; Sarampalis et al., 2009). For present purposes, though, it is sufficient to note that there is a broad assumption that the unwantedness of a noise is directly (though perhaps not linearly) related to the degree to which it interferes with recognizing the target speech. This assumption is, again, consistent with the information-theoretic idea of noise as a source of uncertainty and is therefore quite reasonable in the context of the default experimental paradigm, in which the listener's goal is assumed to be entirely and only to accurately recognize the target speech. It is by no means the only reason that a sound may be unwanted, however. And, as I will describe below, the fact that a sound may be unwanted has consequences for how listeners behave, whether or not it introduces additional uncertainty in target speech recognition.

2. Unpleasant sounds are unwanted

In addition to being unwanted because it causes interference, a sound may be unwanted because it is intrinsically unpleasant. A wide variety of sounds are considered unpleasant by most people, though even here there is considerable variability across individuals (Cox, 2008). Psychoacoustic and noise quality research suggests that some acoustic properties may contribute to a sound's unpleasantness. For example, “griding” sounds (a metal tool scraped across slate, for example, or fingernails on a chalkboard) tend to have predominant energy in the 2500–5500 Hz range with a low (1–16 Hz) fluctuation (Halpern et al., 1986; Kumar et al., 2008). These sounds may be specifically unpleasant because they engage primitive alerting systems (Kumar et al., 2008) or evoke unpleasant haptic sensations (Cox, 2008; Ely, 1975). More generally, though, unpleasant sounds tend to have higher loudness (Zwicker and Fastl, 1999), sharpness (high frequency emphasis; von Bismarck, 1974), roughness (amplitude and frequency fluctuations in the 15–300 Hz range; Daniel and Weber, 1997; Vitale et al., 2020), fluctuation (quasi-periodic variability of amplitude and frequency in the 1–20 Hz range; Zwicker and Fastl, 1999), and tonality (related to harmonic-to-noise ratio; Lee et al., 2017).

In summary, the unpleasantness of many kinds of sounds has been attributed to basic psychophysiological responses to either their acoustic properties themselves (Kumar et al., 2008) or physical properties of actions associated with the generation of those acoustic properties (Cox, 2008) and has also been attributed to cultural and learned associations (Cox, 2008; Kumar et al., 2008). Importantly for present purposes, however, these sounds are considered to be unpleasant (and hence unwanted) irrespective of the degree to which they might interfere with a speech perception task. While this supposition warrants future research, considering that many of the acoustic properties that are identified as likely to be more annoying may also make them more similar to speech, this characterization suggests that unpleasantness alone might cause sounds to interfere with a listening task even if there is no direct interference with the acoustic signal of the target speech. That is, rather than becoming unwanted because they interfere with achieving a goal, intrinsically unpleasant sounds might interfere with achieving a goal precisely because they are unpleasant and therefore distracting. Although, to my knowledge, this hypothesis has yet to be tested directly, such distraction-based unwantedness may also occur with other sounds (De Coensel et al., 2009), not just intrinsically unpleasant ones (e.g., the barely heard sounds of an exciting movie heard through the wall from a neighboring apartment whilst one is trying to study), as I will discuss in Sec. III.

Here, I will use the term “auditory scene” to refer to the sonic environment a listener is exposed to in a given instance, including signals that are to be attended to (targets) and those that should be ignored (maskers or distractors). In principle, this term is similar to that of “soundscape,” though soundscapes are often considered over longer spans of time (minutes or hours) and often include some sense of social and environmental context as well as more subjective qualities than are intended to be included here (Botteldooren et al., 2018). In this section, I will argue that, once we introduce into the auditory scene a sound that can be distinguished from the target speech, many additional factors must be considered before we can draw strong conclusions about how the noise is interfering with recognition of the speech or why the noise may be unwanted.

The auditory system, like all sensory systems, fulfills a specific (range of) ecological function (Ramsier and Rauschecker, 2017). Of particular note for present purposes is the function of early warning (alerting). In humans and other primates, audition is sensitive over relatively large distances (unlike taste, touch, and internally directed senses, such as interoception and proprioception) and functions relatively effectively over an extremely wide angular range (unlike vision). It is also good at identifying the spatial origin of a signal (unlike olfaction), though not as good as vision, and is also always operating, even during sleep (like olfaction, but unlike vision). It thus arguably serves as an important “early warning system,” sensitive to the occurrence of events in the environment over large distances and essentially in all directions (Murphy et al., 2017), evaluating their relevance for action (Asutay and Västfjäll, 2016), and guiding action appropriately (Arnott and Alain, 2011). Therefore, when we change properties of an auditory scene, we must take into account how an auditory/perceptual system that is optimized for acquiring information about the world would treat that new scene. When we add noise to a target signal, we cannot simply assume that a listener will treat the new auditory scene as consisting simply of the same sound source that is inexplicably more difficult to recognize.

The concept of an auditory object has been discussed at length in previous work (Griffiths and Warren, 2004; Kubovy and Van Valkenburg, 2001; Shinn-Cunningham et al., 2017; Shinn-Cunningham, 2008). For present purposes, we can take this term to mean something like the mental representation of a set of acoustic phenomena that share sufficient spectrotemporal properties to be attributable to a single distal source and toward which attention may be directed (Alain and Arnott, 2000; Bizley and Cohen, 2013; Shinn-Cunningham et al., 2017). While the formation of an auditory object likely involves both pre-attentive and attentive processes (Backer and Alain, 2012; Fritz et al., 2007; Shamma et al., 2011; Shinn-Cunningham, 2008), once formed it is likely that auditory objects must be attended to some degree, at least while spare capacity remains available (Fairnie et al., 2016; Francis, 2010; Lavie, 2005) and perhaps even irrespective of the availability of perceptual capacity (Murphy et al., 2017). Therefore, an added noise can, simply by virtue of being represented as a distinct auditory object, cause the listener to engage cognitively with the auditory scene in a fundamentally different manner when the noise is present from when it is not. To the extent that an acoustic signal that is added to that of a target speech stream is spectrotemporally structured in a way that allows it to be treated as a separate auditory object, it becomes something to which attention may or even must be paid.

1. Auditory object formation

In the analysis of auditory scenes (Bregman, 1990), a variety of principles appear to govern, or at least strongly guide, the process by which the spectrotemporal properties that reach the ear are grouped together into separate objects (i.e., are attributed to separate distal sources) [see Shinn-Cunningham et al. (2017) for a brief summary]. The formation of auditory objects is not necessarily perfect or complete, such that, for example, some spectrotemporal properties may be perceived both as distinct objects and as parts of a more complex object (e.g., the phenomenon of “duplex perception”; cf. Ciocca and Bregman, 1989), and it is possible that some spectrotemporal properties may remain unassigned to specific objects [see discussion by Shinn-Cunningham et al. (2017) and Shinn-Cunningham (2007)]. However, when noise is present in a signal, two crucial questions arise: (1) Is the added noise structured in such a way as to engage the pre-attentive and attentional mechanisms underlying auditory object formation, and, if so, (2) how might the listener respond to the presence of an additional auditory object that is not the target?

To address the first question, I first consider work on the perception of auditory scenes (e.g., Bregman, 1990; McDermott, 2009; Shamma et al., 2011; Shinn-Cunningham et al., 2017) as well as more recent work characterizing the perception of auditory objects from an information-theoretic perspective (Kluender et al., 2019; Stilp et al., 2018). A thorough discussion of auditory scene analysis is beyond the scope of the present article, but a few basic generalizations are still possible. Frequency components that are temporally similar (i.e., have similar onset and/or offset timing or are amplitude modulated at the same rate) or are related in frequency (e.g., harmonics of the same fundamental) or are perceived as sharing a spatial location all tend to be grouped perceptually and perceived as separate streams or objects (Darwin, 1997; Shamma et al., 2011; Shinn-Cunningham et al., 2017). In particular, then, it seems as if acoustic phenomena that are more similar to one another and/or that are more predictable from one another will be perceived as belonging to the same auditory object.

The process of auditory object formation can therefore be seen as a particular instance of predictive processing, the extraction of regularities in the environment in the service of generating a hypothesis about the current and future state of the sensory environment (Kluender et al., 2019; Stilp et al., 2018; Winkler et al., 2009). As described by Stilp et al. (2018), from this perspective, we can then consider the perception of speech in noise as a task requiring the detection of auditory objects with greater spectrotemporal regularity or structure (speech) within a context of sound(s) with a lower degree of structure (noise). To consider a few commonly used types of interfering signals, in the case of white noise, there is minimal structure in either time or frequency, while a single competing talker would have essentially the same degree of spectrotemporal structure as the target, and other sorts of maskers (e.g., multitalker babble, speech-shaped and -modulated noise) would be intermediate between the two [see discussion by Stilp (2022)]. That means, however, that the degree to which a noise is perceived as a distinct auditory object depends on its spectrotemporal regularity. If we want to minimize the “objectness” of a noise, we must use sound that is high in entropy (that is, low in predictable structure, or spectrotemporal regularity) (Kluender et al., 2019). In Sec. IV, I address the question of how listeners might respond to a task in which there is more than one object detectable in the auditory scene, but it is important to note that future research should also consider the consequences of listening to poorly formed or incomplete objects as well.
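
As a rough illustration of the contrast between low-structure and high-structure sounds, the following Python sketch compares white noise with a harmonic complex using normalized spectral entropy. This measure is an assumption chosen for simplicity: it captures regularity in frequency only, ignores temporal structure, and is not claimed to be the measure used in the work cited above. The helper name `spectral_entropy` and the signal parameters are likewise hypothetical.

```python
import numpy as np

def spectral_entropy(signal):
    """Shannon entropy of the normalized power spectrum, scaled to the range [0, 1].

    Values near 1 mean energy is spread almost evenly across frequency
    (little spectral structure); values near 0 mean energy is concentrated
    in a few components (high spectral regularity).
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    power = power[1:]                      # drop the DC bin
    p = power / power.sum()                # treat the spectrum as a probability distribution
    nonzero = p[p > 0]
    entropy_bits = -np.sum(nonzero * np.log2(nonzero))
    return entropy_bits / np.log2(len(power))

rng = np.random.default_rng(1)
fs = 16000                                 # assumed sampling rate (Hz)
t = np.arange(fs) / fs                     # one second of samples

white_noise = rng.standard_normal(fs)
# Ten equal-amplitude harmonics of 200 Hz: a crude stand-in for a voiced sound.
harmonic_complex = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 11))

print("white noise:     ", spectral_entropy(white_noise))
print("harmonic complex:", spectral_entropy(harmonic_complex))
```

On this crude measure the white noise scores far higher than the harmonic complex, in keeping with the suggestion that high-entropy sounds offer little regularity around which an auditory object could form.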

To understand all of the effects of an additional auditory object on listeners, it is first necessary to consider what we know about how the auditory scene is processed by listeners. A useful figure derived from work by Hari Bharadwaj (2022) is shown in Fig. 2, elaborating on work by Shinn-Cunningham et al. (2017). This schematic shows a series of stages of processing from initial acquisition of the signal at the cochlea (bottom) through to some abstract cognitive-linguistic stage of message understanding (top). The earlier (lower) stages correspond roughly to those referred to by Shinn-Cunningham (2008) as (1) object formation and (2) object selection, while the topmost stage represents (3) the “coping processes” or “repair strategies.” A similar mapping may be made to the processes described by Edwards (2016) based on the model of Rönnberg (2013), with stages (1) and (2) corresponding to mechanisms of external attention and (3) to internal attention [see discussion by Francis and Love (2020) and Strauss and Francis (2017); see also Heald and Nusbaum (2014)].

FIG. 2.

(Color online) Illustration of major stages in object formation and selection from Hari Bharadwaj (p.c.), identifying applications of attention and added marking of important stages for considering the effects of noise: (1) prior to/during object formation; (2) direction of selective attention to auditory objects; (3) higher-level processing of perceived objects.

What is important here is that there exist multiple stages of processing at which the presence of an added noise may interact with speech perception. These interactions may have different consequences for perception, and those consequences are what determine measurable outcomes [i.e., recognition performance, various indices of listening effort, etc. (Francis et al., 2016; Francis and Love, 2020; Francis and Oliver, 2018)].

For example, in the prototypical case in which a steady-state broadband speech-shaped noise is presented from the same speaker as a target signal, the noise may not be sufficiently distinguishable from the target for it to be treated as a separate auditory object in the object formation stage (point 1). Indeed, it seems likely that even perceived spatial separation alone is not sufficient to enable the formation of a completely distinct auditory object out of broadband noise (Freyman et al., 1999, 2001). If an added noise exhibits properties that help listeners segregate it from the auditory scene as a distinct object, for example, by having an identifiable onset or offset, spatial origin, or amplitude modulation (Bregman, 1990; Darwin, 1997; Shinn-Cunningham et al., 2017) or perhaps by otherwise having lower entropy than the background it appears in (Stilp et al., 2018), then it will be more likely to be treated as a separate object. The more object-like a concurrent sound is, the more likely it is to enable low-level processes of object formation and selection to improve message reception by allowing the listener to segregate or “stream off” (Bregman, 1990) the target speech from the noise more effectively (Gordon, 2000). If, however, the added noise cannot be successfully segregated from the target, as seems likely in the case of broadband steady-state noise, the representation of the target signal will contain a mixture of target and masker, leading to greater uncertainty in recognizing the linguistic elements of the signal at point 3. Such a mixed signal would support a larger number of possible, plausibly valid interpretations, which in turn would increase cognitive demand, reduce recognition accuracy, and/or introduce greater need for the application of postperceptual repair strategies to correct misperceptions [see discussion by Francis and Love (2020)].

In contrast, if the added noise can be successfully streamed off, as for example in the case of a target speech signal produced by one talker presented in the context of masking speech produced by a single, different, easily distinguished talker emanating from a distinct spatial location, the listener may be quite successful at separating the two sound sources into distinct auditory objects. The formation of separate auditory objects for target and masker in turn would facilitate the application of selective attention to suppress the representation of the distracting speech, causing the target signal to be represented relatively unambiguously and therefore recognized with little error or need for repair strategies. The relative difficulty that listeners have with treating target and masker signals as separate auditory objects likely underlies much of the distinction between energetic and informational masking (Brungart, 2005). In addition, however, because in this case the masking speech constitutes an intelligible speech signal, it plausibly demands the allocation of some attention as well (Lavie, 2005; Wöstmann and Obleser, 2016) and is likely to be processed and remembered if it is sufficiently clear (Wöstmann and Obleser, 2016) and the listener has sufficient attentional capacity (Tun et al., 2002).

Thus, a well-segregated single-talker masker and a poorly segregated broadband noise masker are both likely to interfere with speech perception, but in very different ways with potentially dissociable consequences for listeners. The broadband noise may primarily affect demand for the application of postperceptual repair strategies, while highly intelligible competing speech may place greater demand on the allocation of spatial selective attention and possibly late-occurring linguistic/memory processes [see, for example, discussion of two types of attentional effort by Strauss and Francis (2017) and discussion of postperceptual processing by Winn and Teece (2021)].

Considering the consequences of different additive noise scenarios can also explain seemingly paradoxical cases, such as the benefit to performance that is sometimes observed with added low-level noise (Zeng et al., 2000) and noise masking in the workplace (Haapakangas et al., 2011). In the first case, benefits seem to accrue because of nonlinear changes to low-level neural responsivity in the presence of noisy input (Alain et al., 2009; Moss et al., 2004), and such effects are even seen when noise is added in other perceptual modalities, including somatosensation (Wuehr et al., 2018), though it is possible that these benefits only accrue in threshold or near-threshold perception. In the second case, the presence of noise seems to help to obscure task-irrelevant acoustic properties of the distractors, which may consist of footsteps, others' conversations, and even telephones. All of these may constitute “notice events” in the sense of De Coensel et al. (2009) [see Love (2018)], yet they are either not recognized as separate auditory objects or are recognized as benign or even necessary ones [Haapakangas et al., 2011; Loewen and Suedfeld, 1992; Veitch et al., 2002; though see Lenne (2020)]. By raising the noise floor without enabling the perception of the noise as a distinct object, a noise masker may reduce the ability of task-irrelevant sounds to capture attention in a task-detrimental manner without incurring an emotional response to the presence of the masker itself as an unwanted sound [though cf. Lenne (2020)].

To understand how listeners might respond to the presence of a non-target auditory object during a speech perception task, we must consider the listener as a fully functioning organism. I have already introduced this idea in discussing the ecology of hearing, but here I extend that discussion to consider the function of perception more broadly. From an ethological perspective, the role of a nervous system is to facilitate action, to enable the organism to efficiently move toward desirable conditions and away from undesirable ones (Lang, 2000; Yost, 2007). The role of attention, then, is to orient toward properties of the environment that are relevant for making decisions about action (Bradley, 2009; Raymond, 2009). Thus, the fact that listeners can, or even must, direct attention toward auditory objects suggests that such objects engage relatively high-level decision-making processes—processes that are integrated with affective mechanisms associated with emotion and motivation.

Attention is typically conceived of as a limited capacity mechanism for selecting phenomena for further cognitive processing (Driver, 2001; Kahneman, 1973), modulating their relevance in pursuit of a particular goal (Bradley, 2009; Eckert et al., 2016; Raymond, 2009), and thereby contributing to how well the selected information can be processed (Chun et al., 2011; Wild et al., 2012). As such, the typical assumption is that exerting attention is perceived as effortful (Kahneman, 1973). It is in fact quite likely that many contexts in which attention is engaged are not perceived as effortful [for example, puzzles and games and activities involving flow (Nakamura and Csikszentmihalyi, 2016); see Bruya and Tang (2018) and Inzlicht (2018) for discussion], and this has important implications for understanding the role of motivation in speech perception in adverse conditions, as discussed below. However, within the context of the kind of high-effort, high-attention tasks typically employed in speech perception research, increasing the complexity of the auditory scene increases demand on attentional processes as indicated by decreased susceptibility to interference from auditory distractors (Bertoli and Bodmer, 2014; Fairnie et al., 2016; Francis, 2010) and also by physiological markers associated with effort, especially the pupil dilation response.

The pupil dilation response refers to a momentary, illumination-independent, event-related increase in the diameter of the pupil of the eye. The pupil dilation response is an autonomic response originally associated with general arousal related to task demand (Kahneman, 1973) and engagement (Kahneman et al., 1968). Subsequent research supports a connection to the allocation of selective attention (Wierda et al., 2012) and cognitive effort (Beatty, 1982; Granholm et al., 1996; Kahneman and Beatty, 1966), including listening effort (Kramer et al., 1997; Winn et al., 2018; Zekveld et al., 2018). Listening effort-related pupil dilation is strongly associated with activation in the locus ceruleus–norepinephrine (LC-NE) system (Koelewijn et al., 2015; Peelle, 2018; Wang et al., 2018). The LC-NE system, in turn, is associated with the deployment of cognitive resources (Aston-Jones and Cohen, 2005; Gilzenrat et al., 2010), further supporting the idea that increased pupil dilation under adverse listening conditions reflects greater mobilization of limited cognitive resources. Thus, once a noise is perceived as an auditory object, even before its relevance is evaluated, it becomes a target for attention and therefore also a potential source of effort.

1. Divided attention

In addition to the likelihood that even the mere existence of a non-target object in the auditory scene increases listening effort, we must also consider the value ascribed to that extraneous object by the listener within the ethological context of making decisions about moving toward positive stimuli and away from negative ones. Again, in the traditional case, any signal that is not the target that the listener is instructed to attend to is considered unwanted because, as we have established, it diverts attention from goal performance. However, I would argue that the goal of a listener in daily life is rarely to attend to a single message to the exclusion of all other percepts (Winn and Teece, 2021; Zhang et al., 2021). Even in conditions in which one chooses to listen attentively to a single talker, for example, when attending a lecture or watching a movie or a play, one is very likely still alert for the possibility of hearing other sounds: the muttered comment of a skeptical colleague, for example, or a sound cue from off stage signaling the entrance of a new character.

Moreover, many instances of speech communication occur in complex auditory environments, such as while walking down a busy street or in locations such as restaurants, coffee shops, or the proverbial “cocktail party”; environments in which listeners may want or even need to attend to “noise” as well as to target speech. Although the original “cocktail party” research by Colin Cherry (Cherry, 1953) focused on the ability to attend to one stream to the exclusion of another, subsequent research quickly showed that listeners were often incapable of completely shutting out an irrelevant signal (Broadbent, 1952; Moray, 1959), leading to decades of research and debate on the nature of selective attention (Driver, 2001; Lavie, 2005; Price and Moncrieff, 2021; Shinn-Cunningham and Best, 2015). Currently, there is a general consensus that information can intrude from “unattended” channels (Broadbent, 1982) and that the degree to which this happens depends on a wide variety of properties of the signal(s) and the listener (Aydelott et al., 2015; Bargh, 1982; Vachon et al., 2020). In particular, though, it appears that signals that have a particular significance to the listener [e.g., their name or ringtone (Moray, 1959; Röer et al., 2013, 2014)] are more likely to draw attention to themselves and thereby incur greater processing demands. While some research suggests this could be the case for emotional or unpleasant stimuli as well (Broadbent, 1977), there is also evidence that listeners tend to habituate to intrusive sounds (Banbury and Berry, 1997; Martin-Soelch et al., 2006), and even the sight and sound of an individual scraping their nails down a chalkboard may be missed if attention is sufficiently occupied by another task (Wayand et al., 2005). Thus, it seems very likely that when the auditory scene includes sounds that can be perceived as distinct from the target speech, listeners will devote at least some attentional processing to them even when doing so incurs a greater demand on cognitive processing, but other circumstances, such as habituation and overall attentional capacity demand, are likely to be important as well.

2. Involuntary capture of attention

Auditory objects that attract attention and demand cognitive processing resources during the accomplishment of other tasks may also cause aversive responses, such as annoyance and distress, even when the primary task does not involve audition at all (Keus van de Poll and Sörqvist, 2016; Marsh et al., 2018), suggesting that listeners also evaluate non-target stimuli in terms of their orientation toward the task. For example, in a workplace context, sounds that listeners feel are not necessary tend to be identified as more annoying, while those that listeners feel they have less control over are perceived as more distracting (Kjellberg et al., 1996). Thus, the capture of attention by irrelevant sounds not only reduces the cognitive capacity available for processing task-relevant information (Fairnie et al., 2016; Francis, 2010), it also engages emotional/evaluative processes that could result in an aversive emotional response due to frustration with being unnecessarily or uncontrollably distracted from a primary task (Haapakangas et al., 2011; Kjellberg et al., 1996; Röer et al., 2014; Sörqvist, 2014).

In the case of intrinsically unpleasant sounds, i.e., sounds that in themselves cause an aversive response in the listener (whether for physiological or learned reasons), the involuntary capture of attention may introduce an added level of negative response, as the unavoidable sound is perceived as intolerable. In the extreme case of misophonia (Jastreboff and Jastreboff, 2015; Kumar et al., 2017), triggering sounds seem to elicit an involuntary autonomic response that is interpreted as emotionally meaningful, e.g., disgusting or enraging. The heightened negative affective response then increases attention toward trigger sounds because emotion heightens sensory attention and predictions about the sensory environment (Smout et al., 2019). Increased attention to triggering sounds, in turn, leads to stronger autonomic responses, resulting in a vicious cycle that makes triggering sounds increasingly intolerable (Dozier, 2015; Jastreboff and Jastreboff, 2015). Such an effect of anticipation may also arise in less pathological contexts. For example, Ely (1975) found that listeners who knew they would hear nails on a chalkboard showed increasing autonomic responses across repeated presentations, while uninformed listeners exhibited no change in autonomic response over time, suggesting that awareness and perhaps anticipation of the nature of the unpleasant sound increased the strength of the aversive response over time.

Thus, when a non-target auditory object is present in an auditory scene, even when it does not interact with the primary task in any acoustic manner, we must consider that it will likely attract attention to itself, especially if it exhibits some of the acoustic properties discussed above as being attentionally capturing. By attracting attention, it may interrupt performance of the primary task, even if this task is not auditory in nature, causing irritation or displeasure and increasing the demands of the task (Sörqvist, 2014). Even just pulling attention away from the primary task, therefore, increases the overall cognitive demand on the listener, potentially reducing the availability of resources needed to accomplish the speech perception task and resulting in poorer performance, introducing a further sense of increased listening effort, and increasing frustration. In addition, however, engaging with the irrelevant sound as a separate object seems likely to engage evaluative processes that may result in a strong affective response to the presence of the sound that may or may not be related to the degree to which it interferes with a target speech perception task (as in the cycle described for misophonia, but, presumably, to a less emotional degree). All these potential factors should be considered when adding noise to a target signal.

Just as different sounds may evoke different kinds and degrees of emotional response, so may the sense of exerting effort, and there is a growing awareness that in many tasks motivation is at least as significant as cognitive effort itself (Bruya and Tang, 2018; Kurzban, 2016; Pichora-Fuller et al., 2016). Following the argument outlined in previous work (Francis and Love, 2020), the willingness to exert effort depends on motivation (Richter, 2016; Richter et al., 2016), and motivation, especially for action, in turn depends on mechanisms that regulate the expenditure of limited resources to maintain homeostasis (Barrett and Simmons, 2015; Kleckner et al., 2017; Touroutoglou et al., 2019). Simply put, motivation to accomplish a task depends on the ongoing assessment of whether accomplishing that task is worth expending the resources that must be expended to accomplish it relative to the current availability and predicted future demand for those same resources (Eckert et al., 2016; Kurzban, 2016; McLaughlin et al., 2021; Schneider et al., 2019; Westbrook and Braver, 2015). The ongoing, moment-by-moment assessment of resource availability vs demand (current and projected) is reflected in the physiological property known as core affect (Barrett, 2006; Duncan and Barrett, 2007), which also provides the physiological basis for the internal states that we identify as emotions (Barrett and Bliss-Moreau, 2009). Thus, effort, emotion, and motivation are linked in that expended effort that is perceived as failing to achieve a desired goal feels bad and is demotivating (Shenhav et al., 2021; Venables and Fairclough, 2009).

When the presence of noise increases demand on processing resources, listeners must decide whether it is still worth it to continue with the task. If the task is sufficiently important, they may continue performing it even while assessing that it is no longer worth the cost. Following Alhanbali et al. (2018), I argue that it is this demotivating emotional response to the fruitless expenditure of effort that underlies the kind of negative sense that discussions of listening effort typically evoke rather than the expenditure of effort as such (Alhanbali et al., 2018; Francis and Love, 2020; Hornsby, 2013; Kramer et al., 2006; Pichora-Fuller et al., 2016; Winn and Teece, 2021). Nevertheless, if the increased effort is attributable to the presence of an identifiable additional noise (a distinct auditory object, e.g., someone else talking, or a noisy air conditioning system), then the listener may be annoyed specifically at that noise.

In summary, how much a non-target auditory object is unwanted depends not just on how much it interferes with the formation and/or interpretation of the target signal (interference), but also on whether or not it is intrinsically unpleasant or otherwise aversive to the listener (aversion), whether and how much it interferes with some outcome that the listener desires (distraction), and how much that outcome is actually wanted (motivation). All of these factors may affect the listener's performance on a listening task as well as their evaluation of whether it is worthwhile to continue performing the task, and they do so through distinct but not mutually exclusive mechanisms.

In 1981, Ann Cutler published a paper entitled “Making up materials is a confounded nuisance, or: Will we be able to run any psycholinguistic experiments at all in 1990?” (Cutler, 1981). This short paper drew attention to many of the myriad ways in which new discoveries were complicating the development of stimulus sets for spoken word recognition tasks and argued for a deeper consideration of these complex interactions in future research. In a similar way, I hope I have shown here that adding noise to speech perception tasks is thoroughly confounded: certainly with auditory processing, but also with attention, effort, affect, and motivation. Nevertheless, research investigating speech perception in noise remains possible and even desirable. The move toward studying speech perception in different kinds of noise is at least partly motivated by the desire to investigate behavior in listening contexts that are more ecologically valid, in the sense of more like those found in everyday life, and I think this is a laudable and necessary goal (Beechey, 2022; Keidser et al., 2020). However, in doing so, we must also move away from considering added noise as simply a way to dial up the difficulty of a listening task.

The first step is to recognize that adding a new signal to the auditory scene is just one of many ways to increase uncertainty and that, even if we merely wish to add uncertainty to the process of recognizing a target signal, we must nevertheless ask, “What kind of uncertainty is appropriate for my research question?” Speech is composed of different acoustic events, some of which will be more or less affected by different properties of the source or receiver and more or less obscured by specific patterns of acoustic noise or other sorts of signal manipulation. These events are further bound together into phonetic features and higher-level units that, themselves, depend on combinations of spectral and temporal information and thus are likely to be made differentially uncertain by different kinds of noise or manipulation unfolding over different intervals of time. Uncertainty at the phonetic feature level may be compensated for in a different way than lexical uncertainty, for example, and listeners may react differently to the need for different sorts of compensatory processes. The locus of interference within the process of understanding speech (Fig. 2) is, therefore, an important consideration in understanding the effect of added uncertainty and the strategies that listeners may adopt to cope with it.

In addition, however, if the source of uncertainty is an acoustic signal, we must ask whether it can be perceived as an auditory object distinct from that of the target speech. If so, we must consider the possibility of additional demands imposed on selective attention and the processing of the auditory scene and the possibility that the noise itself may be considered unpleasant either due to its own inherent properties or because it is perceived as causing difficulties in achieving a desired goal. And in all cases, we must consider the potential for an impact on motivation from emotional responses both to the noise-related reduction in performance and/or increase in listening effort and to the awareness of the presence of the noise itself. Whether we prefer a concept of noise closer to the “source of increased uncertainty” or one closer to “unwanted sound,” ultimately, we must consider that uncertainty in speech perception may arise at many simultaneous levels of processing and that the unwanted nature of a sound may have significant implications for task performance that extend far beyond simple errors in the perception of speech.

1. Adank, P., Davis, M. H., and Hagoort, P. (2012). “Neural dissociation in processing noise and accent in spoken language comprehension,” Neuropsychologia 50(1), 77–84.
2. Alain, C., and Arnott, S. R. (2000). “Selectively attending to auditory objects,” Front. Biosci. 5(3), d202–d212.
3. Alain, C., Du, Y., Bernstein, L. J., Barten, T., and Banai, K. (2018). “Listening under difficult conditions: An activation likelihood estimation meta-analysis,” Hum. Brain Mapp. 39(7), 2695–2709.
4. Alain, C., Quan, J., McDonald, K., and Van Roon, P. (2009). “Noise-induced increase in human auditory evoked neuromagnetic fields,” Eur. J. Neurosci. 30(1), 132–142.
5. Alhanbali, S., Dawes, P., Lloyd, S., and Munro, K. J. (2018). “Hearing handicap and speech recognition correlate with self-reported listening effort and fatigue,” Ear Hear. 39(3), 470–474.
6. Alhanbali, S., Dawes, P., Millman, R. E., and Munro, K. J. (2019). “Measures of listening effort are multidimensional,” Ear Hear. 40(5), 1084–1097.
7. Arnott, S. R., and Alain, C. (2011). “The auditory dorsal pathway: Orienting vision,” Neurosci. Biobehav. Rev. 35(10), 2162–2173.
192. Aston-Jones, G., and Cohen, J. D. (2005). “An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance,” Annu. Rev. Neurosci. 28, 403–450.
8. Asutay, E., and Västfjäll, D. (2016). “Auditory attentional selection is biased by reward cues,” Sci. Rep. 6(1), 36989.
9. Aydelott, J., Jamaluddin, Z., and Nixon Pearce, S. (2015). “Semantic processing of unattended speech in dichotic listening,” J. Acoust. Soc. Am. 138(2), 964–975.
10. Babisch, W., Pershagen, G., Selander, J., Houthuijs, D., Breugelmans, O., Cadum, E., Vigna-Taglianti, F., Katsouyanni, K., Haralabidis, A. S., Dimakopoulou, K., Sourtzi, P., Floud, S., and Hansell, A. L. (2013). “Noise annoyance—A modifier of the association between noise level and cardiovascular health?,” Sci. Total Environ. 452, 50–57.
11. Backer, K. C., and Alain, C. (2012). “Orienting attention to sound object representations attenuates change deafness,” J. Exp. Psychol. Hum. Percept. Perform. 38(6), 1554–1566.
12. Baese-Berk, M. M., McLaughlin, D. J., and McGowan, K. B. (2020). “Perception of non-native speech,” Lang. Linguist. Compass 14(7), e12375.
13. Banbury, S., and Berry, D. C. (1997). “Habituation and dishabituation to speech and office noise,” J. Exp. Psychol. Appl. 3(3), 181–195.
14. Banks, B., Gowen, E., Munro, K. J., and Adank, P. (2015). “Cognitive predictors of perceptual adaptation to accented speech,” J. Acoust. Soc. Am. 137(4), 2015–2024.
15. Bargh, J. A. (1982). “Attention and automaticity in the processing of self-relevant information,” J. Pers. Soc. Psychol. 43(3), 425–436.
16. Barrett, L. F. (2006). “Solving the emotion paradox: Categorization and the experience of emotion,” Pers. Soc. Psychol. Rev. 10(1), 20–46.
17. Barrett, L. F., and Bliss-Moreau, E. (2009). “Affect as a psychological primitive,” Adv. Exp. Soc. Psychol. 41, 167–218.
18. Barrett, L. F., and Simmons, W. K. (2015). “Interoceptive predictions in the brain,” Nat. Rev. Neurosci. 16(7), 419–429.
19. Basner, M., Babisch, W., Davis, A., Brink, M., Clark, C., Janssen, S., and Stansfeld, S. (2014). “Auditory and non-auditory effects of noise on health,” Lancet 383(9925), 1325–1332.
20. Beatty, J. (1982). “Task-evoked pupillary responses, processing load, and the structure of processing resources,” Psychol. Bull. 91(2), 276–292.
21. Beechey, T. (2022). “Ecological validity, external validity, and mundane realism in hearing science,” Ear Hear. 43(5), 1395–1401.
22. Bent, T., Baese-Berk, M., Borrie, S. A., and McKee, M. (2016). “Individual differences in the perception of regional, nonnative, and disordered speech varieties,” J. Acoust. Soc. Am. 140(5), 3775–3786.
23. Bernarding, C., Strauss, D. J., Latzel, M., Hannemann, R., Chalupper, J., and Corona-Strauss, F. I. (2011). “Simulations of hearing loss and hearing aid: Effects on electrophysiological correlates of listening effort,” in Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA (August 30–September 3), pp. 2319–2322.
24. Bertoli, S., and Bodmer, D. (2014). “Novel sounds as a psychophysiological measure of listening effort in older listeners with and without hearing loss,” Clin. Neurophysiol. 125(5), 1030–1041.
193. Bharadwaj, H. (2022). (private communication).
25. Bizley, J. K., and Cohen, Y. E. (2013). “The what, where and how of auditory-object perception,” Nat. Rev. Neurosci. 14(10), 693–707.
26. Borrie, S. A., Baese-Berk, M., Van Engen, K., and Bent, T. (2017). “A relationship between processing speech in noise and dysarthric speech,” J. Acoust. Soc. Am. 141(6), 4660–4667.
27. Botteldooren, D., Andringa, T. C., Aspuru, I., Brown, L., Dubois, D., Guastavino, C., Kang, J., Lavandier, C., Nilsson, M., Preis, A., and Schulte-Fortkamp, B. (2018). “From sonic environment to soundscape,” in Soundscape and the Built Environment (CRC, Boca Raton, FL), pp. 31–56.
28. Bradley, M. M. (2009). “Natural selective attention: Orienting and emotion,” Psychophysiology 46(1), 1–11.
29. Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (MIT, Cambridge, MA).
30. Broadbent, D. E. (1952). “Failures of attention in selective listening,” J. Exp. Psychol. 44(6), 428–433.
31. Broadbent, D. E. (1977). “The hidden preattentive processes,” Am. Psychol. 32(2), 109–118.
32. Broadbent, D. E. (1982). “Task combination and selective intake of information,” Acta Psychol. 50(3), 253–290.
33. Brown, V. A., and Strand, J. F. (2019). “Noise increases listening effort in normal-hearing young adults, regardless of working memory capacity,” Lang. Cognit. Neurosci. 34(5), 628–640.
34. Brungart, D. S. (2005). “Informational and energetic masking effects in multitalker speech perception,” in Speech Separation by Humans and Machines, edited by P. Divenyi (Springer, New York), pp. 261–267.
35. Bruya, B., and Tang, Y.-Y. (2018). “Is attention really effort? Revisiting Daniel Kahneman's influential 1973 book Attention and Effort,” Front. Psychol. 9, 1133.
36. Calandruccio, L., Dhar, S., and Bradlow, A. R. (2010). “Speech-on-speech masking with variable access to the linguistic content of the masker speech,” J. Acoust. Soc. Am. 128(2), 860–869.
37. Cherry, E. C. (1953). “Some experiments on the recognition of speech, with one and with two ears,” J. Acoust. Soc. Am. 25(5), 975–979.
38. Chun, M. M., Golomb, J. D., and Turk-Browne, N. B. (2011). “A taxonomy of external and internal attention,” Annu. Rev. Psychol. 62, 73–101.
39. Ciocca, V., and Bregman, A. S. (1989). “The effects of auditory streaming on duplex perception,” Percept. Psychophys. 46(1), 39–48.
40. Clopper, C. G. (2021). “Perception of dialect variation,” in The Handbook of Speech Perception (Wiley, London), pp. 333–364.
41. Cox, T. J. (2008). “Scraping sounds and disgusting noises,” Appl. Acoust. 69(12), 1195–1204.
42. Culling, J. F., and Stone, M. A. (2017). “Energetic masking and masking release,” in The Auditory System at the Cocktail Party, edited by J. C. Middlebrooks, J. Z. Simon, A. N. Popper, and R. R. Fay (Springer, New York), pp. 41–73.
43. Cutler, A. (1981). “Making up materials is a confounded nuisance, or: Will we be able to run any psycholinguistic experiments at all in 1990?,” Cognition 10, 65–70.
44. Dai, B., McQueen, J. M., Hagoort, P., and Kösem, A. (2017). “Pure linguistic interference during comprehension of competing speech signals,” J. Acoust. Soc. Am. 141(3), EL249–EL254.
45. Daniel, P., and Weber, R. (1997). “Psychoacoustical roughness: Implementation of an optimized model,” Acta Acust. united Acust. 83(1), 113–123.
46. Darwin, C. J. (1997). “Auditory grouping,” Trends Cogn. Sci. 1(9), 327–333.
47. De Coensel, B., Botteldooren, D., De Muer, T., Berglund, B., Nilsson, M. E., and Lercher, P. (2009). “A model for the perception of environmental sound based on notice-events,” J. Acoust. Soc. Am. 126(2), 656–665.
48. Dozier, T. H. (2015). “Etiology, composition, development and maintenance of misophonia: A conditioned aversive reflex disorder,” Psychol. Thought 8(1), 114–129.
49. Driver, J. (2001). “A selective review of selective attention research from the past century,” Br. J. Psychol. 92(1), 53–78.
50. Duncan, S., and Barrett, L. F. (2007). “Affect is a form of cognition: A neurobiological analysis,” Cogn. Emot. 21(6), 1184–1211.
51. Durlach, N. I., Mason, C. R., Kidd, G., Arbogast, T. L., Colburn, H. S., and Shinn-Cunningham, B. G. (2003). “Note on informational masking (L),” J. Acoust. Soc. Am. 113(6), 2984–2987.
52. Eckert, M. A., Teubner-Rhodes, S., and Vaden, K. I. (2016). “Is listening in noise worth it? The neurobiology of speech recognition in challenging listening conditions,” Ear Hear. 37(Suppl. 1), 101S–110S.
53. Edwards, B. (2016). “A model of auditory-cognitive processing and relevance to clinical applicability,” Ear Hear. 37, 85S–91S.
54. Ely, D. J. (1975). “Aversiveness without pain: Potentiation of imaginal and auditory effects of blackboard screeches,” Bull. Psychon. Soc. 6(3), 295–296.
55. Fairnie, J., Moore, B. C. J., and Remington, A. (2016). “Missing a trick: Auditory load modulates conscious awareness in audition,” J. Exp. Psychol. Hum. Percept. Perform. 42(7), 930–938.
56. Fano, R. M. (1950). “The information theory point of view in speech communication,” J. Acoust. Soc. Am. 22(6), 691–696.
57. Fink, D. (2019). “A new definition of noise: Noise is unwanted and/or harmful sound. Noise is the new ‘secondhand smoke,’” Proc. Mtgs. Acoust. 39(1), 050002.
58. Francis, A. L. (2010). “Improved segregation of simultaneous talkers differentially affects perceptual and cognitive capacity demands for recognizing speech in competing speech,” Atten. Percept. Psychophys. 72(2), 501–516.
59. Francis, A. L., Bent, T., Schumaker, J., Love, J., and Silbert, N. (2021). “Listener characteristics differentially affect self-reported and physiological measures of effort associated with two challenging listening conditions,” Atten. Percept. Psychophys. 83(4), 1818–1841.
60. Francis, A. L., and Love, J. (2020). “Listening effort: Are we measuring cognition or affect, or both?,” WIREs Cogn. Sci. 11(1), e1514.
61. Francis, A. L., MacPherson, M. K., Chandrasekaran, B., and Alvar, A. M. (2016). “Autonomic nervous system responses during perception of masked speech may reflect constructs other than subjective listening effort,” Front. Psychol. 7, 263.
63. Francis, A. L., and Oliver, J. (2018). “Psychophysiological measurement of affective responses during speech perception,” Hear. Res. 369, 103–119.
64.
Freyman
,
R. L.
,
Balakrishnan
,
U.
, and
Helfer
,
K. S.
(
2001
). “
Spatial release from informational masking in speech recognition
,”
J. Acoust. Soc. Am.
109
(
5
),
2112
2122
.
66.
Freyman
,
R. L.
,
Helfer
,
K. S.
, and
Balakrishnan
,
U.
(
2007
). “
Variability and uncertainty in masking by competing speech
,”
J. Acoust. Soc. Am.
121
(
2
),
1040
1046
.
67.
Freyman
,
R. L.
,
Helfer
,
K. S.
,
McCall
,
D. D.
, and
Clifton
,
R. K.
(
1999
). “
The role of perceived spatial separation in the unmasking of speech
,”
J. Acoust. Soc. Am.
106
(
6
),
3578
3588
.
68.
Fritz
,
J. B.
,
Elhilali
,
M.
,
David
,
S. V.
, and
Shamma
,
S. A.
(
2007
). “
Auditory attention—Focusing the searchlight on sound
,”
Curr. Opin. Neurobiol.
17
(
4
),
437
455
.
194. Gilzenrat, M. S., Nieuwenhuis, S., Jepma, M., and Cohen, J. D. (2010). “Pupil diameter tracks changes in control state predicted by the adaptive gain theory of locus coeruleus function,” Cognit., Affective, Behav. Neurosci. 10(2), 252–269.
69. Gordon, P. C. (2000). “Masking protection in the perception of auditory objects,” Speech Commun. 30(4), 197–206.
70. Gordon-Salant, S., Yeni-Komshian, G. H., Fitzgibbons, P. J., and Barrett, J. (2006). “Age-related differences in identification and discrimination of temporal cues in speech segments,” J. Acoust. Soc. Am. 119(4), 2455–2466.
71. Granholm, E., Asarnow, R. F., Sarkin, A. J., and Dykes, K. L. (1996). “Pupillary responses index cognitive resource limitations,” Psychophysiology 33(4), 457–461.
72. Griffiths, T. D., and Warren, J. D. (2004). “What is an auditory object?,” Nat. Rev. Neurosci. 5(11), 887–892.
73. Guediche, S., Blumstein, S., Fiez, J., and Holt, L. (2014). “Speech perception under adverse conditions: Insights from behavioral, computational, and neuroscience research,” Front. Syst. Neurosci. 7, 126.
74. Haapakangas, A., Kankkunen, E., Hongisto, V., Virjonen, P., Oliva, D., and Keskinen, E. (2011). “Effects of five speech masking sounds on performance and acoustic satisfaction: Implications for open-plan offices,” Acta Acust. united Acust. 97(4), 641–655.
75. Halpern, D. L., Blake, R., and Hillenbrand, J. (1986). “Psychoacoustics of a chilling sound,” Percept. Psychophys. 39(2), 77–80.
76. Heald, S., and Nusbaum, H. (2014). “Speech perception as an active cognitive process,” Front. Syst. Neurosci. 8, 35.
77. Helfer, K. S., Chevalier, J., and Freyman, R. L. (2010). “Aging, spatial cues, and single- versus dual-task performance in competing speech perception,” J. Acoust. Soc. Am. 128(6), 3625–3633.
78. Helfer, K. S., and Freyman, R. L. (2014). “Stimulus and listener factors affecting age-related changes in competing speech perception,” J. Acoust. Soc. Am. 136(2), 748–759.
79. Holt, R. F., Kirk, K. I., and Hay-McCutcheon, M. (2011). “Assessing multimodal spoken word-in-sentence recognition in children with normal hearing and children with cochlear implants,” J. Speech. Lang. Hear. Res. 54(2), 632–658.
80. Hornsby, B. W. Y. (2013). “The effects of hearing aid use on listening effort and mental fatigue associated with sustained speech processing demands,” Ear Hear. 34(5), 523–534.
81. Humes, L. E., Dirks, D. D., Bell, T. S., and Kincaid, G. E. (1987). “Recognition of nonsense syllables by hearing-impaired listeners and by noise-masked normal hearers,” J. Acoust. Soc. Am. 81(3), 765–773.
82. Humes, L. E., and Dubno, J. R. (2010). “Factors affecting speech understanding in older adults,” in The Aging Auditory System, edited by S. Gordon-Salant, R. D. Frisina, A. N. Popper, and R. R. Fay (Springer, New York), pp. 211–257.
83. Ihlefeld, A., and Shinn-Cunningham, B. (2008a). “Spatial release from energetic and informational masking in a divided speech identification task,” J. Acoust. Soc. Am. 123(6), 4380–4392.
84. Ihlefeld, A., and Shinn-Cunningham, B. (2008b). “Spatial release from energetic and informational masking in a selective speech identification task,” J. Acoust. Soc. Am. 123(6), 4369–4379.
85. Imai, S., Walley, A. C., and Flege, J. E. (2005). “Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners,” J. Acoust. Soc. Am. 117(2), 896–907.
86. Inzlicht, M., Shenhav, A., and Olivola, C. Y. (2018). “The effort paradox: Effort is both costly and valued,” Trends Cogn. Sci. 22(4), 337–349.
87. Jastreboff, P. J., and Jastreboff, M. M. (2015). “Decreased sound tolerance: Hyperacusis, misophonia, diplacousis, and polyacousis,” Handb. Clin. Neurol. 129, 375–387.
88. Kahneman, D. (1973). Attention and Effort (Prentice-Hall, Englewood Cliffs, NJ).
89. Kahneman, D., and Beatty, J. (1966). “Pupil diameter and load on memory,” Science 154(3756), 1583–1585.
90. Kahneman, D., Peavler, W. S., and Onuska, L. (1968). “Effects of verbalization and incentive on the pupil response to mental activity,” Can. J. Psychol. 22(3), 186–196.
91. Keidser, G., Naylor, G., Brungart, D. S., Caduff, A., Campos, J., Carlile, S., Carpenter, M. G., Grimm, G., Hohmann, V., Holube, I., Launer, S., Lunner, T., Mehra, R., Rapport, F., Slaney, M., and Smeds, K. (2020). “The quest for ecological validity in hearing science: What it is, why it matters, and how to advance it,” Ear Hear. 41(Suppl. 1), 5S–19S.
92. Keus van de Poll, M., and Sörqvist, P. (2016). “Effects of task interruption and background speech on word processed writing,” Appl. Cognit. Psychol. 30(3), 430–439.
93. Kidd, G., and Colburn, H. S. (2017). “Informational masking in speech recognition,” in The Auditory System at the Cocktail Party, edited by J. C. Middlebrooks, J. Z. Simon, A. N. Popper, and R. R. Fay (Springer, New York), pp. 75–109.
94. Kidd, G., Mason, C. R., Richards, V. M., Gallun, F. J., and Durlach, N. I. (2008). “Informational masking,” in Auditory Perception of Sound Sources, edited by W. A. Yost, A. N. Popper, and R. R. Fay (Springer, New York), pp. 143–189.
95. Kidd, G., Mason, C. R., Swaminathan, J., Roverud, E., Clayton, K. K., and Best, V. (2016). “Determining the energetic and informational components of speech-on-speech masking,” J. Acoust. Soc. Am. 140(1), 132–144.
96. Kjellberg, A., Landström, U., Tesarz, M., Söderberg, L., and Akerlund, E. (1996). “The effects of nonphysical noise characteristics, ongoing task and noise sensitivity on annoyance and distraction due to noise at work,” J. Environ. Psychol. 16(2), 123–136.
97. Klatte, M., Meis, M., Sukowski, H., and Schick, A. (2007). “Effects of irrelevant speech and traffic noise on speech perception and cognitive performance in elementary school children,” Noise Health 9(36), 64–74.
98. Kleckner, I. R., Zhang, J., Touroutoglou, A., Chanes, L., Xia, C., Simmons, W. K., Quigley, K. S., Dickerson, B. C., and Feldman Barrett, L. (2017). “Evidence for a large-scale brain system supporting allostasis and interoception in humans,” Nat. Hum. Behav. 1(5), 0069.
99. Kluender, K. R., Stilp, C. E., and Lucas, F. L. (2019). “Long-standing problems in speech perception dissolve within an information-theoretic perspective,” Atten. Percept. Psychophys. 81(4), 861–883.
100. Koelewijn, T., de Kluiver, H., Shinn-Cunningham, B. G., Zekveld, A. A., and Kramer, S. E. (2015). “The pupil response reveals increased listening effort when it is difficult to focus attention,” Hear. Res. 323, 81–90.
101. Kramer, S. E., Kapteyn, T. S., Festen, J. M., and Kuik, D. J. (1997). “Assessing aspects of auditory handicap by means of pupil dilatation,” Int. J. Audiol. 36(3), 155–164.
102. Kramer, S. E., Kapteyn, T. S., and Houtgast, T. (2006). “Occupational performance: Comparing normally-hearing and hearing-impaired employees using the Amsterdam Checklist for Hearing and Work,” Int. J. Audiol. 45(9), 503–512.
103. Krueger, M., Schulte, M., Zokoll, M. A., Wagener, K. C., Meis, M., Brand, T., and Holube, I. (2017). “Relation between listening effort and speech intelligibility in noise,” Am. J. Audiol. 26(3S), 378–392.
104. Kryter, K. D. (1972). “Non-auditory effects of environmental noise,” Am. J. Public Health 62(3), 389–398.
105. Kryter, K. D. (2013). The Effects of Noise on Man (Elsevier, Amsterdam).
106. Kubovy, M., and Van Valkenburg, D. (2001). “Auditory and visual objects,” Cognition 80(1), 97–126.
107. Kumar, S., Forster, H. M., Bailey, P., and Griffiths, T. D. (2008). “Mapping unpleasantness of sounds to their auditory representation,” J. Acoust. Soc. Am. 124(6), 3810–3817.
108. Kumar, S., Tansley-Hancock, O., Sedley, W., Winston, J. S., Callaghan, M. F., Allen, M., Cope, T. E., Gander, P. E., Bamiou, D.-E., and Griffiths, T. D. (2017). “The brain basis for misophonia,” Curr. Biol. 27(4), 527–533.
109. Kurzban, R. (2016). “The sense of effort,” Curr. Opin. Psychol. 7, 67–70.
110. Lang, P. J. (2000). “Emotion and motivation: Attention, perception, and action,” J. Sport Exerc. Psychol. 22(s1), S122–S140.
111. Lavie, N. (2005). “Distracted and confused?: Selective attention under load,” Trends Cogn. Sci. 9(2), 75–82.
112. Lee, J., Francis, J. M., and Wang, L. M. (2017). “How tonality and loudness of noise relate to annoyance and task performance,” Noise Cont. Eng. J. 65(2), 71–82.
113. Lenne, L., Chevret, P., and Marchand, J. (2020). “Long-term effects of the use of a sound masking system in open-plan offices: A field study,” Appl. Acoust. 158, 107049.
114. Lindvall, J., and Västfjäll, D. (2013). “The effect of interior aircraft noise on pilot performance,” Percept. Mot. Skills 116(2), 472–490.
115. Loewen, L. J., and Suedfeld, P. (1992). “Cognitive and arousal effects of masking office noise,” Environ. Behav. 24(3), 381–395.
116. Love, J., Sollmann, L., Niehl, A., and Francis, A. L. (2018). “Physiological orienting response, noise sensitivity, and annoyance from irrelevant background sound,” Proc. Mtgs. Acoust. 35(1), 040002.
117. Love, J., Sung, W., and Francis, A. L. (2021). “Psychophysiological responses to potentially annoying heating, ventilation, and air conditioning noise during mentally demanding work,” J. Acoust. Soc. Am. 150(4), 3149–3163.
118. Marsh, J. E., Ljung, R., Jahncke, H., MacCutcheon, D., Pausch, F., Ball, L. J., and Vachon, F. (2018). “Why are background telephone conversations distracting?,” J. Exp. Psychol. Appl. 24(2), 222–235.
119. Martin-Soelch, C., Stöcklin, M., Dammann, G., Opwis, K., and Seifritz, E. (2006). “Anxiety trait modulates psychophysiological reactions, but not habituation processes related to affective auditory stimuli,” Int. J. Psychophysiol. 61(2), 87–97.
120. Mattys, S. L., Davis, M. H., Bradlow, A. R., and Scott, S. K. (2012). “Speech recognition in adverse conditions: A review,” Lang. Cogn. Proc. 27(7), 953–978.
121. McDermott, J. H. (2009). “The cocktail party problem,” Curr. Biol. 19(22), R1024–R1027.
122. McGarrigle, R., Munro, K. J., Dawes, P., Stewart, A. J., Moore, D. R., Barry, J. G., and Amitay, S. (2014). “Listening effort and fatigue: What exactly are we measuring? A British Society of Audiology Cognition in Hearing Special Interest Group ‘white paper,’” Int. J. Audiol. 53(7), 433–445.
123. McLaughlin, D. J., Baese-Berk, M. M., Bent, T., Borrie, S. A., and Van Engen, K. J. (2018). “Coping with adversity: Individual differences in the perception of noisy and accented speech,” Atten. Percept. Psychophys. 80(6), 1559–1570.
124. McLaughlin, D. J., Braver, T. S., and Peelle, J. E. (2021). “Measuring the subjective cost of listening effort using a discounting task,” J. Speech. Lang. Hear. Res. 64(2), 337–347.
125. Meyer, J., Dentel, L., and Meunier, F. (2013). “Speech recognition in natural background noise,” PLoS One 8(11), e79279.
126. Moray, N. (1959). “Attention in dichotic listening: Affective cues and the influence of instructions,” Q. J. Exp. Psychol. 11(1), 56–60.
127. Moss, F., Ward, L. M., and Sannita, W. G. (2004). “Stochastic resonance and sensory information processing: A tutorial and review of application,” Clin. Neurophysiol. 115(2), 267–281.
128. Murphy, S., Spence, C., and Dalton, P. (2017). “Auditory perceptual load: A review,” Hear. Res. 352, 40–48.
195. Nakamura, J., and Csikszentmihalyi, M. (2016). “The experience of flow: Theory and research,” in The Oxford Handbook of Positive Psychology, 3rd ed., edited by C. R. Snyder et al. (Oxford University Press, Oxford), pp. 279–296.
129. Ohlenforst, B., Wendt, D., Kramer, S. E., Naylor, G., Zekveld, A. A., and Lunner, T. (2018). “Impact of SNR, masker type and noise reduction processing on sentence recognition performance and listening effort as indicated by the pupil dilation response,” Hear. Res. 365, 90–99.
130. Peelle, J. E. (2018). “Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior,” Ear Hear. 39(2), 204–214.
132. Peterson, G. E. (1952). “Information theory: 2. Applications of information theory to research in experimental phonetics,” J. Speech Hear. Disord. 17(2), 175–188.
133. Pichora-Fuller, M. K., Kramer, S. E., Eckert, M. A., Edwards, B., Hornsby, B. W. Y., Humes, L. E., Lemke, U., Lunner, T., Matthen, M., Mackersie, C. L., Naylor, G., Phillips, N. A., Richter, M., Rudner, M., Sommers, M. S., Tremblay, K. L., and Wingfield, A. (2016). “Hearing impairment and cognitive energy: The framework for understanding effortful listening (FUEL),” Ear Hear. 37, 5S–27S.
134. Pichora-Fuller, M. K., and Souza, P. E. (2003). “Effects of aging on auditory processing of speech,” Int. J. Audiol. 42(Suppl. 2), 11–16.
135. Pierce, J. (1973). “The early days of information theory,” IEEE Trans. Inform. Theory 19(1), 3–8.
136. Pretzsch, A., Seidler, A., and Hegewald, J. (2021). “Health effects of occupational noise,” Curr. Pollut. Rep. 7(3), 344–358.
137. Price, C. N., and Moncrieff, D. (2021). “Defining the role of attention in hierarchical auditory processing,” Audiol. Res. 11(1), 112–128.
138. Ramsier, M. A., and Rauschecker, J. P. (2017). “Primate audition: Reception, perception, and ecology,” in Primate Hearing and Communication, edited by R. M. Quam, M. A. Ramsier, R. R. Fay, and A. N. Popper (Springer, New York), pp. 47–77.
139. Raymond, J. (2009). “Interactions of attention, emotion and motivation,” Prog. Brain Res. 176, 293–308.
140. Richards, H. (1935). “The problem of noise,” J. R. Soc. Arts 83(4305), 625–637, available at https://www.jstor.org/stable/41360462.
141. Richie, C., Kewley-Port, D., and Coughlin, M. (2005). “Vowel perception by noise masked normal-hearing young adults,” J. Acoust. Soc. Am. 118(2), 1101–1110.
142. Richter, M. (2016). “The moderating effect of success importance on the relationship between listening demand and listening effort,” Ear Hear. 37, 111S–117S.
143. Richter, M., Gendolla, G. H. E., and Wright, R. A. (2016). “Three decades of research on motivational intensity theory: What we have learned about effort and what we still don't know,” Adv. Motiv. Sci. 3, 149–186.
144. Röer, J. P., Bell, R., and Buchner, A. (2013). “Self-relevance increases the irrelevant sound effect: Attentional disruption by one's own name,” J. Cogn. Psychol. 25(8), 925–931.
145. Röer, J. P., Bell, R., and Buchner, A. (2014). “Please silence your cell phone: Your ringtone captures other people's attention,” Noise Health 16(68), 34–39.
146. Rönnberg, J., Lunner, T., Zekveld, A., Sörqvist, P., Danielsson, H., Lyxell, B., Dahlström, Ö., Signoret, C., Stenfelt, S., Pichora-Fuller, M. K., and Rudner, M. (2013). “The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances,” Front. Syst. Neurosci. 7, 31.
147. Sarampalis, A., Kalluri, S., Edwards, B., and Hafter, E. (2009). “Objective measures of listening effort: Effects of background noise and noise reduction,” J. Speech. Lang. Hear. Res. 52(5), 1230–1240.
148. Schneider, E. N., Bernarding, C., Francis, A. L., Hornsby, B. W. Y., and Strauss, D. J. (2019). “A quantitative model of listening related fatigue,” in Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA (March 20–23), pp. 619–622.
149. Shamma, S. A., Elhilali, M., and Micheyl, C. (2011). “Temporal coherence and attention in auditory scene analysis,” Trends Neurosci. 34(3), 114–123.
150. Shannon, C. E. (1949). “Communication in the presence of noise,” Proc. IRE 37(1), 10–21.
151. Shenhav, A., Prater Fahey, M., and Grahek, I. (2021). “Decomposing the motivation to exert mental effort,” Curr. Dir. Psychol. Sci. 30(4), 307–314.
152. Shinn-Cunningham, B., and Best, V. (2015). “Auditory selective attention,” in The Handbook of Attention, edited by J. Fawcett, E. Risko, and A. Kingstone (MIT, Cambridge, MA), pp. 99–118.
153. Shinn-Cunningham, B., Best, V., and Lee, A. K. C. (2017). “Auditory object formation and selection,” in The Auditory System at the Cocktail Party, edited by J. C. Middlebrooks, J. Z. Simon, A. N. Popper, and R. R. Fay (Springer, New York), pp. 7–40.
154. Shinn-Cunningham, B. G. (2008). “Object-based auditory and visual attention,” Trends Cogn. Sci. 12(5), 182–186.
155. Shinn-Cunningham, B. G., Lee, A. K. C., and Oxenham, A. J. (2007). “A sound element gets lost in perceptual competition,” Proc. Natl. Acad. Sci. U.S.A. 104(29), 12223–12227.
156. Smout, C. A., Tang, M. F., Garrido, M. I., and Mattingley, J. B. (2019). “Attention promotes the neural encoding of prediction errors,” PLoS Biol. 17(2), e2006812.
157. Sommers, M. S. (1996). “The structural organization of the mental lexicon and its contribution to age-related declines in spoken-word recognition,” Psychol. Aging 11(2), 333–341.
158. Sommers, M. S., and Humes, L. E. (1993). “Auditory filter shapes in normal-hearing, noise-masked normal, and elderly listeners,” J. Acoust. Soc. Am. 93(5), 2903–2914.
159. Sommers, M. S., Tye-Murray, N., and Spehar, B. (2005). “Auditory-visual speech perception and auditory-visual enhancement in normal-hearing younger and older adults,” Ear Hear. 26(3), 263–275.
160. Sörqvist, P. (2014). “On interpretation and task selection in studies on the effects of noise on cognitive performance,” Front. Psychol. 5, 1249.
161. Stilp, C. E., Kiefte, M., and Kluender, K. R. (2018). “Discovering acoustic structure of novel sounds,” J. Acoust. Soc. Am. 143(4), 2460–2473.
196. Stilp, C. E., Shorey, A. E., and King, C. J. (2022). “Nonspeech sounds are not all equally good at being non-speech,” J. Acoust. Soc. Am. (to be published).
162. Stone, M. A., and Canavan, S. (2016). “The near non-existence of ‘pure’ energetic masking release for speech: Extension to spectro-temporal modulation and glimpsing,” J. Acoust. Soc. Am. 140(2), 832–842.
163. Strauss, D. J., and Francis, A. L. (2017). “Toward a taxonomic model of attention in effortful listening,” Cogn. Affect. Behav. Neurosci. 17(4), 809–825.
164. Summers, V., and Molis, M. R. (2004). “Speech recognition in fluctuating and continuous maskers,” J. Speech. Lang. Hear. Res. 47(2), 245–256.
165. Surprenant, A. M., and Watson, C. S. (2001). “Individual differences in the processing of speech and nonspeech sounds by normal-hearing listeners,” J. Acoust. Soc. Am. 110(4), 2085–2095.
166. Tajima, K., Port, R., and Dalby, J. (1997). “Effects of temporal correction on intelligibility of foreign-accented English,” J. Phon. 25(1), 1–24.
167. Touroutoglou, A., Andreano, J. M., Adebayo, M., Lyons, S., and Barrett, L. F. (2019). “Motivation in the service of allostasis: The role of anterior mid-cingulate cortex,” Adv. Motiv. Sci. 6, 1–25.
168. Tun, P. A., O'Kane, G., and Wingfield, A. (2002). “Distraction by competing speech in young and older adult listeners,” Psychol. Aging 17(3), 453–467.
169. Vachon, F., Marsh, J. E., and Labonté, K. (2020). “The automaticity of semantic processing revisited: Auditory distraction by a categorical deviation,” J. Exp. Psychol. Gen. 149(7), 1360–1397.
170. Van Engen, K. J., and Peelle, J. E. (2014). “Listening effort and accented speech,” Front. Hum. Neurosci. 8, 577.
171. Veitch, J. A., Bradley, J. S., Legault, L. M., Norcross, S. G., and Svec, J. M. (2002). Masking Speech in Open-Plan Offices with Simulation Ventilation Noise: Noise-Level and Spectral Composition Effects on Acoustic Satisfaction, Internal Report (Institute for Research in Construction) No. IRC-IR-846 (National Research Council of Canada, Ottawa, Canada), p. 53.
172. Venables, L., and Fairclough, S. H. (2009). “The influence of performance feedback on goal-setting and mental effort regulation,” Motiv. Emot. 33(1), 63–74.
173. Viswanathan, V. (2021). “Neurophysiological mechanisms of speech intelligibility under masking and distortion,” Ph.D. thesis, Purdue University Graduate School, West Lafayette, IN.
174. Vitale, C., De Stefano, P., Lolatto, R., and Bianchi, A. M. (2020). “Physiological responses related to pleasant and unpleasant sounds,” in Proceedings of the 2020 IEEE 20th Mediterranean Electrotechnical Conference (MELECON), Palermo, Italy (June 16–18), pp. 330–334.
175. von Bismarck, G. (1974). “Timbre of steady sounds: A factorial investigation of its verbal attributes,” Acta Acust. united Acust. 30(3), 146–159.
176. Wang, Y., Kramer, S. E., Wendt, D., Naylor, G., Lunner, T., and Zekveld, A. A. (2018). “The pupil dilation response during speech perception in dark and light: The involvement of the parasympathetic nervous system in listening effort,” Trends Hear. 22, 2331216518816603.
177. Watson, C. S. (2005). “Some comments on informational masking,” Acta Acust. united Acust. 91(3), 502–512.
178. Wayand, J. F., Levin, D. T., and Varakin, D. A. (2005). “Inattentional blindness for a noxious multimodal stimulus,” Am. J. Psychol. 118(3), 339–352.
179. Westbrook, A., and Braver, T. S. (2015). “Cognitive effort: A neuroeconomic approach,” Cogn. Affect. Behav. Neurosci. 15(2), 395–415.
180. Wierda, S. M., van Rijn, H., Taatgen, N. A., and Martens, S. (2012). “Pupil dilation deconvolution reveals the dynamics of attention at high temporal resolution,” Proc. Natl. Acad. Sci. U.S.A. 109(22), 8456–8460.
181. Wild, C. J., Yusuf, A., Wilson, D. E., Peelle, J. E., Davis, M. H., and Johnsrude, I. S. (2012). “Effortful listening: The processing of degraded speech depends critically on attention,” J. Neurosci. 32(40), 14010–14021.
182. Winkler, I., Denham, S. L., and Nelken, I. (2009). “Modeling the auditory scene: Predictive regularity representations and perceptual objects,” Trends Cogn. Sci. 13(12), 532–540.
183. Winn, M. B., and Teece, K. H. (2021). “Listening effort is not the same as speech intelligibility score,” Trends Hear. 25, 23312165211027688.
184. Winn, M. B., Wendt, D., Koelewijn, T., and Kuchinsky, S. E. (2018). “Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started,” Trends Hear. 22, 2331216518800869.
185. Wöstmann, M., and Obleser, J. (2016). “Acoustic detail but not predictability of task-irrelevant speech disrupts working memory,” Front. Hum. Neurosci. 10, 538.
186. Wuehr, M., Boerner, J. C., Pradhan, C., Decker, J., Jahn, K., Brandt, T., and Schniepp, R. (2018). “Stochastic resonance in the human vestibular system—Noise-induced facilitation of vestibulospinal reflexes,” Brain Stimul. 11(2), 261–263.
187. Yost, W. A. (2007). “Perceiving sounds in the real world: An introduction to human complex sound perception,” Front. Biosci. 12(9), 3461–3467.
188. Zekveld, A. A., Koelewijn, T., and Kramer, S. E. (2018). “The pupil dilation response to auditory stimuli: Current state of knowledge,” Trends Hear. 22, 2331216518777174.
189. Zeng, F.-G., Fu, Q.-J., and Morse, R. (2000). “Human hearing enhanced by noise,” Brain Res. 869(1), 251–255.
190. Zhang, Y., Lehmann, A., and Deroche, M. (2021). “Disentangling listening effort and memory load beyond behavioural evidence: Pupillary response to listening effort during a concurrent memory task,” PLoS One 16(3), e0233251.
191. Zwicker, E., and Fastl, H. (1999). Psychoacoustics (Springer, Berlin).