Two experiments explored the effects of abrupt transitions in timbral properties [amplitude modulation (AM), pure tones vs narrow-band noises, and attack/decay envelope] on streaming. Listeners reported continuously the number of streams heard during 18-s-long alternating low- and high-frequency (LHL–) sequences (frequency separation: 2–6 semitones) that underwent a coherent transition at 6 s or remained unchanged. In experiment 1, triplets comprised unmodulated pure tones or 100%-depth AM was created using narrowly spaced tone pairs (dyads: 30- or 50-Hz modulation). In experiment 2, triplets comprised narrow-band noises, dyads, or pure tones with quasi-trapezoidal envelopes (10/80/10 ms), fast attacks and slow decays (10/90 ms), or vice versa (90/10 ms). Abrupt transitions led to direction-dependent changes in stream segregation. Transitions from modulated to unmodulated (or slower-modulated) tones, from noise bands to pure tones, or from slow- to fast-attack tones typically caused substantial loss of segregation (resetting), whereas transitions in the opposite direction mostly caused less or no resetting. Furthermore, for the smallest frequency separation, transitions in the latter direction usually led to increased segregation (overshoot). Overall, the results are reminiscent of the perceptual asymmetries found in auditory search for targets with or without a salient additional feature (or greater activation of that feature).
I. INTRODUCTION
The sound reaching our ears is often a mixture arising from more than one source in the environment, and so the auditory system is commonly faced with the task of inferring which parts of the incoming stimulation come from a given source. The perceptual process of grouping together those acoustic elements assumed to have a common origin and separating them from other elements is known as auditory scene analysis (Bregman, 1990). An important aspect of solving this scene analysis problem concerns how sound elements are grouped and separated over time, a process called auditory stream formation and segregation (Bregman and Campbell, 1971). Most studies of stream segregation have used rapid sequences of sounds that alternate between two subsets (A and B). Sequences of this kind can be heard either as one stream of sounds moving back and forth in whatever perceptual property distinguishes them (e.g., pitch) or as two independent streams, each containing only the elements from one or the other subset. The organization heard from moment to moment is bistable, with spontaneous switches occurring between integrated and segregated percepts (e.g., Pressnitzer and Hupé, 2006), but the proportion of time for which stream segregation is heard varies with sequence properties. For sequences whose acoustic and rhythmic properties remain constant throughout, the tendency for stream segregation also builds up over many seconds (e.g., Bregman, 1978; Anstis and Saida, 1985). The aim of the experiments reported here is to extend our understanding of the impact of sudden change on prior accumulated build-up, with a focus on acoustic properties for which the direction of change has asymmetric effects on subsequent streaming (Rogers and Bregman, 1998; Rajasingam et al., 2021).
For a given rate of presentation, the overall likelihood of stream segregation depends on the perceived similarity of the A and B subsets (Moore and Gockel, 2002, 2012). The more dissimilar the sounds are perceived to be, across a range of acoustic properties, the more likely they are to be segregated into separate streams. The extent to which different acoustic properties determine the likelihood of stream segregation has usually been explored by manipulating the magnitude of the difference between the A and B subsets on one or more acoustic dimensions and measuring its impact on streaming. Studies of this kind have shown that introducing peripheral channeling cues, i.e., differences between subsets in which auditory filters are activated and to what extent (Hartmann and Johnson, 1991), promotes stream segregation. Indeed, most streaming experiments have used sequences in which the A and B subsets are pure tones that have different frequencies or occupy different frequency ranges (e.g., Bregman and Campbell, 1971; van Noorden, 1975), or are complex tones evoking different excitation patterns through differences in spectral shape or in the fundamental frequency of their resolved harmonics (e.g., Bregman et al., 1990). Differences of this kind provide the most robust cues for stream segregation. Nonetheless, salient differences in other acoustic properties that produce small or negligible peripheral channeling cues can increase, or even induce, stream segregation. These include differences between the A and B subsets in their attack and decay characteristics (e.g., Singh and Bregman, 1997), envelope type (steady pure tones vs fluctuating narrow-band noises; e.g., Cusack and Roberts, 2000), or their pitch value and strength for a set of unresolved harmonics with a common passband (e.g., Vliegen et al., 1999; Roberts et al., 2002).
The time course of build-up is very different from that of the strong segregation-promoting effect of preceding a sequence of alternating-frequency tones with either the A or B subset alone, which occurs much more rapidly (Rogers and Bregman, 1993; Roberts et al., 2008; Haywood and Roberts, 2013) and appears to have a different origin (see, e.g., Rajasingam et al., 2018). Build-up for alternating sequences decays over a few seconds if the sequence is interrupted by silence (Bregman, 1978; Beauvois and Meddis, 1997). Accounts of this slow build-up have been offered based on its proposed functional role as a conservative process of evidence accumulation that two different sources of sound are active rather than one (Bregman, 1978) or on its proposed physiological basis, e.g., slow adaptation leading to a narrowing of the receptive fields of frequency-tuned neurons in auditory cortex (Micheyl et al., 2005; see also Rankin et al., 2017).
Outside the realm of fire and car alarms, it is unusual to hear long sequences of unchanging sounds. Nonetheless, the impact on subsequent stream segregation of sudden coherent changes to an ongoing sequence (i.e., the imposition of the same change on the properties of the A and B subsets) has so far received relatively little attention. Few acoustic properties have been explored in this context, but the results to date reveal how little is known about the dynamics of stream segregation and its functional role in everyday listening. Most experiments on the effects of abrupt transitions on alternating sequences of lower- (L) and higher- (H) frequency pure tones have examined the impact of changing the center frequency of the sequence while preserving the frequency separation of the subsets (Anstis and Saida, 1985; Rajasingam et al., 2021) or of changes in lateralization of the subsets as manipulated via ear of presentation, interaural level difference, or interaural time difference cues (Rogers and Bregman, 1998). The result of sudden changes in these properties is a rapid loss of the build-up accumulated prior to the transition, either partial or complete. Whether this loss results from a passive failure of build-up to transfer to the different properties of the post-transition sounds or from an active resetting process can be hard to determine (see, e.g., Haywood and Roberts, 2010), but whatever its cause this loss is usually referred to as resetting.
For sudden changes in center frequency or lateralization, the effect of the direction of the transition on subsequent segregation is broadly symmetrical. There is, however, one property for which a clear asymmetry has long been established: the effect of transitions in level (Rogers and Bregman, 1998; Rajasingam et al., 2021). Sudden increases in level (softer-to-louder transitions) lead to partial resetting but decreases (louder-to-softer transitions) have little or no effect. Rogers and Bregman (1998) proposed that this asymmetry occurred because abrupt increases in level can indicate that a new source has become active, encouraging a one-stream interpretation from which evidence for segregation must build up anew, whereas abrupt decreases do not.
It has recently been shown that abrupt changes in another acoustic property, the presence or absence of amplitude modulation (AM), can lead to an even more striking asymmetry (Rajasingam et al., 2021, experiment 3). Specifically, changes from amplitude-modulated to steady (unmodulated) pure tones led to substantial resetting but changes from unmodulated to modulated tones led to little or no resetting and sometimes, particularly for the smallest high-low frequency difference tested [Δf = 4 semitones (ST)], to increased segregation compared with a sequence that was modulated from the start. To our knowledge, such an overshoot effect had not previously been reported. There is also no obvious equivalent of the functional account offered by Rogers and Bregman (1998) for the asymmetry observed for sudden transitions in level: why should a change from modulated to unmodulated sounds be interpreted as a new event when a change in the opposite direction is not? Other contexts in which a similar asymmetry has been observed include auditory search, in which a modulated target is easier to detect among unmodulated distractors than the reverse (e.g., Asemi et al., 2003; Cusack and Carlyon, 2003), modulation detection interference, in which the detection or discrimination of modulation on one carrier is impaired by the presence of another modulated carrier but is largely unaffected when the additional carrier is unmodulated (e.g., Yost and Sheft, 1989; Wilson et al., 1990; Carlyon, 2000), and masking, in which it is easier to detect a fluctuating noise added to a steady pure or complex tone than vice versa (e.g., Hellman, 1972; Gockel et al., 2002).
An advantage of the long sequences (20 s) and continuous reporting of perception used by Rajasingam et al. (2021) is that the evolution of listeners' responses to a sudden transition could be observed (cf. the short test sequences and one judgment per trial method used by Rogers and Bregman, 1998). However, it was evident from the results that the inclusion of three abrupt transitions over the course of the tone sequence partly curtailed the measurement of the unfolding changes in the perceptual organization following each transition. Moreover, as acknowledged by Rajasingam et al. (2021), the most appropriate reference condition was obvious only for the first transition, because accumulated build-up at later transitions would already have been influenced by earlier ones. This proved problematic when attempting to establish whether any increase in stream segregation observed after a later transition represented significant overshoot. Also, the Δf range used (4–8 ST) meant that there was limited scope for overshoot effects to be revealed for the larger values tested, owing to the extent to which segregation would already have built up before the transition, leading to ceiling effects.
The two experiments reported here adapted the approach taken by Rajasingam et al. (2021) by using stimuli with only one transition in a long sequence and a less segregation-promoting Δf range (2–6 ST). These experiments extend the investigation of the directional effects of transitions on judgments of the number of streams heard by introducing sudden changes in a wider range of acoustic properties. These properties were selected to allow salient changes in timbre without introducing substantial peripheral-channeling cues and included transitions in AM rate, envelope type (steady vs fluctuating), and envelope shape (attack/decay). All these properties have been examined previously in the context of differences between the L and H subsets of sound sequences but not, to our knowledge, in the context of sudden coherent changes in both subsets. Implications of the results for our understanding of the dynamics of stream segregation are considered.
II. EXPERIMENT 1
This experiment used the adapted methodology outlined above to extend the exploration of the effects on subsequent streaming of sudden transitions between modulated and unmodulated tones (Rajasingam et al., 2021). In that study, the AM rate of the modulated tones was always 50 Hz, whereas here AM rates of 30 and 50 Hz were used. This allowed the creation not only of sequences involving transitions between unmodulated and modulated tones but also sequences involving transitions between tones with different AM rates. There is some evidence that differences in AM rate between the A and B subsets can initiate or increase stream segregation. For example, Grimault et al. (2002) used sequences of ABA– triplets of broadband noise for which the AM rate of the A subset was fixed at 100 Hz and the AM rate of the B subset was varied across sequences from 100 to 800 Hz. They found that two streams were almost always heard when the AM rate of the B subset was more than double that of the A subset, indicating segregation based on differences in AM rate. Similar findings have since been reported by Dolležal et al. (2012), using ABA– triplets of sinusoidal carriers, and by Nie and Nelson (2015), using alternating pairs of bandpass-filtered noise (A and B). Dolležal et al. (2012) demonstrated that the effects of differences in AM rate on stream segregation arose primarily from temporal cues. To our knowledge, however, the effects on subsequent streaming of sudden coherent changes in AM rate applied simultaneously to both A and B subsets have not previously been investigated.
Exploring the effects of transitions between tones with different AM rates is important because Rajasingam et al. (2021) proposed that the asymmetries they observed may have a similar origin to those seen in perceptual search tasks. In particular, the time needed to detect a modulated-tone target among pure-tone distractors is largely unaffected by the number of distractors, indicating “pop out,” whereas the time needed to detect a pure-tone target among modulated distractors increases with the number of distractors (e.g., Asemi et al., 2003; Cusack and Carlyon, 2003). This account of the asymmetry in perceptual search relies on whether the target has a salient additional “feature” not present in the distractors (in this case, modulation). By analogy, the directional effect of a transition between unmodulated and modulated tones depends on whether a new salient feature is present after the transition. Given that a transition between different rates of AM does not involve the introduction or removal of an additional feature, exploring the effects of these transitions on streaming can be used to assess the proposal of Rajasingam et al. (2021).
A. Method
1. Listeners
Listeners, mostly students, were recruited from Macquarie University and the University of Cambridge. All gave informed consent and received payment for taking part. They were first tested using a screening audiometer (Macquarie: AS208, Cambridge: Infinity 2.0; Interacoustics, Middelfart, Denmark) to ensure that their audiometric thresholds at 0.5, 1, 2, and 4 kHz did not exceed 20 dB hearing level. All listeners passed this screening. They next took part in a training session designed to familiarize them with the task and stimuli before proceeding to the main session; exclusion criteria were predefined in relation to a listener's profile of responses in the reference conditions (see Sec. II A 3). Twelve listeners (six males, ten from Macquarie) successfully completed the experiment (mean age = 21.8 years, range = 19–28). This research was approved by the Macquarie University Human Research Ethics Committee (Reference No. 5201700786) and the Cambridge Psychology Research Ethics Committee (Application No. PRE.2019.093).
2. Stimuli and conditions
Each stimulus sequence was 18 s long (45 LHL– triplets) and comprised an induction sequence (6 s, 15 triplets) followed by a longer test sequence (12 s, 30 triplets). The acoustic properties of the two sequences were identical in the reference conditions, leading to a seamless whole, but were different in the experimental conditions, leading to a sudden transition at 6 s. The test sequences were sufficiently long to explore the unfolding consequences of an abrupt change in tone properties at the induction-test boundary. Each constituent tone and the silence at the end of each triplet was 100 ms long, giving an onset-to-onset duration between triplets of 400 ms. This rate of presentation is known to facilitate streaming based on frequency separation (e.g., Bregman and Campbell, 1971; van Noorden, 1975). The center frequency of the L tones was kept constant at 1 kHz and that of the H tones was set according to the desired value of Δf, which was 2, 4, or 6 ST. The center frequency of the H tones for these Δfs was 1125, 1259, and 1414 Hz, respectively. This range was chosen to enhance the likelihood of demonstrating increases in segregation (overshoot), as well as decreases (resetting), following a sudden transition in properties.
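The timing and frequency arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' synthesis code: it assumes that semitones are equal-tempered (a factor of 2^(1/12) per ST), and the names `semitones_above` and `triplet_schedule` are hypothetical. The computed H-tone frequencies are nominal and may differ by a few hertz from rounded figures.

```python
import math

TONE_MS = 100        # duration of each tone and of the silent slot (ms)
L_FREQ_HZ = 1000.0   # center frequency of the L tones (Hz)

def semitones_above(base_hz, semitones):
    """Frequency `semitones` equal-tempered semitones above base_hz (assumed tuning)."""
    return base_hz * 2.0 ** (semitones / 12.0)

def triplet_schedule(n_triplets):
    """Return (onset_ms, label) for every tone in an LHL- sequence."""
    events = []
    for k in range(n_triplets):
        start = k * 4 * TONE_MS  # three tones plus one silent slot = 400 ms
        for slot, label in enumerate("LHL"):
            events.append((start + slot * TONE_MS, label))
    return events

# Nominal H-tone frequencies for the three frequency separations.
for df in (2, 4, 6):
    print(df, "ST ->", round(semitones_above(L_FREQ_HZ, df), 1), "Hz")

events = triplet_schedule(45)          # 45 triplets in the whole sequence
total_s = 45 * 4 * TONE_MS / 1000      # overall duration in seconds
transition_ms = 15 * 4 * TONE_MS       # induction/test boundary after 15 triplets
```

With 45 triplets of 400 ms each, the schedule spans 18 s, and the boundary after the fifteenth triplet falls at 6 s, as stated in the text.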
Two types of tone were used—pure tones and narrowly spaced pairs of pure tones known as dyads—which allowed sequences to be constructed involving abrupt transitions in the presence or rate of AM, and hence in timbre, between the induction and test sequences without introducing appreciable excitation-pattern cues. Pure tones were shaped using 10-ms raised-cosine ramps. Tone dyads were constructed by adding two pure tones of equal level and centered (either ±15 or ±25 Hz) on the frequency of their pure-tone counterparts. Each constituent tone was attenuated by 3 dB relative to its pure-tone counterpart, such that the root mean square (rms) power of each pure tone and corresponding dyad was the same. One constituent tone began in sine phase and the other in negative sine phase; their addition with 30-Hz or 50-Hz separation gave exactly 3 cycles (slower-modulation dyads) of full-depth AM or 5 cycles (faster-modulation dyads), respectively, over 100 ms. No further envelope shaping was required for the dyads; the rising half of the first cycle of modulation and the falling half of the last cycle acted as onset and offset ramps. These half cycles corresponded to ramp durations of 10.0 ms and 16.7 ms, respectively, for the faster- and slower-modulation stimuli. Given that the center frequency of the L-tone dyads was 1000 Hz (H tones = 2, 4, or 6 ST above), the two components were always unresolved (equivalent rectangular bandwidth of the auditory filter at 1000 Hz ≈ 132 Hz; Glasberg and Moore, 1990) and the average excitation pattern of each dyad and its pure-tone counterpart was almost identical. The strong sinusoidal AM of the dyads resulting from the interaction of the two components within the same auditory filter gave them a distinctive timbre and the difference in AM rate (30 Hz or 50 Hz) for the two types of dyad (3 or 5 cycles) was easily discernible.1
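The dyad construction described above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses only the Python standard library, omits the 10-ms raised-cosine ramps on the pure tone, and the function names (`pure_tone`, `dyad`, `level_diff_db`) are hypothetical. Adding a sine-phase component at the center frequency plus half the AM rate to a negative-sine-phase component at the center frequency minus half the AM rate gives a carrier at the center frequency whose envelope is zero at onset and at every multiple of 1/(AM rate).

```python
import math

FS = 48000      # sampling rate (Hz), as stated for stimulus synthesis
DUR_S = 0.100   # tone duration (s)

def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def pure_tone(freq_hz, amp=1.0):
    """Steady sinusoid; onset and offset ramps are omitted in this sketch."""
    n = int(round(FS * DUR_S))
    return [amp * math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n)]

def dyad(center_hz, am_rate_hz, amp=1.0):
    """Two equal-level components at center +/- am_rate/2, each 3 dB down,
    one in sine phase and the other in negative sine phase."""
    comp_amp = amp * 10 ** (-3 / 20)
    half = am_rate_hz / 2.0
    n = int(round(FS * DUR_S))
    out = []
    for i in range(n):
        t = i / FS
        upper = math.sin(2 * math.pi * (center_hz + half) * t)
        lower = -math.sin(2 * math.pi * (center_hz - half) * t)
        out.append(comp_amp * (upper + lower))
    return out

def level_diff_db(a, b):
    """Level of signal a relative to signal b, in dB."""
    return 20 * math.log10(rms(a) / rms(b))

fd = dyad(1000.0, 50.0)   # faster-modulation dyad on the L-tone frequency
sd = dyad(1000.0, 30.0)   # slower-modulation dyad on the L-tone frequency
pt = pure_tone(1000.0)
```

Because each component is 3 dB below the pure tone and the two components are at different frequencies, their powers add, so `level_diff_db(fd, pt)` and `level_diff_db(sd, pt)` both come out within a small fraction of a decibel of zero.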
There were nine conditions; three were reference cases (C1–C3) and six were experimental (C4–C9). In C1–C3, all triplets throughout were composed of pure tones (PT), slower-modulation dyads (SD), or faster-modulation dyads (FD), respectively. All permutations of the three types of tone were represented in the experimental conditions; the composition of the induction and test sequences, respectively, was PT and SD (C4), SD and PT (C5), PT and FD (C6), FD and PT (C7), SD and FD (C8), and FD and SD (C9). Stimuli were synthesized in MATLAB 2018b (MathWorks, Natick, MA) at a sampling rate of 48 kHz; sequences for each combination of condition and Δf were pre-assembled, stored as WAV files, and played back at 16-bit resolution. Diotic presentation was used throughout, and all tones were presented at 70 dB sound pressure level (SPL). At Macquarie University, sounds were presented over Sennheiser 380 Pro headphones (Hannover, Germany) via an Audio Express sound card (MOTU, Cambridge, MA). Output levels were calibrated using a type 2250 sound-level meter (Brüel and Kjaer, Naerum, Denmark) and an RA0045 microphone (GRAS, Holte, Denmark) coupled to the earphones with a type 43AG ear simulator (GRAS). At the University of Cambridge, sounds were presented over Sennheiser HD-600 headphones via an RME Fireface UCX audio system (Haimhausen, Germany). Output levels were calibrated with reference to a 1-kHz full-scale signal and headphone sensitivity data using an MDO3024 oscilloscope (Tektronix, Beaverton, OR).
3. Procedure
At both sites, listeners completed the experiment in a custom-built double-walled sound-attenuating chamber (IAC Acoustics, Naperville, IL). They were free to take breaks between blocks of trials if desired. Completing all stages (audiometry, training, and main experiment) usually took ∼2½ hours, divided into two sessions. The experiment was run using a program written in Python 3.8 with the PsychoPy toolbox (version 2020.2.3; Peirce et al., 2019), a software package designed for precise stimulus presentation and key press recording (see Bridges et al., 2020). Each trial was initiated 0.5 s after the listener pressed the space bar on the keyboard. Listeners were instructed to monitor the sequence continuously throughout; they were asked to indicate as soon as possible whether they were hearing integration (one stream) or segregation (two streams) by pressing either the “Q” or “P” keys, respectively (the meaning of these keys was explained during training and supported by on-screen visual feedback). Thereafter, listeners were asked to press the appropriate key every time their perception of the sequence changed. They were asked to avoid listening actively for either integration or segregation, but simply to report which of the two percepts they heard at that moment; on occasions when the percept was ambiguous, listeners were asked to report the more dominant impression (Haywood and Roberts, 2013; Rajasingam et al., 2018, 2021). At the end of each trial, there was a 5-s pause before listeners could initiate the next trial. Combined with the trial-initiation delay (0.5 s), this ensured a minimum silent gap of 5.5 s during which any prior build-up would decay before the start of the next trial (Bregman, 1978).
Each combination of condition (nine levels) and Δf (three levels) was presented ten times in the main experiment, once in each block, giving 270 trials. Stimuli were presented in a newly randomized order in each block for each listener. Training consisted of completing a single block of trials just prior to starting the main experiment; the same approach was used as practice to begin the second session. Using three different Δfs also provided a useful means of predefining criteria for excluding data. It is well established in the literature that, for a given rate of presentation, an increase in the frequency separation between subsets of pure tones increases the tendency to hear two streams (Miller and Heise, 1950; van Noorden, 1975; Anstis and Saida, 1985). Therefore, for a listener's data to be included, the mean overall extent of segregation for the reference conditions (C1–C3) had to rise when Δf was increased from 2 ST to 4 ST and rise again when Δf was increased from 4 ST to 6 ST. Two listeners were excluded and replaced in this experiment.
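The trial structure and the predefined inclusion criterion described above can be expressed compactly. The following is a hypothetical sketch only: the names `make_block` and `passes_inclusion` are illustrative, the fixed random seed is there purely for reproducibility, and "rise" is interpreted as a strict increase in the pooled mean segregation for C1–C3.

```python
import random

CONDITIONS = [f"C{i}" for i in range(1, 10)]  # nine conditions, C1-C9
DELTA_FS_ST = [2, 4, 6]                       # frequency separations (ST)
N_BLOCKS = 10                                 # one presentation per block

def make_block(rng):
    """One block: every condition x delta-f combination once, in random order."""
    trials = [(c, df) for c in CONDITIONS for df in DELTA_FS_ST]
    rng.shuffle(trials)
    return trials

def passes_inclusion(mean_seg_by_df):
    """mean_seg_by_df: {delta-f in ST: mean % segregation pooled over C1-C3}.
    Assumed reading of the criterion: strictly rising from 2 to 4 to 6 ST."""
    return mean_seg_by_df[2] < mean_seg_by_df[4] < mean_seg_by_df[6]

rng = random.Random(1)  # fixed seed only to make the sketch reproducible
blocks = [make_block(rng) for _ in range(N_BLOCKS)]
n_trials = sum(len(block) for block in blocks)  # 9 x 3 x 10 trials in total
```

A listener whose pooled reference means were, say, 15%, 55%, and 80% at 2, 4, and 6 ST (made-up values) would pass, whereas one with 30%, 25%, and 70% would not.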
4. Data analysis and availability
Once a response key was first pressed on a given trial, the associated value (one stream or two) was held until the other key was pressed or until the end of that trial. Response data from each trial were divided into eighteen 1-s-long time bins (i.e., 0–1 s, 1–2 s, …, 17–18 s).2 For each time bin, the percentage of time for which the listener reported the test sequence as segregated was computed from the timings of individual key presses. This value was recorded only if the listener's first response had occurred before a given time bin or within the first 0.5 s of that time bin. Owing to the small number of trials meeting this criterion for the 0–1 s time bin, responses made during that interval were used only in the context of calculations involving subsequent time bins; the 0–1 s time bin was excluded from all further analysis and graphical representation (Haywood and Roberts, 2013; Rajasingam et al., 2018, 2021).
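The binning procedure described above can be sketched for a single trial as follows. This is an illustrative reconstruction, not the analysis code used in the study: the function name `percent_segregated` and the data layout are hypothetical, and the percentage is assumed to be taken relative to the full 1-s bin. Rejected bins are returned as `None`, and the caller would discard the 0–1 s bin before further analysis.

```python
def percent_segregated(presses, n_bins=18, bin_s=1.0, grace_s=0.5):
    """presses: time-ordered list of (time_s, state), where state 1 = one stream
    and state 2 = two streams. Returns % time segregated per bin, or None for
    bins that fail the acceptance criterion."""
    if not presses:
        return [None] * n_bins
    first_t = presses[0][0]
    end_t = n_bins * bin_s
    # Each response is held until the next key press or the end of the trial.
    segments = []
    for (t, state), nxt in zip(presses, presses[1:] + [(end_t, None)]):
        segments.append((t, nxt[0], state))
    result = []
    for b in range(n_bins):
        lo, hi = b * bin_s, (b + 1) * bin_s
        # Accept the bin only if the first response occurred before the bin
        # or within its first `grace_s` seconds.
        if first_t > lo + grace_s:
            result.append(None)
            continue
        seg_time = sum(max(0.0, min(hi, e) - max(lo, s))
                       for s, e, state in segments if state == 2)
        # Assumption: percentage is relative to the whole bin duration.
        result.append(100.0 * seg_time / bin_s)
    return result

# Hypothetical trial: integrated from 1.2 s, segregated from 4.5 s,
# integrated again from 10.25 s until the end.
example = percent_segregated([(1.2, 1), (4.5, 2), (10.25, 1)])
```

In this made-up trial the 0–1 s bin is rejected because the first response arrived at 1.2 s, whereas the 1–2 s bin is accepted because that response fell within its first 0.5 s.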
For each listener, the data for each time bin were averaged across trial blocks separately for each combination of condition and Δf. Each mean was computed only from those trials for which that time bin met the acceptance criterion described above. Following the approach of our previous studies, on occasions when one of these means was missing for a particular listener (ten cases, corresponding to ∼0.2% of all time bins, and all occurring within the 1–2 s time bin), the missing value was replaced with the mean obtained from the other listeners. Finally, the data were averaged across listeners, for each combination of condition and Δf, to yield the overall mean percentage of time for which the sequence was heard as segregated for each time bin. This measure of the average time course of stream segregation over the sequence is used to display the results.
The effects of stimulus type per se were explored by comparing the extent of stream segregation across the reference conditions for the full duration of the sequences (in 1-s time bins, excluding 0–1 s). The time-series data obtained from the calculations described above were analyzed using repeated-measures analysis of variance (ANOVA); departures from sphericity were addressed using the Greenhouse-Geisser correction. The measure of effect size reported here is partial eta squared (ηp²). Comparisons among conditions were conducted using three factors: frequency separation between the L and H sounds (Δf), stimulus type (S), and time interval (T, with levels corresponding to time bins 1–2 s to 17–18 s). All ANOVAs reported here were computed using SPSS (SPSS statistics version 21, IBM Corp., Armonk, NY).
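For readers wishing to check reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity; the function name is hypothetical, and the example inputs are the F and Greenhouse-Geisser-corrected df values listed for the Δf main effect in Table I. The correction scales both df values by the same factor, so it does not alter the result.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom:
    SS_effect / (SS_effect + SS_error) = F*df_effect / (F*df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of frequency separation, using the values listed in Table I.
eta_df = partial_eta_squared(54.056, 1.069, 11.761)
print(round(eta_df, 3))
```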
The effects of different abrupt transitions in acoustic properties at 6 s were explored by comparing the extent of stream segregation in the test sequence for each experimental condition with that for the appropriate reference case (i.e., the same stimulus type). Two approaches were used to make these comparisons. The main approach was to estimate the overall impact of different transitions on judgments of the test sequence. To do this, segregation scores were collapsed across time bins 7–8 s to 14–15 s to create a single 8-s-long interval; the last three time bins were omitted because any effects of the transition had mostly been lost by then and time bin 6–7 s was omitted owing to the usual short delay in the initial response to the transition at 6 s. Comparisons among the corresponding difference scores for this interval were conducted using a repeated-measures ANOVA with three factors: Δf, stimulus pair (P), and direction of change (D). Stimulus pair specifies the stimulus types defining the transition and direction specifies the order of those two types (i.e., which was the inducer and which the test sequence). The latter factor was organized according to which direction of change for a given pair was predicted to be more integration-promoting for subsequent streaming judgments (i.e., to cause more resetting); these predictions were based on a combination of previous research (Rajasingam et al., 2021), pilot observations, and theoretical principles. The effects of individual transitions at different Δfs were explored further with planned pairwise comparisons (two-tailed) conducted using the restricted least-significant-difference test (Snedecor and Cochran, 1967; Keppel and Wickens, 2004).
In the second approach, comparisons were made between the experimental and reference cases for successive 1-s time bins throughout the sequence, using a bootstrapping analysis in MATLAB 2018b to explore the evolution of the impact of the transition on subsequent judgments of streaming. This analysis estimated for each time bin the probability that the measured difference in segregation between the experimental condition and the relevant reference case was significantly different from zero. Note that the reference case used for comparison changed at 6 s, to reflect the transition in the experimental condition. The data were resampled 5000 times to ensure a stable estimate of the confidence intervals, and the Holm-Bonferroni method was used to correct the significance level for multiple comparisons. The research data and stimuli underlying this publication are available online from a repository hosted by Aston University.3
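The general logic of a per-bin bootstrap followed by Holm-Bonferroni correction can be sketched as below. This is not the MATLAB analysis used in the study, whose exact resampling scheme is not specified here. It assumes a simple percentile-style bootstrap of the mean of per-listener difference scores, with hypothetical function names and standard-library Python only.

```python
import random

def bootstrap_p(diffs, n_resamples=5000, seed=0):
    """Two-sided bootstrap p-value for the mean paired difference being zero
    (assumed percentile-style procedure, resampling listeners with replacement)."""
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    below = sum(m <= 0 for m in means) / n_resamples
    above = sum(m >= 0 for m in means) / n_resamples
    return min(1.0, 2 * min(below, above))

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; returns True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Made-up p-values for four time bins, for illustration only.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

In practice, `bootstrap_p` would be applied to the difference scores for each time bin in turn, and the resulting set of p-values passed to `holm_bonferroni`.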
B. Results and discussion
The results averaged across listeners are shown in Fig. 1 (see supplementary material for alternative versions4). The top panels display the results for the reference (no transition) conditions, in which stimulus type remained constant throughout the 18-s sequence; the remaining panels display the results for the experimental (transition) conditions, in which there was an abrupt change in stimulus properties 6 s after the sequence began. The upper middle, lower middle, and bottom panels show the effects of abrupt transitions (in both directions) between stimulus types SD and PT, FD and PT, and SD and FD, respectively. For each stimulus pair, the transition directions predicted to be less or more integration-promoting are indicated in the insets as [+] (higher segregation) and [–] (lower segregation), respectively. These panels also reproduce the results for the corresponding reference cases. Time bins for which the bootstrapping analysis showed a significant difference (see Sec. II A 4) from the relevant reference case are indicated by filled black symbols; there were no cases in which this happened prior to the transition. Results for the reference and experimental conditions are considered in turn.
Inspection of the top panels of Fig. 1 shows clear evidence that stream segregation built up over time in the reference conditions, and was greater in rate of increase and final extent for larger Δfs. The observed patterns are in accord with research suggesting a faster initial phase of build-up over the first 10 s of a sequence followed by a slower phase thereafter (e.g., Anstis and Saida, 1985; Haywood and Roberts, 2013). There is also some indication that the dyad reference conditions (C2 and C3) were associated with more stream segregation than the pure-tone condition (C1) earlier on in the sequence, but that the difference between them tended to decline over time; a similar pattern was observed by Rajasingam et al. (2021). No consistent effect of AM rate was apparent. The ANOVA for the reference conditions is presented in Table I. Two of the three factors influenced streaming as highly significant main effects: stream segregation was greater for larger frequency separations (means: 2 ST = 20.2%, 4 ST = 62.3%, and 6 ST = 75.9%; p < 0.001) and tended to increase over time (p < 0.001). There was no main effect of stimulus type (p = 0.148), but the stimulus type × time interval interaction was significant (p = 0.034), which is in accord with the difference between the response profiles for dyad and pure-tone sequences described above. The initial tendency towards greater segregation for the dyad sequences compared with otherwise-matched pure-tone sequences may have been a consequence of the modulated L and H tones sharing a common AM rate rather than a common relative bandwidth; the former is associated with more stream segregation than the latter for sequences of this kind (Cusack and Roberts, 1999) and so the difference in relative bandwidth may have acted as an additional segregation cue.
The only other significant interaction term was Δf × time interval (p < 0.001), which arose mainly from the greater rate of change of stream segregation for larger Δfs during the initial phase of build-up.
Factor . | df . | F . | p . | η2p . |
---|---|---|---|---|
Frequency separation in sequence (Δf) | (1.069, 11.761) | 54.056 | <0.001 | 0.831 |
Stimulus type (S) | (1.419, 15.606) | 2.252 | 0.148 | 0.170 |
Time interval (T) | (1.692, 18.614) | 35.105 | <0.001 | 0.761 |
Δf × S | (2.339, 25.731) | 0.791 | 0.482 | 0.067 |
Δf × T | (5.014, 55.153) | 9.166 | <0.001 | 0.455 |
S × T | (5.262, 57.883) | 2.571 | 0.034 | 0.189 |
Δf × S × T | (4.880, 53.682) | 0.608 | 0.690 | 0.052 |
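The effect sizes in Table I can be cross-checked against the reported F ratios and degrees of freedom using the standard identity ηp² = F·df1 / (F·df1 + df2). The Python sketch below applies it to four rows copied from Table I. The function and variable names are illustrative assumptions, not part of the original analysis, and the check assumes that the fractional degrees of freedom reflect a correction that scales both values by the same factor (as a Greenhouse-Geisser correction does), which leaves the identity unchanged.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    scaled_effect = f_ratio * df_effect  # equals SS_effect / MS_error
    return scaled_effect / (scaled_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared), copied from Table I
TABLE_I_ROWS = {
    "frequency separation": (54.056, 1.069, 11.761, 0.831),
    "stimulus type": (2.252, 1.419, 15.606, 0.170),
    "time interval": (35.105, 1.692, 18.614, 0.761),
    "stimulus type x time": (2.571, 5.262, 57.883, 0.189),
}


def check_rows(rows, tolerance=0.001):
    """Return {factor: (recomputed value, True if it matches the reported one)}."""
    results = {}
    for factor, (f_ratio, df_effect, df_error, reported) in rows.items():
        recomputed = partial_eta_squared(f_ratio, df_effect, df_error)
        results[factor] = (recomputed, abs(recomputed - reported) < tolerance)
    return results


for factor, (value, agrees) in check_rows(TABLE_I_ROWS).items():
    print(f"{factor}: {value:.3f} (matches table: {agrees})")
```

The same check can be run on the other ANOVA tables by swapping in their rows; a mismatch larger than rounding error would point to a transcription problem rather than to the analysis itself.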
Inspection of the other panels of Fig. 1 shows that sudden changes from modulated to unmodulated tones, or tones modulated at a slower rate, typically led to decreased subsequent segregation (i.e., to resetting of build-up), particularly for the larger values of Δf. Sudden changes in the opposite direction were less integration-promoting—the transition typically had little effect on subsequent streaming for the larger Δfs, and for a 2-ST Δf there was clear evidence of a tendency for stream segregation to increase, rather than to fall, following the transition (i.e., overshoot). The effect on streaming was typically greatest after ∼3 s for transitions causing resetting, but not until after ∼5 s for transitions causing overshoot. The time constant for decay of the effect of transition was considerably longer, but that effect was usually diminished, and sometimes lost altogether, by the last few seconds of the test sequence. Most of the 1-s time bins in the transition conditions identified in the bootstrapping analysis as significantly different from their references align with the resetting and overshoot effects described above, but the 7–8 s and 8–9 s time bins following the PT-to-FD transition in the 6-ST case are exceptions. Those cases can be accounted for in terms of the higher segregation for FD stimuli established prior to the transition, with which the sequence changing from PT to FD needs a few seconds to catch up. The last two time bins following the PT-to-SD transition in the 6-ST case are an exception without obvious explanation.
The ANOVA exploring the effects of abrupt changes in stimulus type is presented in Table II. The main effects of frequency separation and direction of change were highly significant (p < 0.001), and the main effect of the stimulus pair approached but did not quite reach significance (p = 0.061). None of the interaction terms were significant. The effects of individual transitions were explored further using pairwise comparisons that are summarized in Table III (parts a and b) for all 18 combinations of transition type (i.e., two directions per stimulus pair) and Δf. The mean values quoted correspond to the change in segregation scores, in percentage points (% pts), produced by the transition case relative to its reference over the selected 8.0-s time interval. Figure 2 displays these means and corresponding inter-subject standard errors across Δf for each stimulus pair in a separate panel; open and filled symbols indicate results for the transition directions predicted to be less and more integration-promoting, respectively. The top axis summarizes the corresponding mean difference scores between the two directions.
TABLE II. ANOVA exploring the effects of abrupt changes in stimulus type in experiment 1.
Factor | df | F | p | ηp² |
---|---|---|---|---|
Frequency separation in sequence (Δf) | (1.964, 21.605) | 33.651 | <0.001 | 0.754 |
Stimulus pair (P) | (1.499, 16.491) | 3.614 | 0.061 | 0.247 |
Direction of change (D) | (1.000, 11.000) | 28.770 | <0.001 | 0.723 |
Δf × P | (3.111, 34.226) | 1.804 | 0.163 | 0.141 |
Δf × D | (1.547, 17.012) | 0.068 | 0.892 | 0.006 |
P × D | (1.472, 16.191) | 1.820 | 0.197 | 0.142 |
Δf × P × D | (2.243, 24.671) | 0.121 | 0.907 | 0.011 |
TABLE III, part (a). Results for more integration-promoting transitions in experiment 1 (differences re reference cases).
Transition type | Δf = 2 ST (mean, p) | Δf = 4 ST (mean, p) | Δf = 6 ST (mean, p) |
---|---|---|---|
SD to PT | +3.4% pts, 0.569 | −16.5% pts, 0.035 | −21.0% pts, <0.001 |
FD to PT | −1.2% pts, 0.810 | −28.7% pts, <0.001 | −35.5% pts, <0.001 |
FD to SD | −0.7% pts, 0.929 | −13.3% pts, 0.026 | −18.0% pts, 0.008 |
TABLE III, part (b). Results for less integration-promoting transitions in experiment 1 (differences re reference cases).
Transition type | Δf = 2 ST (mean, p) | Δf = 4 ST (mean, p) | Δf = 6 ST (mean, p) |
---|---|---|---|
PT to SD | +20.6% pts, 0.003 | +0.1% pts, 0.992 | −5.0% pts, 0.177 |
PT to FD | +25.4% pts, <0.001 | −4.1% pts, 0.552 | −5.5% pts, 0.081 |
SD to FD | +20.0% pts, 0.010 | +3.2% pts, 0.650 | −2.4% pts, 0.466 |
All transitions predicted to be more integration-promoting (part a) led to substantial and significant resetting for 4-ST and 6-ST Δfs (range: −13.3% pts to −35.5% pts; p values from 0.035 to <0.001), and all transitions predicted to be less integration-promoting (part b) led to substantial and significant overshoot for the 2-ST Δf (range: +20.0% pts to +25.4% pts; p values from 0.010 to <0.001). The remaining cases shown in parts a and b were associated with only small and non-significant effects. Importantly, sudden changes in the rate of AM produced directional effects like those observed for transitions between unmodulated and modulated tones. When collapsed across Δf and stimulus pair, the transitions predicted to be more integration-promoting were associated on average with 20.4% pts less segregation than their counterparts. Transitions between PT and FD stimuli were associated with the largest mean difference scores. The overall pattern of response profiles suggests a summation of the effects of three factors: one based on the direction of the transition (with greater resetting following transitions from more to less modulated tones), one based on Δf (less resetting is possible for small Δfs), and one that biases subsequent streaming judgments towards segregation. This proposal is considered further in Sec. IV.
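The collapsed difference between transition directions can be recovered approximately from the rounded cell means in Table III. The Python sketch below averages the nine cells in each part and subtracts the two grand means. The variable and function names are illustrative assumptions, and because the inputs are rounded table entries rather than the listeners' raw data, the result is only expected to agree with the reported value to within rounding.

```python
# Mean change in segregation (% pts) re the reference case, copied from Table III.
# Each list holds the values for Δf = 2, 4, and 6 ST, in that order.
MORE_INTEGRATION_PROMOTING = {  # part (a)
    "SD to PT": [3.4, -16.5, -21.0],
    "FD to PT": [-1.2, -28.7, -35.5],
    "FD to SD": [-0.7, -13.3, -18.0],
}
LESS_INTEGRATION_PROMOTING = {  # part (b)
    "PT to SD": [20.6, 0.1, -5.0],
    "PT to FD": [25.4, -4.1, -5.5],
    "SD to FD": [20.0, 3.2, -2.4],
}


def grand_mean(table):
    """Average every cell of a {transition: [values per Δf]} table."""
    cells = [value for row in table.values() for value in row]
    return sum(cells) / len(cells)


def direction_difference(more_promoting, less_promoting):
    """How much less segregation follows the more integration-promoting direction."""
    return grand_mean(less_promoting) - grand_mean(more_promoting)


difference = direction_difference(MORE_INTEGRATION_PROMOTING, LESS_INTEGRATION_PROMOTING)
print(f"{difference:.1f} % pts")  # prints "20.4 % pts"
```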
Two other factors that might have affected the results merit brief comment. The tone dyads may have generated audible distortion products (e.g., Goldstein, 1967) that introduced greater than anticipated differences in peripheral channeling between the dyads and corresponding pure tones. However, as noted by Rajasingam et al. (2021), there is no obvious mechanism by which these distortion products might account for the strong directional effects observed for transitions between these stimulus types. Also, the loudness of corresponding modulated and unmodulated tones with the same rms power may not have been identical (Moore et al., 1999). However, any difference between them would have been small compared with that resulting from the 12-dB transitions in level used previously (Rogers and Bregman, 1998; Rajasingam et al., 2021), and so seems unlikely to account for the pronounced asymmetries in streaming judgments observed here.
One aspect of the current results that differs somewhat from the nearest equivalent data (Rajasingam et al., 2021, experiment 3) concerns the effect of Δf. That experiment included two conditions in which each tone sequence contained three transitions, rather than just one as here, between exactly corresponding FD and PT stimuli. It is not straightforward to identify a suitable reference case against which to assess the effects of second and subsequent transitions, but for one of those conditions, there was an initial transition from PT to FD stimuli 5.2 s after the sequence began, which is only 0.8 s earlier than for the corresponding case in the current study. This first transition was associated with significant overshoot for a Δf of 4 ST (the smallest value tested), whereas overshoot in the experiment reported here required a smaller Δf (2 ST) to be significant, albeit with the greater size anticipated for the inclusion of the 2-ST case. Notwithstanding the other contextual differences between the two experiments, this discrepancy suggests that the extent of the tendency towards more segregated judgments for smaller Δfs may have been influenced by the Δf range to which the listeners were exposed.
The finding that transitions in AM rate produced asymmetric effects on streaming like those seen for transitions between unmodulated and modulated tones casts some doubt on the notion that these asymmetries are akin to those found in perceptual search tasks (e.g., Asemi et al., 2003; Cusack and Carlyon, 2003). This is because it seems highly plausible that a modulated target would stand out among unmodulated distractors, owing to the presence of an additional feature (AM), whereas a change in AM rate does not constitute an additional feature. Nonetheless, a modified version of this account remains possible in which the asymmetry can arise from one set of sounds causing greater activation on a given perceptual dimension, rather than necessarily possessing an extra feature. Cusack and Carlyon (2003) came to this conclusion in the context of their finding that longer sounds were easier to select from short distractors than vice versa in an auditory search task. In the context of the current streaming study, this implies that higher rates of AM lead to greater activation on some perceptual dimensions than do lower rates. Presumably, greater activation on a given dimension equates to higher salience, and hence to easier attentional selection. Counterintuitively, however, applying the same logic to the effects of sudden changes in level on subsequent streaming implies that louder sounds lead to less activation than softer ones. This conundrum is considered further in Sec. IV.
III. EXPERIMENT 2
Given that sudden changes in level and in the presence or rate of AM are the only changes thus far known to be associated with clear directional effects on subsequent stream segregation, this asymmetry was explored further by extending the range of stimulus types tested. Specifically, we examined the effects of transitions between fluctuating sounds (narrow-band noises) and non-fluctuating sounds (pure tones), and between pure tones with different envelope shapes (damped vs ramped). Transitions between these stimulus types do not involve substantial peripheral-channeling cues. The effects of transitions between PT and FD stimuli were also measured in the same context to act as a benchmark for comparison with the effects of the other transitions. The inclusion of transitions between narrow-band noises and pure tones was motivated by evidence of a strong asymmetry between these stimuli in perceptual search tasks (narrow-band noises are more salient; Asemi et al., 2003) and pilot observations made with our streaming task. There is also a marked difference in timbre between these stimuli, which is known to facilitate stream segregation when the L and H subsets are distinguished from one another in this way (e.g., Cusack and Roberts, 2000).
The inclusion of transitions between damped and ramped tones was motivated by evidence that the attack duration of a sound is an important determinant of its timbre (e.g., Iverson, 1995) and that differences in attack/decay profile across the L and H subsets of a pure-tone sequence increase their tendency to segregate (Singh and Bregman, 1997). Our initial hypothesis was that the percussive character of damped sounds, with their more definite perceived onset, might act as an additional feature, such that transitions from damped to ramped (i.e., loss of that feature) might be more integration-promoting than vice versa. However, pilot observations indicated the opposite pattern. Perhaps relevant here is the finding of Cusack and Carlyon (2003) for auditory search, noted above, that longer sounds are easier to select from short distractors than vice versa. This is because ramped sounds are known to be perceived as longer than corresponding damped sounds when they have equal intensity and physical duration (e.g., Stecker and Hafter, 2000; Wang et al., 2014).
A. Method
Except where described, the same method was used as for experiment 1. Twelve listeners (four males, nine from Macquarie, mean age = 23.9 years, range = 18–32) took part and successfully completed the experiment; no listeners were excluded and replaced. Eight listeners had previously completed experiment 1. In this experiment, triplets composed of pure tones or faster-modulation dyads were retained in the same form as for experiment 1, but triplets composed of narrow-band noises (NB) with 100-ms quasi-trapezoidal amplitude envelopes (including ±10 ms raised cosine ramps) or pure tones with asymmetric envelopes were also used. The latter sounds had linear ramps and were either “damped” with a fast attack (FA, 10 ms) and slow decay (90 ms) or “ramped” with a slow attack (SA, 90 ms) and fast decay (10 ms). Narrow-band noises were created by the addition of sine waves spaced 1 Hz apart with a random starting phase for each component; each token was generated using a different set of random phases. The equal-amplitude passband was ±1 ST relative to the center frequency—e.g., the lower and upper cutoff frequencies for a band with a 1000-Hz center frequency were 943.9 and 1059.5 Hz. To avoid generating edge pitches (von Békésy, 1963), components were also present outside the passband. These components were attenuated by 80 dB/octave and those below −60 dB were excluded. For each condition involving noise bursts, a hundred unique sequences were generated and stored as WAV files, from which one was selected at random on any given trial with the constraint that no unique sequence could be presented twice to the same listener. All stimuli had the same rms power and presentation level (70 dB SPL).
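The spectral specification of the noise bands can be illustrated with a short Python sketch. It reproduces the ±1-ST cutoff frequencies quoted for a 1000-Hz band, applies 80-dB/octave skirts with a −60-dB floor, and sums the surviving components with random starting phases. The function names, the placement of components on integer multiples of the 1-Hz spacing, and the 20-kHz sample rate are assumptions made for illustration; the sketch omits the amplitude envelope and the rms equalization, and it is not the stimulus-generation code used in the study.

```python
import math
import random


def shift_semitones(freq_hz, semitones):
    """Shift a frequency by a (possibly fractional) number of semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def passband_edges(centre_hz, half_width_st=1.0):
    """Lower and upper cutoff frequencies of the equal-amplitude passband."""
    return (shift_semitones(centre_hz, -half_width_st),
            shift_semitones(centre_hz, half_width_st))


def level_db(freq_hz, centre_hz, slope_db_per_octave=80.0):
    """Component level in dB re the passband: 0 inside, sloping skirts outside."""
    lower, upper = passband_edges(centre_hz)
    if lower <= freq_hz <= upper:
        return 0.0
    edge = lower if freq_hz < lower else upper
    return -slope_db_per_octave * abs(math.log2(freq_hz / edge))


def noise_components(centre_hz, spacing_hz=1.0, floor_db=-60.0,
                     slope_db_per_octave=80.0):
    """(frequency, linear amplitude) pairs for components at or above the floor.

    Assumption: components sit on integer multiples of spacing_hz.
    """
    lower, upper = passband_edges(centre_hz)
    reach_octaves = abs(floor_db) / slope_db_per_octave  # skirt width to the floor
    first = math.floor(lower * 2.0 ** -reach_octaves / spacing_hz)
    last = math.ceil(upper * 2.0 ** reach_octaves / spacing_hz)
    components = []
    for k in range(first, last + 1):
        freq = k * spacing_hz
        db = level_db(freq, centre_hz, slope_db_per_octave)
        if db >= floor_db:  # components below the floor are excluded
            components.append((freq, 10.0 ** (db / 20.0)))
    return components


def synthesise(components, duration_s=0.1, sample_rate=20000, rng=None):
    """Sum the components, drawing a fresh random starting phase for each one."""
    rng = rng or random.Random()
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in components]
    n_samples = round(duration_s * sample_rate)
    return [
        sum(amp * math.sin(2.0 * math.pi * freq * n / sample_rate + phase)
            for (freq, amp), phase in zip(components, phases))
        for n in range(n_samples)
    ]


low_cutoff, high_cutoff = passband_edges(1000.0)
print(f"{low_cutoff:.1f} Hz to {high_cutoff:.1f} Hz")  # prints "943.9 Hz to 1059.5 Hz"
```

Calling `synthesise(noise_components(1000.0))` with a different random generator on each call would yield a different token each time, in line with the use of a different set of random phases for each token.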
There were eleven conditions, five of which were reference cases (C1–C5) and six were experimental (C6–C11). In C1–C5, all triplets throughout were composed of stimulus types PT, SA, FA, NB, or FD, respectively. Only a subset of the possible permutations was represented in the experimental conditions; the composition of the induction and test sequences, respectively, was PT and FD (C6), FD and PT (C7), PT and NB (C8), NB and PT (C9), SA and FA (C10), and FA and SA (C11). Each combination of condition (11 levels) and Δf (three levels) was presented ten times in the main experiment, once in each block, giving 330 trials. Listeners usually took ∼3 h to complete the experiment over two sessions. Time-series data were computed from listeners' responses as for experiment 1. When an individual mean was missing (five cases, ∼0.1% of the data and all within the 1–2 s time bin), the missing value was replaced as before using mean imputation. Again, the results were analyzed using repeated-measures ANOVAs and bootstrapping.
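The trial structure can be enumerated explicitly. The Python sketch below lists each condition as an (induction, test) pair of stimulus types, crosses the conditions with the three Δf values, and builds ten blocks in which every combination appears once, which gives the 330 trials. The dictionary layout, the function name, and the random ordering of trials within a block are assumptions made for illustration; the text above does not specify how trials were ordered.

```python
import random
from itertools import product

# Condition -> (stimulus type of induction sequence, stimulus type of test sequence)
REFERENCE = {
    "C1": ("PT", "PT"), "C2": ("SA", "SA"), "C3": ("FA", "FA"),
    "C4": ("NB", "NB"), "C5": ("FD", "FD"),
}
EXPERIMENTAL = {
    "C6": ("PT", "FD"), "C7": ("FD", "PT"), "C8": ("PT", "NB"),
    "C9": ("NB", "PT"), "C10": ("SA", "FA"), "C11": ("FA", "SA"),
}
DELTA_F_ST = (2, 4, 6)  # frequency separations in semitones
N_BLOCKS = 10


def build_blocks(seed=None):
    """Return N_BLOCKS lists of (condition, induction, test, delta_f) trials."""
    rng = random.Random(seed)
    conditions = {**REFERENCE, **EXPERIMENTAL}
    blocks = []
    for _ in range(N_BLOCKS):
        block = [
            (name, induction, test, delta_f)
            for (name, (induction, test)), delta_f
            in product(conditions.items(), DELTA_F_ST)
        ]
        rng.shuffle(block)  # assumption: trial order randomized within each block
        blocks.append(block)
    return blocks


blocks = build_blocks(seed=0)
print(sum(len(block) for block in blocks))  # prints 330
```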
B. Results and discussion
The results averaged across listeners are shown in Fig. 3 (see supplementary material for alternative versions, footnote 4). As before, the top panels display the results for the reference conditions and the remaining panels display the results for the experimental conditions. The upper middle, lower middle, and bottom panels show the effects of abrupt transitions (in both directions) between stimulus types FD and PT, NB and PT, and SA and FA, respectively, relative to their reference cases. Experimental time bins for which there was a significant difference from the reference case in the bootstrapping analysis are shown by filled black symbols. Results for the reference and experimental conditions are considered in turn.
Figure 3 again shows clear evidence of build-up in the reference conditions, and for that build-up to be greater for larger Δfs. There is also an indication that, compared with the other reference conditions, the NB case (C4) caused a slower rate of build-up and a smaller increase in stream segregation with increasing Δf. The differences between dyad and pure-tone sequences observed in experiment 1 were largely absent here. Consistent with Singh and Bregman (1997), there was little evidence to suggest systematic differences in stream segregation between sequences of FA and SA tones. The apparent dip in reported segregation around 10 s for the SA condition (C2) when Δf = 4 ST does not have any obvious cause—it cannot easily be attributed to specific listeners and is assumed to have occurred by chance.
The ANOVA for the reference conditions is presented in Table IV. Stream segregation was greater for larger Δfs (means: 2 ST = 25.7%, 4 ST = 55.1%, and 6 ST = 68.0%; p < 0.001) and tended to increase over time (p < 0.001). There was no main effect of stimulus type (p = 0.468), but both two-way interaction terms involving it were significant. The main cause of the stimulus type × time interval (p = 0.004) and Δf × stimulus type (p = 0.017) interactions is probably the pattern of results for the NB condition. The former probably reflects the shallower response profiles seen for the NB condition and the latter probably reflects the slower rise in segregation with increasing Δf for that condition. Although PT and NB sequences have occasionally been included in the same experiments (e.g., Dannenbring and Bregman, 1976), we are not aware of any studies that have compared the build-up of stream segregation for alternating sequences of pure tones with that for otherwise equivalent narrow-band noises. In addition, the Δf × time interval interaction was highly significant (p < 0.001); as for experiment 1, this arose mainly from the greater rate of change of stream segregation for larger Δfs during the initial phase of build-up. The three-way interaction term was not significant.
TABLE IV. ANOVA for the reference conditions in experiment 2.
Factor | df | F | p | ηp² |
---|---|---|---|---|
Frequency separation in sequence (Δf) | (1.342, 14.766) | 27.678 | <0.001 | 0.716 |
Stimulus type (S) | (1.415, 15.564) | 0.690 | 0.468 | 0.059 |
Time interval (T) | (1.803, 19.838) | 31.903 | <0.001 | 0.744 |
Δf × S | (3.580, 39.377) | 3.580 | 0.017 | 0.246 |
Δf × T | (4.880, 53.681) | 11.619 | <0.001 | 0.514 |
S × T | (6.741, 74.155) | 3.310 | 0.004 | 0.231 |
Δf × S × T | (7.690, 84.588) | 1.727 | 0.107 | 0.136 |
The effects of transitions are shown in the other panels of Fig. 3. Sudden changes from FD to PT, from NB to PT, or from SA to FA (ramped to damped) typically led to decreased subsequent segregation (i.e., resetting of build-up). This was especially true for the larger Δfs tested, but there was also more evidence of resetting overall when Δf was 2 ST than was apparent in experiment 1. The latter was mainly attributable to the NB-to-PT transition, for which there was more build-up prior to the change and hence more scope for resetting. Sudden changes in the opposite direction were less integration-promoting for all transitions tested, but the exact pattern varied across stimulus pairs and Δf. For transitions from PT to FD or NB, overshoot was characteristic for the 2-ST case, but the effect on subsequent streaming was typically small for larger values of Δf. The outcome was somewhat different for FA-to-SA transitions. There was a hint of overshoot for the 2-ST case, little effect at 4 ST, but clear evidence of resetting at 6 ST. Overall, the time constants for the development and decay of the impact of a transition on streaming appear broadly similar to those observed in experiment 1, except perhaps for a slightly faster and more complete decay by the last few seconds of the sequence. Again, most time bins in the transition conditions identified in the bootstrapping analysis as significantly different from their references align with the resetting and overshoot effects described earlier, but the last three time bins following the FA-to-SA transition in the 6-ST case are an exception without obvious explanation.
The ANOVA exploring abrupt changes in stimulus type is presented in Table V. The main effects of frequency separation and direction of change were highly significant (p < 0.001), but there was no significant effect of stimulus pair (p = 0.619). The only significant interaction term was stimulus pair × direction of change (p = 0.005). This interaction was mainly a result of changes in the size of the difference scores for different stimulus pairs, particularly the much smaller effect of transition direction for the FA and SA pair. The effects of individual transitions over the selected 8.0-s time interval were explored further using the pairwise comparisons summarized in Table VI (parts a and b). Figure 4 displays the mean changes in segregation score and corresponding inter-subject standard errors across Δf for each stimulus pair in a separate panel; the top axis again summarizes the mean difference scores between the two transition directions.
TABLE V. ANOVA exploring abrupt changes in stimulus type in experiment 2.
Factor | df | F | p | ηp² |
---|---|---|---|---|
Frequency separation in sequence (Δf) | (1.715, 18.863) | 16.200 | <0.001 | 0.596 |
Stimulus pair (P) | (1.811, 19.925) | 0.460 | 0.619 | 0.040 |
Direction of change (D) | (1.000, 11.000) | 123.718 | <0.001 | 0.918 |
Δf × P | (2.504, 27.541) | 1.874 | 0.165 | 0.146 |
Δf × D | (1.455, 16.006) | 1.818 | 0.198 | 0.142 |
P × D | (1.781, 19.587) | 7.316 | 0.005 | 0.399 |
Δf × P × D | (2.587, 28.457) | 0.302 | 0.795 | 0.027 |
TABLE VI, part (a). Results for more integration-promoting transitions in experiment 2 (differences re reference cases).
Transition type | Δf = 2 ST (mean, p) | Δf = 4 ST (mean, p) | Δf = 6 ST (mean, p) |
---|---|---|---|
FD to PT | −15.3% pts, 0.008 | −20.2% pts, 0.004 | −26.5% pts, 0.002 |
NB to PT | −24.3% pts, <0.001 | −33.1% pts, <0.001 | −29.3% pts, <0.001 |
SA to FA | −2.8% pts, 0.393 | −19.9% pts, <0.001 | −19.3% pts, 0.007 |
TABLE VI, part (b). Results for less integration-promoting transitions in experiment 2 (differences re reference cases).
Transition type | Δf = 2 ST (mean, p) | Δf = 4 ST (mean, p) | Δf = 6 ST (mean, p) |
---|---|---|---|
PT to FD | +22.7% pts, 0.002 | +7.4% pts, 0.372 | −3.1% pts, 0.276 |
PT to NB | +18.4% pts, 0.057 | +10.8% pts, 0.028 | +3.0% pts, 0.453 |
FA to SA | +13.2% pts, 0.054 | −7.5% pts, 0.066 | −14.8% pts, 0.002 |
Other than the SA-to-FA case when Δf was 2 ST, all transitions predicted to be more integration-promoting (part a) led to substantial and significant resetting (range: −15.3% pts to −33.1% pts; p values from 0.008 to <0.001). A more complex pattern emerged for the transitions predicted to be less integration-promoting (part b). For the 2-ST case, there was clear and significant overshoot for the PT-to-FD transition (+22.7% pts, p = 0.002); for the other stimulus pairs the overshoot was smaller and approached but did not quite reach significance (PT to NB: +18.4% pts, p = 0.057; FA to SA: +13.2% pts, p = 0.054). This outcome is unsurprising given the variability in the data. For the larger Δfs, most cases were associated with only small and non-significant effects, but there were two exceptions. First, the PT-to-NB transition led to significant overshoot for the 4-ST case. Second, the FA-to-SA transition led to significant resetting for the 6-ST case. More generally, transitions between FA and SA stimuli were associated with considerably smaller mean difference scores than for the other stimulus pairs. This outcome is consistent with the idea that the perceived dissimilarity between FA and SA stimuli was smaller than for the other stimulus pairs, which may account for the clear resetting observed for the 6-ST case (see also Sec. IV). When collapsed across Δf and stimulus pair, the transitions predicted to be more integration-promoting on average led to 26.7% pts less segregation than their counterparts. As for experiment 1, the overall pattern of response profiles suggests a summation of the effects of three factors, one based on transition direction, one on Δf, and one that biases subsequent streaming judgments towards segregation.
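The smaller directional effect for the FA and SA pair can be seen by computing a difference score for each stimulus pair from the rounded cell means in Table VI. The Python sketch below averages each direction across Δf and subtracts the two means. The data layout and function names are illustrative assumptions, and averaging across Δf is one plausible way of summarizing the table rather than necessarily the computation behind the difference scores reported in the text. Because the inputs are rounded, the overall mean agrees with the reported 26.7% pts only to within rounding.

```python
# Mean change in segregation (% pts) re the reference case, copied from Table VI.
# "more" = more integration-promoting direction (part a),
# "less" = less integration-promoting direction (part b);
# each list holds the values for Δf = 2, 4, and 6 ST.
TABLE_VI = {
    "FD/PT": {"more": [-15.3, -20.2, -26.5], "less": [22.7, 7.4, -3.1]},
    "NB/PT": {"more": [-24.3, -33.1, -29.3], "less": [18.4, 10.8, 3.0]},
    "FA/SA": {"more": [-2.8, -19.9, -19.3], "less": [13.2, -7.5, -14.8]},
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pair_difference(entry):
    """Difference between the two transition directions, averaged across Δf."""
    return mean(entry["less"]) - mean(entry["more"])


differences = {pair: pair_difference(entry) for pair, entry in TABLE_VI.items()}
overall = mean(list(differences.values()))

for pair, value in differences.items():
    print(f"{pair}: {value:.1f} % pts")
print(f"overall: {overall:.2f} % pts")
```

On these inputs the FA/SA difference is roughly a third of that for the FD/PT pair and about a quarter of that for the NB/PT pair, which matches the description of a much smaller effect of transition direction for that pair.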
The results for the FD-to-PT and PT-to-FD transitions across Δf were similar to those for their exact counterparts in experiment 1. The results for the NB-to-PT and PT-to-NB transitions had broadly similar profiles across Δf to their counterparts involving FD stimuli, suggesting that the underlying asymmetry in the effect of changes between fluctuating and steady stimuli, and vice versa, is like that for changes between modulated and unmodulated tones. Of course, it should be acknowledged that the amplitude fluctuations of a noise band become slower and more salient as its bandwidth narrows. There will inevitably be some differences in peripheral channeling cues between corresponding pure tones and narrow-band noises with a 2-ST passband, and there may also be small differences in loudness when matched for rms power. However, there is again no obvious mechanism by which these differences might account for the strong directional effects observed for these transitions.
The finding here of directional effects, albeit smaller ones, of transitions between pure tones with different amplitude envelopes (damped vs ramped) further extends the range of stimulus types known to produce these asymmetric effects on stream segregation. If the basis for the observed effects of direction did arise from differences in perceived duration, these will have been small compared with the differences in physical duration between targets and distractors used by Cusack and Carlyon (2003), which might account for why the difference scores seen here were relatively small. Another factor meriting comment is that the perceptual center of a sound (P center; Morton et al., 1976) is influenced by its amplitude envelope, such that the P centers of damped sounds (for which the energy peak occurs early) are before those of ramped sounds (e.g., Howell, 1984). Hence, a sudden change from damped to ramped sounds affects the rhythm of a physically isochronous sequence by slightly lengthening the perceived beat duration across the transition (and vice versa for ramped to damped). However, this effect is quite small and does not seem to offer any obvious explanation for the asymmetry seen at smaller Δfs for the direction of the transition.
IV. GENERAL DISCUSSION
The experiments reported here have established that sudden coherent changes in a range of acoustic properties without substantial peripheral-channeling cues can lead to asymmetric effects on subsequent streaming. One direction of change may lead to major loss of built-up segregation (resetting) whereas the other leads to little or no resetting, and in some cases may promote rather than reset stream segregation (overshoot). Transitions from unmodulated to modulated tones (see also Rajasingam et al., 2021) or to noise bands, or from slower to faster rates of AM were less integration-promoting than the reverse changes. Transitions from fast-attack (damped) to slow-attack (ramped) amplitude envelopes also tended to be less integration-promoting than vice versa, but the effect of direction was smaller at all Δfs (and negligible at 6 ST). Overall, the effect of the less integration-promoting direction was usually manifest as overshoot for smaller Δfs and as reduced resetting for larger Δfs. Overshoot is a striking phenomenon in this context: it is easy to see why a sudden change in one or other subset of sounds might promote subsequent streaming (e.g., Rajasingam et al., 2018), but it is not immediately obvious why a sudden change in both subsets might also do so. More generally, the asymmetries observed are hard to explain using classical accounts of streaming, because the extent of perceived dissimilarity between the two stimulus types in a pair does not depend on the direction of the transition. Instead, we propose an account based on the different effects of switching attention towards or away from sounds possessing a salient additional feature (or with greater activation on that perceptual dimension).
Overall, the response profiles obtained here after abrupt transitions suggest a summation of the effects of three broadly independent factors—one based on the direction of the transition (one strongly integration-promoting, the other having little effect), one based on Δf (for smaller values, there is little prior build-up and so little resetting is possible), and one causing a subsequent bias towards segregation lasting several seconds (analogous to the effect of preceding an alternating-frequency sequence with a constant-frequency sequence corresponding to one or other subset). In this framework, overshoot is observed only when the scope for loss of segregation is minimized (i.e., the less integration-promoting transition direction combined with a small Δf), such that the third factor dominates. The effects of all these factors for the transition types tested are clearly visible in Figs. 2 and 4. The outcomes for the FA and SA pair can be accommodated within this framework by assuming a smaller difference in salience between the constituents than for the other pairs. This would have led to reduced asymmetry—i.e., FA-to-SA transitions would have retained more integration-promoting properties—such that, following a transition, the increased scope for loss of segregation for sequences with larger Δfs soon became the dominant factor.
Comparison of the results of the current study (Δf = 2–6 ST) with those of Rajasingam et al. (2021; Δf = 4–8 ST) suggests that the effect of Δf is influenced by context: overshoot requires a smaller Δf to be manifest when the overall mean Δf for the stimulus set is lower (or the highest and lowest values are lower). Context effects arising from differences in stimulus range are often found in perception research. Other kinds of context effects involving Δf have been reported for sequences of alternating-frequency tones: less streaming was perceived for a given Δf in the current trial with increasing Δf in the previous trial (e.g., Snyder et al., 2008; Snyder et al., 2009).
As far as we are aware, there is only one other finding in the streaming literature akin to the overshoot that can occur after a sudden change in the properties of an alternating-frequency sequence. Rankin et al. (2017) used pure-tone sequences composed of six LHL– triplets into which a perturbation could be introduced: either an additional tone with a frequency 2 ST above that of the H tones was inserted in the usual gap between the third and fourth triplets (distractor case) or the H tone of the third triplet was replaced by a tone 2 ST higher (deviant case). Listeners were asked to report at the end of each trial whether the final triplet was heard as one stream or two. Compared with the standard case (no perturbation), the distractor and deviant cases both promoted stream segregation. Nonetheless, there are important differences in stimulus configuration and task between the two studies. Modifying our stimulus configuration to include transients rather than transitions (e.g., a single deviant triplet starting at 6 s) and comparing the effects of changes affecting both subsets (deviant triplet) with those affecting only one subset (deviant L or H tone) should help illuminate the relationship between our results and those of Rankin et al. (2017). More generally, the results reported by Rankin et al. (2017) may correspond to the effects of the third factor in the framework described above. One possible account of why some of the transitions in the current study may have produced a similar segregation-promoting effect is considered later in this discussion.
Our understanding of the neurophysiological bases for the build-up and resetting of stream segregation has not changed greatly since the study of Micheyl et al. (2005). By this account, build-up is assumed to arise from slow adaptation leading to a progressive narrowing of the receptive fields of frequency-tuned neurons in primary auditory cortex (A1). This leads to the emergence of two distinct subpopulations (one responding to the L subset and the other to the H subset) and hence to the perception of two streams. The slow adaptation is presumably the result of long-term synaptic depression (e.g., Pressnitzer et al., 2008); resetting triggered by sudden change is assumed to reflect fast recovery from this adaptation. This account can be adapted to explain the directional effects of ±12-dB transitions seen by Rogers and Bregman (1998) and Rajasingam et al. (2021). Specifically, a rising transition (softer-to-louder) raises neural activity, partly offsetting the build-up of adaptation; this widens the A1 receptive fields again, leading to greater overlap between populations tuned to the L and H frequencies and favoring a one-stream interpretation until further adaptation erodes it again. A falling transition (louder-to-softer) cannot raise neural activity, but it may not necessarily greatly lower the cortical response. This is because rising transients usually affect central neural responses more than falling ones, leading in this instance to little or no effect on cortical receptive field bandwidths and hence on streaming. However, this kind of account does not seem able to explain the overshoot observed for some kinds of transition at smaller Δfs.
Transitions involving the loss or gain of a new stimulus feature (e.g., the presence or absence of modulation or fluctuation) presumably involve the activation of other populations of neurons coding those features, and hence more complex physiological models will be needed to explain the asymmetric effects on streaming observed for those properties.
It has long been known that focused attention can influence auditory streaming, but the extent to which the build-up of segregation takes place automatically or requires focused attention (e.g., Carlyon et al., 2001) remains to be answered fully. Recent research suggests that at least part of this build-up occurs automatically because subsequent recovery from the reduced segregation caused by performing a competing auditory task during the first part of an alternating sequence occurs more rapidly than the initial build-up when that sequence is attended throughout (Billig and Carlyon, 2016). It is well established, however, that sudden switches in attentional focus affect subsequent streaming (Cusack et al., 2004; Thompson et al., 2011; Rajasingam et al., 2018). In the context of the current study, a possible candidate for the general bias towards segregation after a transition is that the sudden change always began with the L subset of sounds. This may have triggered a switch in attention to that subset. Indeed, there is evidence that preceding an ABA– sequence with only a single tone, matching one or other subset, can promote subsequent segregation (Haywood and Roberts, 2013). In this conception, the directional effects on streaming found for transitions between, e.g., unmodulated and modulated tones arise from an attentional asymmetry (Rajasingam et al., 2021).
As noted earlier, the auditory search literature (e.g., Asemi et al., 2003; Cusack and Carlyon, 2003) indicates that some sounds have additional features that are salient and elicit exogenous orienting (e.g., modulated sounds) whereas others do not (e.g., unmodulated sounds). By this account, a switch in attention away from sounds with primary attention-grabbing properties (loss of a salient feature or reduced activation on that dimension) leads to resetting but a switch towards them (gain or greater activation of a salient feature) does not. Where there is less difference in salience between the stimulus types, the asymmetry may soon be eclipsed by the effects of other factors (e.g., as we have proposed for the FA and SA pair at larger Δfs). By this account, the broader question of what constitutes a relevant additional feature remains to be answered; e.g., why is a percussive onset apparently not such a feature? Further research using the task and stimulus configuration employed here could extend the approach from amplitude- to frequency-modulated (FM) tones. FM is highly salient and strongly associated with asymmetry in auditory search tasks (e.g., Cusack and Carlyon, 2003). On that basis, transitions from steady pure tones to FM tones would be predicted to cause strong overshoot for smaller Δfs.
In conclusion, for sequences of sounds in which there is a correlated transition in acoustic properties—i.e., where the high- and low-frequency subsets change together to the same extent on the same dimension—the effect of a sudden change can be influenced not only by the property being altered but also by the direction of that change. Whether subsequent stream segregation is increased or decreased, compared with the no-transition reference case, appears to depend on the balance between the factors promoting resetting (e.g., transitions from modulated to unmodulated sounds favor integration) and those favoring a general bias towards stream segregation (probably through switches in attention). Further research is needed to uncover the basis for these directional asymmetries in the underlying neurophysiology and cognitive architecture of auditory stream segregation. This approach should help elucidate further the functional significance of stream segregation in the changing auditory scenes encountered in daily life.
ACKNOWLEDGMENTS
This research was supported by Aston University's Visiting Scholars' Scheme, which part-funded a research visit by B.R. to N.R.H. at the Hearing Hub, Macquarie University, Sydney, Australia, in November and December 2019. Our thanks go to James Hill for his assistance with data collection and analysis for a pilot version of experiment 1 to explore the effects of sudden transitions in the rate of AM on subsequent stream segregation. We are also grateful to James Rankin for drawing our attention to the effects of introducing single perturbations into a sequence of pure-tone triplets, as reported in his 2017 article, and to Brian Moore and Stephen McAdams for their comments on an earlier version of this manuscript. A preliminary poster presentation on the experiments presented here was given at the Ear and Hear Meeting (Southampton, United Kingdom, September 2022). This article is dedicated to the memory of Al Bregman (1936–2023), whose research and seminal book on auditory scene analysis have inspired our work and continue to inspire others in this field.
Note that a pair of unresolved components also creates frequency modulation at the output of the cochlear filters, which depends on the relative amplitude of each component (Hartmann, 1998).
Note that the edges of successive time bins were adjacent but strictly non-overlapping; i.e., the labels 0-1 s, 1-2 s, etc., describe the cases 0 s to <1 s, 1 s to <2 s, etc.
See https://doi.org/10.17036/researchdata.aston.ac.uk.00000601 (Last viewed July 10, 2023).
See supplementary material at https://doi.org/10.1121/10.0020172 for alternative versions of Figs. 1 and 3.
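The half-open time-bin convention described in the note on bin edges above (0 s to <1 s, 1 s to <2 s, etc.) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the 1-s default bin width, and the handling of out-of-range times are hypothetical, and the code is not taken from the authors' analysis.

```python
import math


def time_bin_label(t_s: float, bin_width_s: float = 1.0,
                   duration_s: float = 18.0) -> str:
    """Return the label of the half-open bin [start, start + width) holding t_s.

    A time falling exactly on a bin edge belongs to the later bin, so
    successive bins are adjacent but never overlap.
    """
    if not 0.0 <= t_s < duration_s:
        raise ValueError("time lies outside the sequence")
    index = math.floor(t_s / bin_width_s)
    start = index * bin_width_s
    return f"{start:g}-{start + bin_width_s:g} s"
```

For example, a response at 0.999 s falls in the bin labeled "0-1 s", whereas one at exactly 1 s falls in the bin labeled "1-2 s".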