Using molecular psychophysics, temporal loudness weights were measured for 2-s, 1-kHz tones with flat, increasing and decreasing time-intensity profiles. While primacy and recency effects were observed for flat profile stimuli, the so-called “level dominance” effect was observed for both increasing and decreasing profile stimuli, fully determining their temporal weights. The weights obtained for these profiles were essentially zero for all but the most intense parts of these sounds. This supports the view that the “level dominance” effect is prominent with intensity-varying sounds and that it persists over time, since temporal weights are not affected by the direction of intensity change.
Various studies have shown that the loudness of a sound could be affected by its temporal intensity profile. In particular, it has been found that sounds increasing in level over time—also referred to as up-ramps—are perceived as louder (Susini et al., 2007) or changing more in loudness (Neuhoff, 1998; Olsen et al., 2010) than sounds decreasing over time—also referred to as down-ramps. Though several hypotheses have been proposed, the causes of these findings, involving perceptual phenomena specifically related to time-varying sounds, are still not clearly identified (e.g., Pastore and Flint, 2011). It has been shown that the assessment of the global loudness of a time-varying sound is heavily guided by the maximum stimulus level and by its position toward the end of the sound (Susini et al., 2002, 2007, 2010). These results led us to the idea that the loudness difference observed between up-ramps and down-ramps stems from a dissimilar temporal loudness weighting between the two types of sounds.
So far, temporal weights of loudness have been assessed mainly for sounds with flat intensity profiles (e.g., Pedersen and Ellermeier, 2008; Oberfeld and Plank, 2011; Oberfeld et al., 2012). Using experimental designs based on molecular psychophysics (Berg, 1989), these studies show that the beginning and, to a lesser extent, the end of flat-intensity sounds receive the greatest temporal weights of loudness. In studies with non-steady profile sounds, a so-called “level dominance” effect was observed, the initial high-level components receiving greater attention from the listeners (Lutfi and Jesteadt, 2006; Oberfeld and Plank, 2011).
In the present experiment, the molecular approach was applied to the study of temporal weighting of loudness for sounds increasing and decreasing in intensity. We hypothesized that the high-level components would be more heavily weighted in the case of increasing intensity stimuli, as both “level dominance” and recency effects apply to them. A discrepancy in temporal weighting could thus help to explain why a loudness difference is observed between the two types of sounds.
II. Materials and methods
Eight volunteer participants (5 women and 3 men; age 21–29 yr) took part in the experiment. None reported having hearing problems. They gave their informed written consent prior to the experiment and were paid for their participation. The participants were naive with respect to the hypotheses under test.
Stimuli were 1-kHz pure tones lasting 2 s. They were all made of 16 consecutive 125-ms stationary segments with levels drawn independently from normal-truncated distributions (SD = 2 dB, restricted to Mean ± 5 dB). The mean values (in decibels) of the distributions were chosen in order to follow one of the three profiles under study. While increasing and decreasing profiles were examined for the particular purpose of this study, the flat profile was studied mainly to serve as reference in the following analyses and in order to compare our results with those of previous studies.
The means of the distributions from which the segment levels of the flat-level profile stimuli were drawn were all equal to 80 dB sound pressure level (SPL), resulting in level-fluctuating sounds, as shown in the left panel of Fig. 1. Regarding the increasing intensity stimuli, the means of the 16 distributions were chosen so that they created a globally increasing profile ranging from 65 to 80 dB SPL (see Fig. 1, middle panel). This profile was temporally reversed to create the [80–65 dB SPL] decreasing intensity stimuli (Fig. 1, right panel). Linear rise and fall times of 10 ms were imposed on the beginning and end of all stimuli. Inter-segment level variations were smoothed using 10-ms half periods of sinusoidal functions to avoid spectral splatter that occurs when there are abrupt changes in the sound intensity of pure tones. A fixed level increment of 0.9 dB was subtracted or added to each segment to create respectively low and high versions of these stimuli for each profile (see Fig. 1). This value was chosen according to a preliminary experiment that indicated a 1.8 dB difference in mean level between the low and high versions of the stimuli (i.e., mean profile ± 0.9 dB) resulting in a mean discrimination score of 65% for the three profiles.
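The level-drawing step described above can be sketched as follows. This is a minimal illustration rather than the MATLAB code used in the experiment: the function names are hypothetical, the ramp means are assumed to be linearly spaced in decibels between 65 and 80 dB SPL (the text only states that the profile is globally increasing), and truncation is implemented by redrawing out-of-range values. Waveform synthesis and the 10-ms smoothing are not covered.

```python
import numpy as np

N_SEGMENTS = 16   # 125-ms segments in a 2-s tone
SD_DB = 2.0       # standard deviation of each level distribution
LIMIT_DB = 5.0    # levels restricted to mean +/- 5 dB
SHIFT_DB = 0.9    # offset creating the low and high versions


def mean_profile(kind):
    """Mean level (dB SPL) of each segment for the given profile."""
    if kind == "flat":
        return np.full(N_SEGMENTS, 80.0)
    # ASSUMPTION: means linearly spaced in dB from 65 to 80 dB SPL.
    ramp = np.linspace(65.0, 80.0, N_SEGMENTS)
    if kind == "increasing":
        return ramp
    if kind == "decreasing":
        return ramp[::-1]  # temporal reversal of the increasing profile
    raise ValueError(f"unknown profile: {kind}")


def draw_segment_levels(kind, version, rng):
    """Draw the 16 segment levels for one trial (version is 'low' or 'high')."""
    if version not in ("low", "high"):
        raise ValueError(f"unknown version: {version}")
    shift = SHIFT_DB if version == "high" else -SHIFT_DB
    means = mean_profile(kind) + shift
    levels = rng.normal(means, SD_DB)
    # Truncated normal by rejection: redraw values outside mean +/- 5 dB.
    outside = np.abs(levels - means) > LIMIT_DB
    while outside.any():
        levels[outside] = rng.normal(means[outside], SD_DB)
        outside = np.abs(levels - means) > LIMIT_DB
    return levels


rng = np.random.default_rng(0)
example_levels = draw_segment_levels("increasing", "high", rng)
```

Because the same 0.9-dB shift is applied to every segment, the low and high versions of a profile differ by 1.8 dB in mean level, as stated in the text.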
The stimuli were generated at a sampling rate of 44.1 kHz with 16-bit resolution using MATLAB. Sounds were converted using an RME Fireface 800 soundcard, amplified using a Lake People G-95 Phoneamp amplifier and presented diotically through headphones (Beyerdynamic DT 270 PRO). Levels were calibrated using a Brüel & Kjær 2238 Mediator sound-level meter placed at a distance of 4 cm from the right (left) earphone. Each participant was tested in a double-walled IAC sound-insulated booth.
A standard 1I, 2AFC procedure was employed. The experiment was divided into six different sessions scheduled on different days. Each session was made up of 100 training trials followed by three blocks of 200 trials. Each profile (i.e., flat, increasing, and decreasing) was presented in a particular block. In each trial, a sound randomly chosen to be a low or a high version of the corresponding profile was played to the participant. The participants had to determine whether the stimulus seemed to be soft or loud, on the basis of the preceding stimuli heard during the block. It was an intensity discrimination task in which subjects were explicitly asked to consider the global loudness of the stimulus (i.e., the loudness judged over the whole duration of the sound) when making their judgment. In addition, they were instructed not to consider the loudness of the stimuli presented in previous blocks (i.e., other profiles) when making their judgment. All answers were entered directly into a MATLAB interface. Participants became familiar with the experimental procedure by completing a training session at the beginning of the experiment. The blocks were presented in random order to each subject in each session. As we were interested in spontaneous and natural strategies, participants did not receive trial-by-trial feedback. They were only informed of their score (percentage of correct identifications of low/high incoming distributions) at the end of each block to ensure an optimum focus on the task at hand. Each session lasted approximately 1 h.
E. Data analysis
We opted for a logistic regression analysis for weight estimation and employed the same decision model as in previous studies (Pedersen and Ellermeier, 2008; Oberfeld and Plank, 2011). Regressions were conducted using R (R Core Team, 2013) separately for each listener, each intensity profile and each distribution (low/high). Responses given in each trial served as the dependent variable. Soft responses were coded as 0 and loud responses were coded as 1. The levels of each segment were taken as the 16 independent variables.
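The regression step can be illustrated with a short sketch. It is written in Python with a hand-rolled Newton-Raphson fit rather than the R routines used for the actual analysis, the simulated listener and all names are hypothetical, and the normalization (weights scaled to sum to 1) is an assumption, since the exact normalization rule is not detailed here.

```python
import numpy as np


def fit_logistic_slopes(levels, responses, n_iter=25):
    """Logistic-regression slopes (one per segment) via Newton-Raphson."""
    x = np.asarray(levels, dtype=float)
    y = np.asarray(responses, dtype=float)
    # Centering the predictors only changes the intercept, not the slopes.
    x_centered = x - x.mean(axis=0)
    design = np.column_stack([np.ones(len(x_centered)), x_centered])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (y - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return beta[1:]  # drop the intercept


def normalize_weights(slopes):
    """ASSUMPTION: weights are scaled so that they sum to 1."""
    return slopes / slopes.sum()


# Hypothetical simulated listener who attends only to the last three segments.
rng = np.random.default_rng(0)
n_trials, n_segments = 2000, 16
true_slopes = np.zeros(n_segments)
true_slopes[-3:] = 0.3
levels = 80.0 + rng.normal(0.0, 2.0, size=(n_trials, n_segments))
p_loud = 1.0 / (1.0 + np.exp(-(levels - 80.0) @ true_slopes))
responses = (rng.random(n_trials) < p_loud).astype(int)  # 0 = soft, 1 = loud

slopes = fit_logistic_slopes(levels, responses)
weights = normalize_weights(slopes)
```

In this simulation the recovered weights concentrate on the last three segments, which is the kind of pattern the analysis is designed to detect.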
The temporal weights, defined as the beta coefficients given by the logistic regressions, were normalized individually for each profile and distribution. First, a three-factor within-subject (segment × profile × distribution) repeated-measures analysis of variance (RM ANOVA) was performed on the full data set. Since a significant main effect of the profile was found [F(2, 14) = 6.26; p = 0.01] but no significant interaction between profile and distribution [F(2, 14) = 0.38; p = 0.69], temporal weights of the two distributions were pooled together afterward.
A. Temporal weights for the flat profile
The left panel of Fig. 2 shows the mean temporal weights obtained for the flat profile as well as the temporal weights from an “ideal observer” (Berg, 1989), where the same weight would apply to each segment (represented by the dashed lines). The results show that the first and last segments received the highest weights from the listeners, resulting in a U-shaped temporal pattern. The middle part of the profile was virtually ignored: in the left panel of Fig. 2, zero falls within the 95% confidence intervals of most of these segments. Two-tailed Wilcoxon signed rank tests were conducted to compare the weights of segments 1 and 16, respectively, with the mean weight of the middle-part segments (segments 6–11). Both primacy (W = 34; p = 0.01) and recency (W = 33; p = 0.02) effects were found to be significant.
B. Temporal weights for the increasing and decreasing profiles
The mean temporal weights obtained for the increasing and decreasing profiles are plotted, respectively, in the middle and right panels of Fig. 2. For the increasing profile, only the weights of the last three segments differed from zero, which supports the assumption made in the introduction that the last portion of an increasing sound (corresponding to both its maximum and its end level) would receive the greatest attention in the task. Similarly, for the decreasing profile, all but the first three segments received a zero weighting. This latter result also supports the hypothesis that the highest weights would be assigned to the beginning of the down-ramps, that is, the loudest portion. A common effect of saliency for the two profiles was obtained, where the most intense segments received exclusive attention from the listeners. This effect was also pointed out in other studies (Lutfi and Jesteadt, 2006; Pedersen and Ellermeier, 2008; Oberfeld and Plank, 2011) that were interested in temporal weighting of non-steady profiles, and it was defined, as mentioned above, as the “level dominance” effect.
The weights from the “ideal observer” were also computed (using data from Rabinowitz et al., 1976) in order to take into account the fact that the sensitivity for level differences of pure tones is slightly better at high than at low levels, an effect known as the near-miss to Weber's law. Ideal weights were calculated so that the sensitivity at 80 dB SPL was about 1.6 times that at 65 dB SPL, and were normalized so that the 16 weights summed to 1. As can be seen in the middle and right panels of Fig. 2, the near-miss effect can only marginally account for the trend obtained for the weighting profiles.
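To give a sense of how mild the near-miss tilt is, the following sketch computes one possible set of ideal weights. It rests on two assumptions that are not taken from the text: that each weight is proportional to the relative sensitivity at the segment's mean level, and that this sensitivity rises linearly with level from 1 at 65 dB SPL to 1.6 at 80 dB SPL. The function name and the linearly spaced ramp means are likewise illustrative.

```python
import numpy as np


def ideal_weights(mean_levels_db, ratio=1.6, low_db=65.0, high_db=80.0):
    """Near-miss 'ideal' weights, normalized to sum to 1 (illustrative)."""
    levels = np.asarray(mean_levels_db, dtype=float)
    # ASSUMPTION: relative sensitivity grows linearly with level,
    # from 1 at low_db to `ratio` at high_db.
    sensitivity = 1.0 + (ratio - 1.0) * (levels - low_db) / (high_db - low_db)
    # ASSUMPTION: each weight is proportional to the sensitivity.
    return sensitivity / sensitivity.sum()


# ASSUMPTION: ramp means linearly spaced in dB over the 16 segments.
up_ramp_weights = ideal_weights(np.linspace(65.0, 80.0, 16))
down_ramp_weights = ideal_weights(np.linspace(80.0, 65.0, 16))
flat_weights = ideal_weights(np.full(16, 80.0))
```

Under these assumptions the largest ideal weight is only 1.6 times the smallest, a far shallower gradient than the near-exclusive weighting of the three most intense segments reported above.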
As the main motivation for this study was to account for the loudness difference between up- and down-ramps, their perceptual weighting patterns were compared. Our hypothesis was that the weights for the most intense components would be greater for the up-ramp than for the down-ramp. Thus, the weights of the last segments of the up-ramp were compared with the weights of the first segments of the down-ramp (the order of the down-ramp weights was simply reversed so that they could be compared statistically with the up-ramp weights) using a two-factor within-subject (profile × segment) RM ANOVA. The Segment × Profile interaction was not found to be statistically significant [F(15, 210) = 0.61; p = 0.87]. Thus, contrary to the assumption made in the introduction, it cannot be inferred from the results of the present experiment that high-level components are weighted more heavily for up-ramps than for down-ramps.
C. Loudness difference between up-ramps and down-ramps
In an additional experiment, we checked that the lack of asymmetry found between the two weighting patterns was not due to a failure to produce a loudness difference between the two types of time-segmented ramps. A new group of nine subjects naive to the hypothesis under test (5 men and 4 women; age 20–29 yr) took part in the complementary experiment, which was scheduled in one session lasting approximately 30 min. The apparatus used was the same as for the main experiment.
Participants compared level-fluctuating up-ramps and down-ramps in terms of global loudness in a 2I, 2AFC procedure. They had to indicate which sound of each pair they perceived as louder. They were presented with 300 trials (organized in six blocks of 50 trials). Half were composed of an increasing stimulus followed by its temporally reversed version, while the other half of the trials were presented in inverted order. In each pair, the reference ramp (an up-ramp or a down-ramp) was generated from a level-fluctuating [65–80 dB SPL] intensity profile (SD = 2 dB, restricted to Mean ± 5 dB) as in the main experiment. The comparison ramp was its temporally reversed version, which was presented at five different levels relative to the reference ramp (60–75, 63–78, 65–80, 67–82, 70–85 dB SPL) in order to introduce some variance into the loudness comparisons. The goal of the experiment was to check that more than 50% of up-ramps would be perceived as louder than down-ramps, while the two types of profiles were presented with the same counterbalanced set of SPLs (in each session, up-ramps were higher than down-ramps in SPL on 150/300 trials). Each of the nine participants answered more often that the up-ramp was louder (individual percentages: 66.3%, 65.7%, 67.3%, 74.3%, 65.7%, 56.3%, 58.3%, 80.7%, and 71.0%; Mean = 67.3, SD = 7.5). These proportions were significantly higher than 50%, as shown by a one-sample one-tailed Wilcoxon signed rank test (W = 45; p < 0.01). Therefore, this result demonstrates that the asymmetry in global loudness between increasing and decreasing intensity stimuli still exists with level-fluctuating ramps.
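The summary statistics and the signed rank test above can be recomputed from the nine reported percentages. The sketch below uses only the Python standard library and an exact enumeration of sign patterns. It is not the software used for the original analysis, and the function names are illustrative.

```python
from itertools import product
from statistics import mean, stdev

# Individual percentages of "up-ramp louder" responses reported in the text.
percent_up_louder = [66.3, 65.7, 67.3, 74.3, 65.7, 56.3, 58.3, 80.7, 71.0]


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_one_tailed(sample, mu):
    """Signed rank statistic W+ and exact P(W+ >= observed) under H0."""
    diffs = [x - mu for x in sample if x != mu]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every pattern of signs is equally likely: enumerate them all.
    as_extreme = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r * s for r, s in zip(ranks, signs)) >= w_plus - 1e-9
    )
    return w_plus, as_extreme / 2 ** len(ranks)


group_mean = mean(percent_up_louder)
group_sd = stdev(percent_up_louder)  # sample standard deviation (n - 1)
w_plus, p_value = wilcoxon_one_tailed(percent_up_louder, 50.0)
```

Since all nine percentages exceed 50%, W takes its maximum possible value for nine listeners (1 + 2 + ... + 9 = 45), and only one of the 2^9 = 512 sign patterns is as extreme, which gives an exact one-tailed p of about 0.002.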
IV. Discussion and conclusion
In this study, we examined the perceptual weights for pure tones with three different intensity profiles as a function of time: flat [80 dB SPL], increasing [65–80 dB SPL] and decreasing [80–65 dB SPL]. On the one hand, clear primacy and recency effects were found for flat level profile sounds. Although the recency effect was greater in our experiment than in previous studies, most likely due to the fact that the sounds used were longer, these results broadly support the trends observed concerning this type of stimulus (Pedersen and Ellermeier, 2008; Oberfeld and Plank, 2011). On the other hand, this study used a molecular approach to examine in particular whether the temporal weights of high-level components of increasing sounds could be greater than those of decreasing sounds. Instead, a common effect of saliency—the “level dominance” effect—was found. The segments with the highest levels received greater attention from the listeners, but with no significant difference between increasing and decreasing intensity sounds. Thus, the present study shows that the difference in loudness between increasing and decreasing sounds, which still exists with level-fluctuating ramps as confirmed by the results of the additional experiment, does not arise from the fact that the high-level portion is more heavily weighted when presented at the end of the sound. Nevertheless, since the temporal weights were similar whether the loudest portion of the 2-s time-varying tones was located at the beginning or at the end—corresponding, respectively, to the down-ramp and the up-ramp in the experiment—our results uphold those from another study (Turner and Berg, 2007), indicating that the “level dominance” effect seems to override primacy and recency effects and persist over several seconds. Further experiments using non-stationary sounds with various durations are required to determine precisely the duration of the effect.
We are especially grateful to Daniel Oberfeld for very helpful discussions and comments during both the preparation and the analysis of this experiment. We would also like to thank three reviewers who provided fruitful comments on an earlier version of this manuscript. This work was supported by the project LoudNat funded by the French National Research Agency (ANR).