There are two competing national standards for the calculation of loudness of steady sounds, DIN 45631 and ANSI S3.4. Their different concepts of critical bands lead to different predictions for broadband sounds. As that discrepancy is neither constant nor linear but highly frequency-dependent, the present study investigates spectral loudness summation in three frequency regions, at various levels, and using two different methods. The results show that both algorithms overestimate loudness; however, DIN 45631 comes closer to the subjective evaluations and often falls within their interquartile range. The overestimation by the standards is particularly large in the frequency range from 2 to 5 kHz.

Presently there are two standardized models for predicting the loudness of steady sounds. The first one is the Zwicker model (DIN, 1991), the other one is the Moore and Glasberg model (ANSI, 2007). Although the latter is based on the general structure of the Zwicker model, it differs in many details, namely the shape of the equal-loudness contours and transfer functions for the outer and middle ear, the concepts of masking and excitation patterns, and spectral loudness summation. The present paper focuses on the last of these issues, spectral loudness summation. While DIN 45631 uses the Bark scale for critical bands (see Fastl and Zwicker, 2007, for a review), ANSI S3.4 is based on auditory filters whose width is expressed as an equivalent rectangular bandwidth (ERB) (see Patterson, 1976; Glasberg and Moore, 1990). This results in differences in spectral loudness summation because there are only 24 Bark but almost 40 ERB across the human hearing range.

Fastl et al. (2009) reported that for pink noise, ANSI S3.4 predicts systematically higher loudness values than DIN 45631 does. This discrepancy amounts to 5 dB when expressed as the level difference necessary to predict equal loudness. Although further subjective data for the loudness of broadband noise or complex tones are available (e.g., Zwicker, 1958; Scharf, 1959; Hellman, 1985; Hellman and Zwicker, 1987), it is difficult to compare them quantitatively because of the impact of the different headphones and psychophysical procedures used. Only a minority of the earlier studies used free-field equalized headphones (Zwicker, 1958; Hellman and Zwicker, 1987). That is why further experiments were conducted recently (Schlittenlacher et al., 2011) showing that DIN 45631 predicted the subjectively evaluated loudness of pink noise and real broadband sounds rather well, whereas ANSI S3.4 overestimated it throughout the range of levels studied.

However, the difference between the Bark and the ERB scales depends on frequency. To compare the resulting amount of spectral loudness summation, it is important to evaluate the models in different frequency regions. Bandpass-filtered pink noise may be the most suitable stimulus for this purpose because it can be defined very well in physical terms, and its constant third-octave levels ensure a rather uniform summation across the critical bands involved. For this reason, the present study investigates the loudness of bandpass-filtered pink noise for low, medium, and high center frequencies across a wide range of levels and using two different methods.
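The constant third-octave levels of pink noise follow directly from its spectral shape. As a brief sketch, assuming an idealized power spectral density proportional to 1/f (the constant k and the band edges f_l and f_u are symbols introduced here for illustration only):

```latex
\[
  S(f) = \frac{k}{f}, \qquad
  P(f_l, f_u) = \int_{f_l}^{f_u} \frac{k}{f}\,\mathrm{d}f = k \ln\frac{f_u}{f_l}
\]
% For a third-octave band, f_u / f_l = 2^{1/3}, so
\[
  P_{1/3} = \frac{k}{3}\,\ln 2
\]
```

Because the band power depends only on the ratio of the band edges, every third-octave band carries the same power regardless of its center frequency.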

Each experiment was conducted with 20 participants. Not all of them participated in every experiment; the composition by age and sex was as follows: 13 females and 7 males aged 20–43 yr (median 21) in experiments (Exps.) 1–1 and 1–3, 12 females and 8 males aged 20–35 yr in Exp. 1–2, 15 females and 5 males aged 20–35 yr (median 22) in Exp. 1–4, and 13 females and 7 males aged 20–34 yr (median 22) in Exp. 2. All participants had thresholds in quiet better than 20 dB hearing level (HL) at audiometric frequencies from 125 Hz to 8 kHz, measured in octave steps.

The stimuli were generated at a sampling rate of 48 kHz with a resolution of 24 bits, D/A converted by an RME Hammerfall DSP Multiface II audio interface, and presented via Sennheiser HDA 200 headphones without an additional amplifier. In Exp. 1–4, Beyerdynamic DT-48A.00 5 Ω headphones, equipped with circumaural EDT 48 S cushions and connected to a TDT HB7 amplifier, were used. This was done to rule out the possibility that the surprising results of Exp. 1–3 were due to the apparatus. The participants were seated in a double-walled sound-proof chamber manufactured by the Industrial Acoustics Company.

Free-field equalization was implemented in software and applied after the signals had been generated digitally and adjusted to the appropriate equivalent sound pressure level. The resulting signals were stored in the Waveform Audio File Format. For the Sennheiser HDA 200, the data of Richter (2003) were used. This free-field correction is rather flat except for the range above 3 kHz, where some deviations occur, meaning that some bands have to be amplified and others attenuated relative to the level at 1 kHz. For the Beyerdynamic DT-48, the well-known passive network for free-field equalization of Zwicker and Maiwald (1963) was simulated in MATLAB.

Pink noise was filtered to produce three noise bands with lower and upper limiting frequencies of 125 Hz and 1 kHz (“lower noise,” Exp. 1–1), 500 Hz and 2 kHz (“middle noise,” Exp. 1–2), and 1.25 and 5 kHz (“higher noise,” Exp. 1–3 and 1–4), respectively. The slopes of the filters were steep enough to avoid loudness summation at the slope frequencies. The lower noise was three octaves wide rather than two because this makes its bandwidth on both the Bark and ERB scales more similar to that of the other two noises. Using several critical bands (between 7.2 and 8.6 Bark or 10.3 and 11.7 ERB) ensures that loudness primarily depends on the main loudness within the critical bands involved and not on that part of specific loudness that relies on the masking slopes.
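The bandwidths quoted above can be approximated with closed-form expressions for the two scales. The following is a minimal sketch, assuming the Zwicker and Terhardt analytic approximation for the Bark scale and the ERB-number formula of Glasberg and Moore (1990); neither formula is stated in the present text, the standards themselves may rely on tabulated values, and the 15.5 kHz upper limit is chosen here only for illustration. Small deviations from the quoted figures are therefore to be expected.

```python
import math


def bark(f_hz):
    """Critical-band rate in Bark (analytic approximation, assumed here)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def erb_number(f_hz):
    """ERB-number scale after Glasberg and Moore (1990), assumed here."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


# Noise bands from the text: (label, lower limit in Hz, upper limit in Hz)
NOISE_BANDS = [
    ("lower", 125.0, 1000.0),
    ("middle", 500.0, 2000.0),
    ("higher", 1250.0, 5000.0),
]


def band_widths(bands=NOISE_BANDS):
    """Return {label: (width in Bark, width in ERB)} for each noise band."""
    return {
        label: (bark(hi) - bark(lo), erb_number(hi) - erb_number(lo))
        for label, lo, hi in bands
    }


for label, (w_bark, w_erb) in band_widths().items():
    print(f"{label:>6} noise: {w_bark:.1f} Bark, {w_erb:.1f} ERB")

# Extent of both scales at an illustrative upper limit of 15.5 kHz
print(f"15.5 kHz: {bark(15500.0):.1f} Bark, {erb_number(15500.0):.1f} ERB")
```

Under these assumptions, all three bands span a similar number of critical bands on either scale, which is the property the stimulus design relies on.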

The loudness of the lower, middle, and higher noise was compared to that of a 1-kHz pure tone at 45, 60, and 75 dB sound pressure level (SPL) in Exps. 1–1 to 1–3. Additional data were collected for the higher noise at 70 and 85 dB SPL in Exp. 1–4. In Exp. 2, the three bandpass-filtered noises were judged at levels ranging from 35 to 75 dB SPL, and a 1-kHz pure tone was judged at levels ranging from 45 to 85 dB SPL, each in steps of 5 dB. Thus each sound was presented at nine levels. The duration of the stimuli was always 1 s, with Gaussian rise and fall times of 20 ms. A duration of 1 s is sufficient for judging the loudness of steady sounds because their loudness is independent of duration once the duration exceeds about 200 ms (see Buus et al., 1997; Moore, 2012).

The method of adjustment was used in Exps. 1–1 to 1–4. Participants were asked to adjust the loudness of a bandpass-filtered pink noise to that of a 1-kHz pure tone. First, the 1-kHz pure tone was presented, and after a pause of 1 s the bandpass-filtered pink noise followed. After a silent interval of 3 s, the pair started again until the participant decided that the two sounds were equally loud. The level of the noise could be adjusted from the beginning of its presentation until 1 s after it terminated. Such a run ended when the participant indicated a satisfactory loudness match by pressing a corresponding button. This procedure was repeated eight times for each condition. The initial SPL of the noise was chosen so that it sounded either obviously louder or softer than the fixed 1-kHz pure tone. Furthermore, six of the participants of Exp. 1–1 adjusted the level of the lower noise to the higher noise fixed at 60 dB and vice versa using the same procedure to test for potential time-order errors and other potential biases (see Epstein, 2007). All participants completed a short training before each subexperiment.

In Exp. 2, cross-modality matching with line length was employed (Stevens and Guirao, 1963). After listening to a sound, the participant adjusted the length of a horizontal line on the screen by moving the mouse so that the line length matched his or her impression of loudness. The stimuli were presented in random order. During the course of the experiment, each condition was presented four times. The maximum line length was 1260 pixels. Before beginning Exp. 2, the participants completed a short training session that included the softest, the loudest, and a few medium stimuli to avoid boundary effects. Altogether, the present study reports 2880 magnitude estimates and 1856 runs using the method of adjustment. As every method for the subjective evaluation of loudness has its strengths and limitations (Marks and Florentine, 2011), the two different methods were used to avoid erroneous conclusions resulting from procedural biases.
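The two totals can be cross-checked against the design described above. The following sketch shows one breakdown that is consistent with the reported numbers; it assumes that the additional noise-to-noise comparison by six participants also used eight runs in each of its two directions, which the text implies ("the same procedure") but does not state explicitly. All variable names are illustrative.

```python
participants = 20

# Exp. 2: three noises at 35-75 dB SPL and one tone at 45-85 dB SPL, 5-dB steps
noise_levels = list(range(35, 80, 5))
tone_levels = list(range(45, 90, 5))
presentations_per_condition = 4
conditions_exp2 = 3 * len(noise_levels) + len(tone_levels)
magnitude_estimates = participants * presentations_per_condition * conditions_exp2

# Exp. 1: number of reference levels per subexperiment, eight runs per condition
runs_per_condition = 8
reference_levels = {"1-1": 3, "1-2": 3, "1-3": 3, "1-4": 2}
main_runs = sum(
    participants * n_levels * runs_per_condition
    for n_levels in reference_levels.values()
)

# Additional direct comparison: six participants, two directions (assumed 8 runs each)
extra_runs = 6 * 2 * runs_per_condition
adjustment_runs = main_runs + extra_runs

print(magnitude_estimates, adjustment_runs)
```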

Because the point of this study is not only to provide target values for synthetic sounds but also to compare the predictions made by two competing loudness standards, all results are compared to the loudness levels calculated by these standards. All predictions made by DIN 45631 were calculated using the Java applet provided by the Unit for Technical Acoustics at TU München (Kerber, 2014), and those made by ANSI S3.4 using the software of the Cambridge University Hearing Group (2014).

To facilitate the comparison of the subjective evaluations with the standards, the ordinate in Fig. 1 shows the SPL of the fixed 1-kHz pure tone and the abscissa shows the SPL of the bandpass-filtered pink noise that was adjusted. Thus the graphs can also be read as showing the loudness level of the noises in phon. The results for the lower noise, ranging from 125 Hz to 1 kHz, are illustrated on the left-hand side of Fig. 1. Each circle shows the median of one condition as obtained from 160 runs (20 participants, 8 runs each); the error bars represent interquartile ranges. Both algorithms predict higher values than the subjective evaluations yield. DIN 45631 is within the interquartile range of matches for two of the three levels studied and is very close to the median for the highest level. ANSI S3.4 estimates systematically higher values, but it is within the interquartile range for the highest level.
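The summary statistics used in Fig. 1 can be sketched as follows. This is an illustration only: the quartile definition (Python's "inclusive" method) is an assumption, as the text does not specify how quartiles were computed, and the function names and sample values are hypothetical rather than taken from the data.

```python
import statistics


def median_and_iqr(levels_db):
    """Return (median, lower quartile, upper quartile) of adjusted levels in dB SPL."""
    q1, q2, q3 = statistics.quantiles(levels_db, n=4, method="inclusive")
    return q2, q1, q3


def within_iqr(prediction_db, levels_db):
    """True if a model prediction lies inside the interquartile range of the matches."""
    _, q1, q3 = median_and_iqr(levels_db)
    return q1 <= prediction_db <= q3


# Hypothetical adjusted noise levels for one condition (not data from the study)
example_matches = [52.0, 54.5, 55.0, 56.0, 57.5, 58.0, 61.0]
print(median_and_iqr(example_matches))
print(within_iqr(57.0, example_matches))
```

A prediction is then counted as falling within the interquartile range when `within_iqr` returns `True` for the corresponding condition.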

Fig. 1. (Color online) Results of experiment 1: The left panel shows adjustments of the lower noise (125–1000 Hz; Exp. 1–1) made to match the loudness of a 1-kHz pure tone. The middle noise (0.5–2 kHz; Exp. 1–2) is shown in the center, the higher noise (1.25–5 kHz; Exp. 1–3 and 1–4) on the right-hand side. Medians are depicted by circles when measured with Sennheiser HDA 200 headphones and by diamonds when using the Beyerdynamic DT-48. Interquartile ranges are marked by whisker lines.

The situation is very similar for the two-octave-wide noise centered at 1 kHz (middle noise, see center panel of Fig. 1). It overlaps with the lower noise by one octave; however, it does not include the frequencies regarded as the low frequency range below 500 Hz. The DIN 45631 predictions always clearly fall within the interquartile ranges of the adjustments. By contrast, ANSI S3.4 touches these only once.

Clear discrepancies between both models and the subjective evaluations emerge for the higher noise ranging from 1.25 to 5 kHz (see right panel of Fig. 1). Even the model that is closer to the data, DIN 45631, predicts values significantly higher than the subjective evaluations. It can also be seen that the discrepancy diminishes with increasing SPL. After these rather surprising results were obtained for three levels measured with the Sennheiser HDA 200 headphones in Exp. 1–3, the loudness of the higher noise was investigated with another pair of free-field equalized headphones, the Beyerdynamic DT-48, at two additional levels (diamonds in the right panel of Fig. 1). These results are fully in line with those obtained with the first set of headphones. Altogether the horizontal difference between the subjective evaluations and DIN 45631 is 10 dB at 45 phon and decreases to 2 dB at 85 phon. The additional discrepancy between ANSI S3.4 and DIN 45631 also diminishes with increasing loudness, resulting in deviations from the subjective evaluations of 16 dB at 45 phon and 5 dB at 85 phon, respectively.

The additional test performed by six of the participants to compare the lower and higher noise directly yielded a median level of 58 dB SPL for the higher noise to sound as loud as the lower noise fixed at 60 dB SPL. Likewise, they adjusted the lower noise to 61 dB SPL to sound as loud as the higher noise fixed at 60 dB SPL. Thus the higher noise needs to be about 1.5 dB lower in level to be perceived as equally loud as the lower noise. Almost the same value, a difference of 2 dB, is obtained when comparing the left and right panels of Fig. 1.
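The figure of about 1.5 dB is consistent with averaging the level offsets obtained in the two adjustment directions. The sketch below spells out that arithmetic; averaging the two directions is an interpretation assumed here, not a procedure stated in the text, and the function name is illustrative.

```python
def mean_level_offset(fixed_db, matched_higher_db, matched_lower_db):
    """Average amount (dB) by which the higher noise sits below the lower noise
    at equal loudness, taken over both adjustment directions."""
    # Higher noise adjusted to the lower noise fixed at fixed_db
    offset_higher_adjusted = fixed_db - matched_higher_db
    # Lower noise adjusted to the higher noise fixed at fixed_db
    offset_lower_adjusted = matched_lower_db - fixed_db
    return (offset_higher_adjusted + offset_lower_adjusted) / 2.0


# Medians reported in the text: 58 dB SPL and 61 dB SPL against a 60 dB SPL standard
print(mean_level_offset(60.0, 58.0, 61.0))
```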

The magnitude estimates of Exp. 2, illustrated in Fig. 2, yield results that are similar to, but slightly different from, those of Exp. 1. When drawing a regression line through the results for the 1-kHz pure tone, a relation between line length in pixels and loudness level can be obtained because the loudness level in phon is defined as the SPL of an equally loud 1-kHz pure tone in decibels. Doing so, 10 phon correspond to a factor of 2.03 in line length (see the dotted line in Fig. 2). This allows us to include the predictions made by the standards in the figure and to compare them to the subjective evaluations. For depicting error bars, the responses of each participant were normalized by dividing them by the participant's average and multiplying the result by the average of all values displayed in Fig. 2. Subsequently, the inter-individual interquartile ranges were drawn.
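The conversion between loudness level and line length, and the normalization used for the error bars, can be sketched as follows. This is an illustration under stated assumptions: the regression is assumed to be a least-squares fit of the logarithm of line length against level (which is what a constant factor per 10 phon implies), the averages in the normalization are assumed to be arithmetic means, and the tone data are synthetic values constructed to reproduce the reported factor of 2.03, not measured data. All function names are hypothetical.

```python
import math


def fit_log_line(levels_db, line_lengths_px):
    """Least-squares fit of log10(line length) against level: (slope, intercept)."""
    n = len(levels_db)
    logs = [math.log10(v) for v in line_lengths_px]
    mean_x = sum(levels_db) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in levels_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels_db, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def factor_per_10_phon(slope):
    """Multiplicative change in line length for a 10-phon increase."""
    return 10.0 ** (10.0 * slope)


def phon_to_line_length(loudness_level_phon, slope, intercept):
    """Map a predicted loudness level onto the line-length axis."""
    return 10.0 ** (slope * loudness_level_phon + intercept)


def normalize_responses(responses, grand_average):
    """Divide by the participant's own mean, then scale by the grand average."""
    own_average = sum(responses) / len(responses)
    return [r / own_average * grand_average for r in responses]


# Synthetic tone data: 50 px at 45 dB SPL, growing by a factor of 2.03 per 10 dB
tone_levels = list(range(45, 90, 5))
tone_lengths = [50.0 * 2.03 ** ((level - 45) / 10.0) for level in tone_levels]
slope, intercept = fit_log_line(tone_levels, tone_lengths)
print(round(factor_per_10_phon(slope), 2))
```

With the fitted slope and intercept, a loudness level predicted by either standard can be placed on the same ordinate as the magnitude estimates via `phon_to_line_length`.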

Fig. 2. (Color online) Results of experiment 2: Geometric mean line lengths reflecting magnitude estimates of the loudness of the three bandpass-filtered noises (squares) are compared to those of a 1-kHz pure tone (circles) and predictions made by the standards for the noises (DIN 45631: Solid line; ANSI S3.4: Dashed line). Predicted loudness levels are converted to line lengths using the relation given by the dotted regression line for the 1-kHz pure tone. Error bars indicate interquartile ranges.

The magnitude estimates for the lower noise (see Fig. 2, left panel) almost coincide with the predictions made by both standards. For the middle noise, by contrast, both standards overestimate its loudness except at the lowest levels. As in Exp. 1, the standards predict significantly greater loudness for the higher noise than the actual magnitude estimates indicate (see Fig. 2, right panel). However, the level-dependent discrepancies show the opposite pattern to that observed for the loudness matches in Exp. 1: Magnitude estimates and model predictions are rather close to each other at low levels but show a difference of more than 10 dB at the higher levels.

A comparison of Figs. 1 and 2 reveals small but systematic differences between the results obtained by the method of adjustment and by magnitude estimation; however, both experiments lead to the same general conclusions, which are specified by frequency region in the following paragraphs. Furthermore, when some of the participants compared the lower and the higher noise directly using the method of adjustment, their results were consistent with the matches made to a 1-kHz pure tone and thus suggest a small time-order error or similar bias, if any.

For the lower noise, the two loudness models predict almost identical results. This is remarkable because their details differ most in this frequency region. However, the summation of a larger number of critical bands by ANSI S3.4 up to 1 kHz is compensated for by predicting a lower specific loudness within a single critical band. The results of Exp. 1 suggest a slight overestimation of loudness by the standards in this frequency range, with DIN 45631 being within the interquartile range. The results of Exp. 2 fall on the predictions made by ANSI S3.4 up to an SPL of 65 dB and agree with the predictions of DIN 45631 above that level.

For the middle noise ranging from 0.5 to 2 kHz, most predictions made by the standards imply greater loudness than the subjective evaluations do. An exception occurs for the lowest levels in Exp. 2. Here the magnitude estimates are predicted well by DIN 45631. Furthermore, this standard is within the interquartile range of the subjective evaluations in Exp. 1. The higher noise, by contrast, is overestimated by both standards, no matter which subjective evaluation method is used as a yardstick. It is worth mentioning that in this frequency range the results of Exps. 1 and 2 differ somewhat. Experiment 1 suggests a reduced discrepancy between the standards and the subjective evaluations at higher levels, while Exp. 2 does so at lower levels.

The present results are also in line with a study by Meunier et al. (2000) who compared earlier versions of the Zwicker and Moore models with loudness matches of synthetic and natural sounds to a narrowband noise centered at 1 kHz. They also found that the Zwicker model comes somewhat closer to the data. In particular, a broadband noise centered at 3 kHz (thus resembling our present high-frequency noise) revealed the largest departures, with all models overestimating its loudness.

All in all, the matching and scaling data collected in the two experiments agree in suggesting that both models overestimate the loudness of broadband noise in the mid-frequency and especially in the high-frequency range. The discrepancy between data and model predictions increases as the noise band is shifted upward in frequency, as does the difference between the predictions made by the two models. ANSI S3.4 comes rather close to the predictions made by DIN 45631 at high SPLs; however, for the high-frequency noise and in the low and medium level range, it requires a level up to 6 dB lower to predict the same loudness as DIN 45631 does.

Altogether it can be concluded that the two loudness standards, DIN 45631 and ANSI S3.4, implement spectral loudness summation well up to about 1 kHz. Despite their different concepts for critical bands, both models predict loudness values reasonably close to the present subjective evaluations. However, the loudness of the mid-frequency noise centered at 1 kHz is overestimated by both algorithms, and even larger discrepancies result when the noise band is shifted higher in frequency. Spectral loudness summation appears to be significantly smaller in the range from 2 to 5 kHz than the standards predict. Because ANSI S3.4 predicts higher values throughout, DIN 45631 is almost always closer to the subjective evaluations. Nevertheless, the DIN standard also overestimates the loudness of broadband sounds in the most sensitive frequency range of human hearing.

1. ANSI (2007). S3.4, Procedure for the Computation of Loudness of Steady Sounds (Acoustical Society of America, New York).
2. Buus, S., Florentine, M., and Poulsen, T. (1997). “Temporal integration of loudness, loudness discrimination, and the form of the loudness function,” J. Acoust. Soc. Am. 101, 669–680.
3. Cambridge University Hearing Group (2014). “Auditory demonstrations,” http://hearing.psychol.cam.ac.uk/demos/demos.html (Last viewed October 8, 2014).
4. DIN (1991). 45631, Berechnung des Lautstärkepegels und der Lautheit aus dem Geräuschspektrum-Verfahren nach E. Zwicker (Procedure for calculating loudness levels and loudness) (Deutsches Institut für Normung, Berlin).
5. Epstein, M. (2007). “An introduction to induced loudness reduction,” J. Acoust. Soc. Am. 122, EL74–EL80.
6. Fastl, H., Völk, F., and Straubinger, M. (2009). “Standards for calculating loudness of stationary or time-varying sounds,” in Proceedings of Inter-Noise 2009, Ottawa, Ontario, Canada.
7. Fastl, H., and Zwicker, E. (2007). Psychoacoustics—Facts and Models, 3rd ed. (Springer, Berlin).
8. Glasberg, B. R., and Moore, B. C. J. (1990). “Derivation of auditory filter shapes from notched-noise data,” Hear. Res. 47, 103–138.
9. Hellman, R. (1985). “Perceived magnitude of two-tone-noise complexes: Loudness, annoyance and noisiness,” J. Acoust. Soc. Am. 77, 1497–1504.
10. Hellman, R., and Zwicker, E. (1987). “Why can a decrease in dB(A) produce an increase in loudness?,” J. Acoust. Soc. Am. 82, 1700–1705.
11. Kerber, S. (2014). “Loudness—calculation according to DIN 45631,” http://www.mmk.ei.tum.de/~kes/LoudnessMeter (Last viewed October 8, 2014).
12. Marks, L. E., and Florentine, M. (2011). “Measurement of loudness, Part I: Methods, problems, and pitfalls,” in Loudness, edited by M. Florentine, A. N. Popper, and R. R. Fay (Springer, New York).
13. Meunier, S., Marchioni, A., and Rabau, G. (2000). “Subjective evaluation of loudness models using synthesized and environmental sounds,” in Proceedings of INTERNOISE 2000, Nice, France.
14. Moore, B. C. J. (2012). An Introduction to the Psychology of Hearing, 6th ed. (Emerald, Bingley, U.K.).
15. Patterson, R. D. (1976). “Auditory filter shapes derived with noise stimuli,” J. Acoust. Soc. Am. 59, 640–654.
16. Richter, U. (2003). “Characteristic data of different kinds of earphones used in the extended high frequency range for pure-tone audiometry,” PTB Berichte Mechan. Akust. 72, 1–24.
17. Scharf, B. (1959). “Loudness of complex sounds as a function of the number of components,” J. Acoust. Soc. Am. 31, 783–785.
18. Schlittenlacher, J., Hashimoto, T., Fastl, H., Namba, S., Kuwano, S., and Hatano, S. (2011). “Loudness of pink noise and stationary technical sounds,” in Proceedings of INTERNOISE 2011, Osaka, Japan.
19. Stevens, S. S., and Guirao, M. (1963). “Subjective scaling of length and area and the matching of length to loudness and brightness,” J. Exp. Psychol. 66, 177–186.
20. Zwicker, E. (1958). “Über psychologische und methodische Grundlagen der Lautheit” (“Psychological and methodical basis of loudness”), Acustica 8, 237–258.
21. Zwicker, E., and Maiwald, D. (1963). “Über das Freifeldübertragungsmaß des Kopfhörers DT 48” (“On the Freefield response of the earphone DT 48”), Acustica 13, 181–182.