Accurate measurement of the softest sound levels of phonation presents technical and methodological challenges. This study aimed at (1) reliably obtaining normative data on sustained softest sound levels for the vowel [a:] at comfortable pitch; (2) comparing the results for different frequency and time weighting methods; and (3) refining the Union of European Phoniatricians' recommendation on allowed background noise levels for scientific and equipment manufacturers' purposes. Eighty healthy untrained participants (40 females, 40 males) were investigated in quiet rooms using a head-mounted microphone and a sound level meter at 30 cm distance. The one-second-equivalent sound levels were more stable and more representative for evaluating the softest sustained phonations than the fast-time-weighted levels. At 30 cm, these levels were in the range of 48–61 dB(C)/41–53 dB(A) for females and 49–64 dB(C)/35–53 dB(A) for males (5% to 95% quantile range). These ranges may serve as reference data in evaluating vocal normality. In order to reach a signal-to-noise ratio of at least 10 dB for more than 95% of the normal population, the background noise should be below 25 dB(A) and 38 dB(C), respectively, for the softest phonation measurements at 30 cm distance. For the A-weighting, this is 15 dB lower than the previously recommended value.

The softest achievable phonatory sound level is an important characteristic of human voice. It has been used as a potential indicator of voice pathology (Behrman et al., 1996; Ma et al., 2007; Speyer et al., 2003) and identified as one of four basic voice parameters (besides jitter, maximum phonation time and the highest achievable fundamental frequency) best quantifying the severity of dysphonia (“Dysphonia severity index,” Wuyts et al., 2000). In order to find out about the normality or abnormality of human voice it is important to know the sound levels that can be expected in normal subjects. This has been investigated in numerous previous studies. Unfortunately, a closer look at these studies reveals considerable discrepancies in both the published results and the measurement methodology (see Fig. 1 and the text).

FIG. 1.

Examples of softest sound levels for vocally healthy speakers published in previous studies (Awan, 1991; Gramming and Åkerlund, 1988; Gramming and Sundberg, 1988; Hacki, 1999; Hakkesteegt et al., 2006; Hallin et al., 2012; Heylen et al., 2002; Leino et al., 2008; Ma et al., 2007; Pabon et al., 2011; Schneider and Bigenzahn, 2003, 2005; Schultz-Coulon and Asche, 1988; Sihvo and Sala, 1996; Šiupšinskiene, 2003; Sulter et al., 1995; Timmermans et al., 2002). The symbols A, C, hp, and lin. represent A-weighting, C-weighting, custom high-pass (hp) filtering (with the cutoff frequency indicated), and no filtering, respectively. The horizontal line at 40 dB indicates the maximum A-weighted background noise level as recommended by the Union of European Phoniatricians (UEP) (Schutte and Seidner, 1983). The information inside the bars indicates: subjects' gender (male or female, i.e., F, M); mouth-to-microphone distance; sound level measurement method (manual or computerized); and the background noise level (if reported). In the cases when the microphone position was different than 30 cm, the sound levels were either related to the 30 cm distance using a calibration procedure or recalculated to represent the respective values at 30 cm distance using the distance-law relationship. The whiskers indicate two standard deviations from the mean value.

Often, the softest sound levels are measured as a part of the “voice range profile” (VRP) or “phonetogram” (Damsté, 1970; Schutte and Seidner, 1983). Three decades ago the Union of European Phoniatricians (UEP) attempted to standardize the voice range profile measurement methodology and recommended the voice range measurements to be performed at a microphone distance of 30 cm from the mouth, using a sound level meter which is set to the standard frequency A-weighting protocol (Schutte and Seidner, 1983). Since then, the distance of 30 cm has been considered the standard distance for voice measurements, although some other distances have also been explored and used when reporting results, making the comparison among different studies problematic.

UEP also recommended the measurements to be done in a “living room environment” with a maximum background noise level of 40 dB(A). Consequently, the standard VRP plot used a value of 40 dB(A) as the minimum level on the y axis (Schutte and Seidner, 1983). While this recommendation implies that the softest voice levels are not expected to be below 40 dB(A), some studies indicate that this may not be true. This is evident from the 95% confidence intervals (i.e., the mean value plus/minus two standard deviations in normally distributed data), indicating the lowest boundary of the normal soft voice at 33 and 35 dB(A) at 30 cm in the studies of Sihvo and Sala (1996) and Sulter et al. (1995), respectively (see Fig. 1). These findings suggest that the softest voice measurement methodology and the UEP recommendation for the maximum noise level may need to be revisited.

Some of the factors that may have contributed to the differences among the previous studies are

  1. various mouth-to-microphone distances (sometimes not even reported);

  2. improperly chosen equipment (uneven frequency response or insufficient dynamic range);

  3. background noise levels (which were often not reported in studies and which may increase the minimum measurable levels if louder than the softest voice);

  4. different frequency weighting approaches (A, C, or linear standard weightings as well as custom non-standard weightings have been used);

  5. different time weighting strategies (standard Slow, Fast, Root-Mean-Square, as well as custom weightings have been applied);

  6. voicing detection methodology (manual or automatic methods with different, often undefined criteria implemented in software algorithms have been used; since voice is inherently unstable at the phonation threshold, the voicing detection method influences the results);

  7. task elicitation issues (how closely did the attempted softest phonations match the subjects' voicing threshold).

These discrepancies and unknown factors in the published normative data concerning the softest sound levels of human voice indicate a need for a study that measures the softest voice levels as accurately as possible with precisely defined measurement methods. This study therefore aims at (1) obtaining a new set of data on the softest sound levels of human voice in normal subjects under carefully documented conditions using carefully chosen equipment; (2) comparing the results for different methods of sound level extraction; and (3) refining the recommendation on the background noise levels required for the measurement of the softest phonations. The results may be used, for instance, by voice scientists and manufacturers of voice measurement devices for selecting a microphone with a proper noise level and for implementing more rigorous conditions and methodology for soft phonation measurements. This is in line with the currently recognized need of achieving better repeatability and reproducibility of voice measures obtained in different laboratories and clinics with different devices (e.g., Aichinger et al., 2012; Sanchez et al., 2013).

A total of 80 participants took part in the study: 40 men with an average age of 28 yr (from 14 to 58) and 40 women with an average age of 23 yr (from 14 to 48). Of these, 20 (13 females and seven males) were members of an amateur choir while the rest (27 females and 33 males) were students and faculty members from the Palacký University of Olomouc. Only participants with a normal healthy voice were included in this study. The voice status was assessed using the Czech version of the Voice Handicap Index (VHI) questionnaire (Jacobson et al., 1997; Švec et al., 2009), which was administered immediately before the voice recording. All the participants had total VHI scores below 38, i.e., within the 95% confidence interval for normal voice (Rosen et al., 2004).

The voice recordings were performed over the course of several months in two different rooms. The 20 members of the choir were measured in their own audition room (size ca. 145 m³, room background noise level at 23 dB(A), reverberation time 0.35–1.3 s in the range of 0.2–10 kHz, corresponding reverberation radius 0.6–1.15 m). Measurements involving the students and teachers (60 subjects) were carried out in an acoustically treated room at the Czech broadcast station in Olomouc (size 48.75 m³, room background noise level at 18 dB(A), reverberation time 0.6–0.14 s in the range of 0.2–10 kHz, corresponding reverberation radius 1.0–1.6 m). For acquisition of the softest phonations, an omni-directional head-mounted microphone (DPA, type 4066) was placed at a distance of 5–10 cm from the mouth, at ca. 45 degrees to the side of the mouth horizontally. The microphone was connected to a DPA microphone preamplifier (type MMA 6000). A sound-level meter (Brüel & Kjaer, type 2238 Mediator) with a 1/2 in. omni-directional condenser microphone (Brüel & Kjaer, type 4188) was placed horizontally at a distance of 30 cm in front of the mouth for simultaneous sound level calibration. The sound level microphone signal was obtained from the sound-level-meter's unfiltered AC output. The sound level meter microphone was calibrated with a Brüel & Kjaer (type 4231) calibrator, and the head-mounted microphone was calibrated using the sound level meter as a reference microphone. Signals from both the head-mounted microphone and the sound level meter microphone were recorded with a digital recorder M-Audio (type Microtrack II, M-Audio, Irwindale, CA) at a sampling frequency of 48 kHz and with 24 bit quantization. The measurement setup is schematically displayed in Fig. 2. All recorded signals were saved in uncompressed “wav” format.

FIG. 2.

Setup for recording the softest phonations.

The protocol was designed to be as short as possible (for practical reasons, owing to the time constraints of the participants and the availability of the rooms in which the measurements were performed). The procedure aimed at recording the essential voice characteristics of each participant (Wuyts et al., 2000): sustained phonation at comfortable pitch and loudness for assessment of perturbation characteristics; maximum phonation time; the lowest and highest achievable frequency; and the softest and loudest phonations. The current study focuses only on the softest voice levels. The relevant steps of the recording procedure were as follows.

  1. Acquisition of Voice Handicap Index data.

  2. Participant setup in the measurement room: Attachment of the head-mounted microphone off axis in relation to the mouth, at a mouth-to-microphone distance of 5–10 cm—the exact distance was not required to be measured since the calibration procedure (see below) allowed relating the recorded signal to the standard 30 cm distance; placement of the sound level meter at a mouth-to-microphone distance of 30 cm.

  3. Start of the audio recording.

  4. Acquisition of the 94 dB calibrator sound with the sound level meter microphone (first step in the sound level calibration procedure).

  5. Vocalization of the sustained vowel [a:] on a comfortable pitch and loudness (second step in the sound level calibration procedure).

  6. Recording of approximately 5 s of silence.

  7. Softest possible phonation, vowel [a:] at comfortable pitch.

  8. End of the audio recording.

The vowel [a:] was selected for the soft phonation task, because it has been used most frequently for investigation of human voice range profiles based on the recommendation of Gramming (1991). The subjects were instructed to sustain their soft voiced phonation while trying to slowly decrease the level down to the threshold at which the phonation spontaneously changed to whisper. This was first demonstrated to each subject by the instructors (J.G.Š. and H.Š.). The duration of the phonations was always longer than 2 s, to fulfill the recommendation of Coleman (1993). The instructors were present during the recordings, and they monitored whether the participants performed the task correctly. If not done correctly, the participants were asked to repeat the task until the instructors were satisfied.

For converting the digitized microphone signal to a sound level value and determining the minimum sound levels for each participant, six processing steps were performed: (1) calibration of the signal, in order to relate the values in the digitized signal to known pressure values; (2) frequency weighting and filtering, for obtaining standard frequency-weighted acoustic signals (IEC 61672-1, 2002) and for removing low-frequency artifacts from the signal; (3) time averaging or time weighting according to the standard requirements for sound levels (IEC 61672-1, 2002); (4) determining the background noise levels; (5) voicing detection; and (6) detection of the softest voiced sound per phonation. These six processing steps are outlined below. The program Matlab (R2012a, 7.14.0.739, MathWorks, Natick, MA) and the software Praat (version 5.3.02) (Boersma and Weenink, 2013) were used for data processing.

1. Calibration

The two-step microphone calibration method introduced by Švec et al. (2003) was used to calibrate both the head-mounted microphone (HMM) and the sound level meter microphone (SLMM). First, the SLMM was calibrated using the calibrator tone (1000 Hz pure tone at 94 dB sound level). Then, the HMM signal was adjusted to have the same (frequency non-weighted) continuous equivalent sound level as the SLMM signal during the phonation of the sustained vowel [a:]. This procedure was performed using the Matlab software, scaling the HMM signal to have the sound level as if captured at the 30 cm mouth-to-microphone distance.
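
The two-step idea can be illustrated with a minimal Python sketch on synthetic signals. This is not the authors' Matlab code; the function names, the test-signal amplitudes, and the 8 kHz sampling rate are illustrative assumptions, and the only facts taken from the text are the 94 dB calibrator level and the matching of equivalent levels during the sustained vowel.

```python
import math

P0 = 20e-6  # reference sound pressure in pascals (20 micropascals)

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_db(pressure_pa):
    """Unweighted equivalent sound level of a calibrated signal."""
    return 20 * math.log10(rms(pressure_pa) / P0)

def calibrator_factor(calibrator_recording, calibrator_level_db=94.0):
    """Step 1: factor converting SLMM digital units to pascals."""
    target_rms_pa = P0 * 10 ** (calibrator_level_db / 20)
    return target_rms_pa / rms(calibrator_recording)

def matching_gain(hmm_vowel, slmm_vowel_pa):
    """Step 2: gain giving the HMM vowel the same level as the SLMM vowel."""
    return rms(slmm_vowel_pa) / rms(hmm_vowel)

# Synthetic demonstration (all amplitudes are arbitrary digital units).
fs = 8000
tone = [0.25 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
slmm_vowel = [0.01 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
hmm_vowel = [0.08 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]

k_slmm = calibrator_factor(tone)
slmm_vowel_pa = [k_slmm * v for v in slmm_vowel]
k_hmm = matching_gain(hmm_vowel, slmm_vowel_pa)
hmm_vowel_pa = [k_hmm * v for v in hmm_vowel]

print(round(level_db([k_slmm * v for v in tone]), 2))
print(round(level_db(hmm_vowel_pa) - level_db(slmm_vowel_pa), 6))
```

After both steps, the head-mounted microphone signal is expressed in pascals as if it had been captured at the position of the sound level meter microphone.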

2. Frequency weighting and filtering

The IEC 61672–1 standard (IEC 61672-1, 2002) defines three frequency weighting schemes: A, C, and Z. The A-weighting approximates the human ear sensitivity at low sound levels, i.e., it is most sensitive to the sound components in the frequency range of 1–5 kHz while attenuating components outside this range. The C-weighting is linear in the range of 32–8000 Hz and attenuates the sound components outside this range by more than 3 dB. The Z-weighting aims at not influencing the sound spectrum at all (Z stands for “zero”). In this study, all three frequency weighting schemes were separately applied to each phonation. The freely downloadable Matlab scripts defining the A and C weightings written by Couvreur (1997) were used for this purpose. The unmodified original signal was used for the Z-weighting scheme.
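
For orientation, the magnitude responses of the A and C weightings can be evaluated from the commonly quoted analytic expressions associated with IEC 61672-1. The sketch below is not the Couvreur (1997) script, which designs digital filters; it only evaluates the nominal response in dB at a given frequency, and the function names are illustrative.

```python
import math

# Pole frequencies (Hz) commonly quoted for the A and C weighting curves.
F1, F2, F3, F4 = 20.598997, 107.65265, 737.86223, 12194.217

def c_weighting_db(f):
    """Nominal C-weighting gain in dB at frequency f (Hz)."""
    f2 = f * f
    r = (F4 ** 2 * f2) / ((f2 + F1 ** 2) * (f2 + F4 ** 2))
    return 20 * math.log10(r) + 0.06  # offset normalizes the gain to 0 dB at 1 kHz

def a_weighting_db(f):
    """Nominal A-weighting gain in dB at frequency f (Hz)."""
    f2 = f * f
    r = (F4 ** 2 * f2 * f2) / (
        (f2 + F1 ** 2)
        * math.sqrt((f2 + F2 ** 2) * (f2 + F3 ** 2))
        * (f2 + F4 ** 2)
    )
    return 20 * math.log10(r) + 2.00  # offset normalizes the gain to 0 dB at 1 kHz

for freq in (86, 100, 1000, 8000):
    print(freq, round(a_weighting_db(freq), 1), round(c_weighting_db(freq), 1))
```

At 100 Hz the A-weighting attenuates by roughly 19 dB while the C-weighting attenuates by only about 0.3 dB, which is why A-weighted levels of low-pitched voices come out lower than the C- or Z-weighted ones.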

In order to optionally eliminate background noise that is clearly below the lowest frequency found in the recorded voices, a fifth-order Butterworth high-pass (hp) filter with a cutoff frequency of 70 Hz was implemented in Matlab. The cutoff frequency of 70 Hz was chosen after determining the fundamental frequencies (f0) of all the analyzed phonations using sound spectrography: the lowest detected f0 in all the participants was 86 Hz, and the hp-filter was set to influence the amplitude at this frequency by less than 0.6 dB (see footnote 1).
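
The stated attenuation at 86 Hz can be checked against the textbook magnitude response of a Butterworth high-pass filter. The sketch below assumes an ideal analog prototype; a digital implementation at 48 kHz may differ slightly, and the function name is illustrative.

```python
import math

def butterworth_highpass_attenuation_db(f, cutoff_hz=70.0, order=5):
    """Attenuation (positive dB) of an ideal analog Butterworth high-pass at f (Hz)."""
    return 10 * math.log10(1 + (cutoff_hz / f) ** (2 * order))

for freq in (50, 70, 86, 100):
    print(freq, round(butterworth_highpass_attenuation_db(freq), 2))
```

Under this assumption the attenuation at the lowest observed f0 of 86 Hz is about 0.5 dB, consistent with the stated limit of 0.6 dB, while components at 50 Hz are already attenuated by more than 14 dB.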

The various frequency weighting schemes and the optional additional hp-filter were denoted in the respective sound level indicator as a suffix (e.g., LhpA for a hp-filtered and A-weighted sound level).

3. Time averaging and time weighting

Two types of sound levels were measured here, based on the IEC 61672–1 standard. These were (1) the equivalent continuous sound levels and (2) the time-weighted sound levels. The equivalent sound levels are useful for characterizing an average energy of a relatively steady acoustic signal whereas the time-weighted sound levels allow monitoring the changes in the sound level over time at a speed corresponding to a predefined time constant.

The time-averaged or equivalent continuous sound level LeqT was used here for reporting the background noise levels of the recordings and for the averaged levels of the softest phonations. This level is based on the logarithm of the root-mean-square value of the sound pressure over a time interval T, as defined in the international standard (IEC 61672-1, 2002). For assessing the equivalent sound level of the recorded soft phonations, the duration of the interval T was by default set to one second (i.e., n = 48 000 samples). The resulting sound level label was denoted by the suffix eq1s, which was appended after the suffix for indicating the frequency weighting (e.g., LAeq1s for an A-weighted equivalent sound level calculated for the duration of one second, see Table I). For assessing the background noise levels the duration of the interval T was approximately 5 s (with slight variation between the individual recordings). Here, the sound level for background noise is only indicated by the suffix eq (e.g., LAeq, see Table I). Since the background noise was rather stable, the slight differences in time interval duration had essentially no influence on the calculated background noise sound levels.
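
A minimal Python sketch of the equivalent continuous level and of a sliding one-second variant is given below. It assumes a calibrated pressure signal in pascals and the usual 20 µPa reference; the hop size between successive one-second windows is a hypothetical parameter, since the text does not state how densely the one-second values were evaluated.

```python
import math

P0 = 20e-6  # reference sound pressure in pascals

def leq_db(pressure_pa):
    """Equivalent continuous sound level over the whole input interval."""
    mean_square = sum(p * p for p in pressure_pa) / len(pressure_pa)
    return 10 * math.log10(mean_square / P0 ** 2)

def leq_1s_series(pressure_pa, fs, hop_s=0.1):
    """One-second equivalent levels on a sliding window (hop_s is an assumption)."""
    n = int(round(fs * 1.0))
    hop = max(1, int(round(fs * hop_s)))
    return [
        leq_db(pressure_pa[i:i + n])
        for i in range(0, len(pressure_pa) - n + 1, hop)
    ]

# Two seconds of a 200 Hz tone with an RMS pressure of 1 Pa.
fs = 8000
tone = [math.sqrt(2) * math.sin(2 * math.pi * 200 * n / fs) for n in range(2 * fs)]
series = leq_1s_series(tone, fs)
print(round(leq_db(tone), 2), len(series), round(min(series), 2))
```

The softest one-second level of a phonation would then be the minimum of such a series restricted to the voiced portions.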

TABLE I.

Labels for the different sound levels measured and their meanings.

Label Meaning
LAeq, LCeq, LZeq  Equivalent continuous sound levels obtained with standard A-, C-, or Z-frequency weighting, respectively 
LhpAeq, LhpCeq, LhpZeq  The same as above, but with high-pass (hp) filtering added 
LAeq1s, LCeq1s, LZeq1s  One-second-equivalent continuous sound levels obtained with standard A-, C- or Z-frequency weighting, respectively 
LhpAeq1s, LhpCeq1s, LhpZeq1s  The same as above, but with hp-filtering added 
LAF, LCF, LZF  Sound levels obtained with standard A-, C-, or Z-frequency weighting, respectively, and standard “Fast” time weighting (time constant 0.125 s) 
LhpAF, LhpCF, LhpZF  The same as above, but with hp-filtering added 

The time-weighted sound level Lτ was used for monitoring the level of the softest sustained phonations and their variations over time. The calculation of this level was done in Matlab by implementing the IEC 61672–1-recommended procedure (see Fig. 3) using a filter corresponding to the exponential decay ($U = e^{-t/\tau}$, where t is time and τ is the standard time constant). For the purpose of this study, the standard fast (F) time weighting (with the time constant τ = 0.125 s) has been implemented and applied to the data. While the IEC 61672–1 standard also defines the “Slow” time weighting (with the time constant τ = 1 s), this approach was not applied here since it requires a duration of approximately three time constants (i.e., 3 s in the case of the Slow time weighting) for the sound level to stabilize after the phonation onset. This was considered to be too slow for the purpose of the present study (see also Sec. II D 6). The fast time-weighted sound levels were indicated by the suffix F (e.g., LAF, see Table I).

FIG. 3.

Schematic illustration of the derivation of time-weighted sound levels as used in this study. (Based on the IEC 61672-1 standard).
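
The chain of squaring, exponential averaging, and logarithmic conversion can be sketched in Python as follows. This is a simplified illustration rather than the authors' Matlab implementation: it assumes a calibrated and already frequency-weighted pressure signal in pascals and realizes the exponential averaging as a one-pole recursive low-pass filter, which is one common discrete-time approximation.

```python
import math

P0 = 20e-6  # reference sound pressure in pascals

def time_weighted_level_db(pressure_pa, fs, tau=0.125):
    """Time-weighted sound level per sample (tau = 0.125 s is the Fast weighting)."""
    alpha = math.exp(-1.0 / (fs * tau))  # per-sample decay of the exponential window
    smoothed = 0.0
    levels = []
    for p in pressure_pa:
        # Exponentially averaged squared pressure.
        smoothed = alpha * smoothed + (1.0 - alpha) * p * p
        # A tiny floor avoids log(0) for digital silence.
        levels.append(10 * math.log10(max(smoothed, 1e-30) / P0 ** 2))
    return levels

# Two seconds of a 200 Hz tone with an RMS pressure of 1 Pa, abrupt onset.
fs = 8000
tone = [math.sqrt(2) * math.sin(2 * math.pi * 200 * n / fs) for n in range(2 * fs)]
laf = time_weighted_level_db(tone, fs)
print(round(laf[-1], 1))
```

For a steady tone the output converges to the equivalent level of the tone, with a small residual ripple because the averaging window is finite.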

In summary, the following combinations of frequency weighting (A, C, Z, with optional high-pass filter hp) and time weighting/averaging (F, eq, eq1s) have been calculated (Table I): LAeq, LCeq, LZeq, LhpAeq, LhpCeq, LhpZeq (for background noise); and LAeq1s, LCeq1s, LZeq1s, LhpAeq1s, LhpCeq1s, LhpZeq1s, LAF, LCF, LZF, LhpAF, LhpCF, and LhpZF (for each phonation).

4. Determining the background noise levels

While the original background noise levels in the rooms were known from the initial measurements using a sound level meter, the final background noise levels were measured as the equivalent continuous sound levels from the digital audio recordings of the head-mounted microphone signal during silence (see step 6 of the Recording Procedure above). This was done separately for each recording. The recordings were expected to have suppressed room noise due to the use of the head-mounted microphone in the proximity of the mouth and due to the calibration procedure projecting its position to the distance of 30 cm. However, the final background noise level was also expected to reflect the inherent noise of the microphone, preamplifier, and A/D converter, which could likewise limit the accuracy of the softest voice measurements (Švec and Granqvist, 2010).

5. Voicing detection

In order to separate voiced portions of the vowel [a:] from unvoiced ones, voicing detection was performed using the program Praat (Boersma and Weenink, 2013). The voicing detection is performed here as a by-product of the fundamental frequency (f0) analysis (i.e., “pitch” analysis in Praat) by means of an autocorrelation algorithm. A portion of the analyzed sound is considered to be locally unvoiced if the maximum correlation coefficient in the respective autocorrelation lag function is below a certain threshold (Praat defaults to 0.45).

For this study, Praat's default parameters for autocorrelation f0 analysis were used, with three exceptions. The “silence threshold” parameter, indicating the signal amplitude in a single frame (in comparison to the global maximum amplitude) below which the signal in the frame is considered as silence (regardless of harmonic content), was set to a very conservative value of 0.001 (i.e., 60 dB below the global maximum amplitude of the soft phonations). The minimum f0 that the autocorrelation algorithm should consider (“pitch floor” in Praat) was set to each individual participant's lowest frequency multiplied by 0.75. (Before doing so, each subject's lowest f0 was determined by means of manual analysis of respective sound spectrograms). The rationale behind this choice was to ensure a minimally short window for f0 (and hence voicing) detection, owing to the fact that Praat's default choice for the autocorrelation analysis window length is three times the lowest indicated period (i.e., three times the reciprocal of the “pitch floor”). Finally, the “pitch ceiling” parameter was set to 1.5 times the highest f0 found in each participant's collected corpus of soft phonations.
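
The parameter choices and the thresholding idea can be illustrated with a strongly simplified Python sketch. It is not Praat's algorithm, which involves further steps not reproduced here; it only derives the settings described above from a participant's lowest and highest f0 and applies a plain normalized autocorrelation with the 0.45 threshold to a single frame. All function names and the example f0 values other than 86 Hz are hypothetical.

```python
import math
import random

def analysis_settings(lowest_f0_hz, highest_f0_hz):
    """Settings derived as described in the text (names are illustrative)."""
    floor_hz = 0.75 * lowest_f0_hz
    return {
        "pitch_floor_hz": floor_hz,
        "pitch_ceiling_hz": 1.5 * highest_f0_hz,
        "window_s": 3.0 / floor_hz,  # three periods of the pitch floor
        "silence_threshold_db": 20 * math.log10(0.001),
    }

def max_normalized_autocorrelation(frame, fs, floor_hz, ceiling_hz):
    """Largest normalized autocorrelation over the candidate lag range."""
    min_lag = max(1, int(fs / ceiling_hz))
    max_lag = min(len(frame) - 2, int(fs / floor_hz))
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:len(frame) - lag], frame[lag:]
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, sum(x * y for x, y in zip(a, b)) / den)
    return best

def is_voiced(frame, fs, floor_hz, ceiling_hz, threshold=0.45):
    """Simplified voicing decision: peak correlation at or above the threshold."""
    return max_normalized_autocorrelation(frame, fs, floor_hz, ceiling_hz) >= threshold

fs = 8000
settings = analysis_settings(lowest_f0_hz=86.0, highest_f0_hz=200.0)
n = int(settings["window_s"] * fs)
rng = random.Random(1)
periodic = [math.sin(2 * math.pi * 120 * i / fs) for i in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
floor_hz, ceiling_hz = settings["pitch_floor_hz"], settings["pitch_ceiling_hz"]
print(is_voiced(periodic, fs, floor_hz, ceiling_hz), is_voiced(noise, fs, floor_hz, ceiling_hz))
```

With a lowest f0 of 86 Hz, the pitch floor becomes 64.5 Hz and the analysis window about 47 ms, which shows how a higher pitch floor shortens the window used for the voicing decision.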

6. Detection of the softest sound levels

Only voiced portions of the vowel [a:] were considered for the detection of the softest sound levels. Within each voiced sequence, the initial 375 ms were discarded, in order to eliminate the unvoiced-to-voiced transition response of the “fast” time-weighted sound levels. The duration of 375 ms (three times the value of the standard fast time constant of 125 ms) was chosen based on an experimental simulation of a stable sound with an abrupt onset, observing the time after which the fast-weighted sound level approximated the final stable level. The relation between the time delay t and the expected level difference L was then derived analytically as $L = -10\log_{10}\left(1 - e^{-t/\tau}\right)$, where τ is the time constant used in the “fast” time-weighting. When setting t/τ to 3 (375 ms divided by 125 ms), a level difference of approximately 0.22 dB was obtained.
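
The settling relation can be evaluated directly to see how the remaining level deficit shrinks with the discarded onset duration. The short Python sketch below assumes an ideal exponential averager and a perfectly stable sound with an abrupt onset; the function name is illustrative.

```python
import math

def onset_level_deficit_db(t_s, tau_s=0.125):
    """Level still missing t_s seconds after an abrupt onset of a stable sound."""
    return -10 * math.log10(1 - math.exp(-t_s / tau_s))

for multiple in (1, 2, 3, 5):
    t = multiple * 0.125
    print(f"{multiple} tau ({t * 1000:.0f} ms): {onset_level_deficit_db(t):.2f} dB")
```

Discarding only one time constant would leave a deficit of about 2 dB, whereas three time constants bring it down to about 0.22 dB, at the cost of 375 ms of each voiced segment.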

From the remainder of the voiced portions of each phonation, the softest sound levels (for different frequency weighting schemes and time-weighting/averaging approaches) were determined per participant. An example of sound level estimation and voicing detection is shown in Fig. 4.

FIG. 4.

Detection of fast-weighted (LAF) and one-second-equivalent sound levels (LAeq1s). (a) The calibrated head-mounted microphone signal of the softest sustained phonation on vowel [a:]. Notice the instants of ceasing phonation when the subject crossed the phonation threshold around 3–4 s, 7–8 s, and 14.7 s; (b) The resulting fast-weighted sound levels LAF obtained from the microphone signal in (a). The dashed line is the level without any voicing detection. The solid line omits the unvoiced intervals plus the first 375 ms after each phonation onset. The circle indicates the minimum fast-weighted sound level obtained using this detection method; (c) Comparison of one-second-equivalent sound level (LAeq1s, solid thick line) and fast-weighted sound level LAF (thin dotted line). The circles show the minima for the two sound levels. Note two main differences between the two detection methods: the LAeq1s has a more stable trajectory, and is also delayed in relation to the LAF, due to the fact that one second of sustained phonation is needed before the first LAeq1s value can be obtained.

In order to measure the softest sound levels correctly, it is important that the softest sound level is above the background noise level. There are various requirements on the signal-to-noise ratio (SNR) applied for different kinds of conditions: a minimum 30 dB SNR has been recommended for perturbation measurements (Deliyski et al., 2006; Perry et al., 2000), 15 dB SNR has been recommended for classrooms as well as for microphones (Švec and Granqvist, 2010), etc. For the purpose of our study we consider a less stringent SNR of 10 dB, which has been recognized as a sufficient condition for accurately measuring sound level in noise (Brüel and Kjaer, 1984). The 10 dB SNR guarantees that the measured sound level of a sound source is influenced by less than 0.5 dB by the background noise. The influence of background noise on the sound source is analytically derived in the Appendix [Eq. (A10) and Fig. 6].

FIG. 6.

(Color online) Sound level correction (ΔL) as a function of the SNR (i.e., the difference between the measured voice sound level and the background noise level). The dashed line at ΔL = 0.5 dB is intersected by the graph at SNR ≈ 10 dB, indicating that the background noise level must be at least 10 dB lower than the measured sound level in order for the background noise to influence the measurement by less than 0.5 dB.
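
The 0.5 dB figure can be reproduced with a short Python sketch. It assumes that voice and noise are uncorrelated so that their energies add, and it takes the SNR as defined in the caption above (measured level minus background noise level); the Appendix derivation itself is not reproduced here, so the formula should be read as the standard energy-addition result rather than as Eq. (A10) verbatim. The quantile values are those quoted in the abstract.

```python
import math

def noise_correction_db(snr_db):
    """Amount by which the measured (voice plus noise) level exceeds the voice-only level."""
    if snr_db <= 0:
        raise ValueError("measured level must exceed the background noise level")
    return -10 * math.log10(1 - 10 ** (-snr_db / 10))

for snr in (3, 6, 10, 15, 30):
    print(snr, round(noise_correction_db(snr), 2))

# Background noise limits for a 10 dB SNR at the lowest 5% quantiles
# of the softest one-second levels quoted in the abstract.
softest_5pct = {"dB(A)": min(41, 35), "dB(C)": min(48, 49)}
max_noise = {unit: level - 10 for unit, level in softest_5pct.items()}
print(max_noise)
```

At 10 dB SNR the correction is about 0.46 dB, and subtracting 10 dB from the lowest 5% quantiles gives the 25 dB(A) and 38 dB(C) limits stated in the abstract.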

The normality of the data distribution was assessed with a Jarque-Bera test. The differences between females and males (for the various time- and frequency-weighting strategies) were analyzed with Wilcoxon rank sum tests. Within each gender, the sound levels obtained with different time- and frequency-weighting methods were compared with Wilcoxon signed rank tests. All statistical tests were performed in Matlab.

The average background noise levels of the recordings, which were decisive for estimating the limits for detecting the softest voice levels, are indicated in Table II (see footnote 1). The recordings from the two rooms differed slightly; the broadcast studio (room 2) had lower background noise levels than the audition room (room 1). The applied frequency weighting and filter settings had considerable influence on the results. The overall mean noise values (averaged across both rooms) were distributed between 21.2 dB (LhpAeq) and 46.6 dB (LZeq). As expected, elimination of the low frequencies (below the vocal range) from the background noise by the hp-filter considerably lowered the resulting C- and Z-weighted noise levels. The Z-weighting resulted in the highest and the A-weighting in the lowest background noise levels.

TABLE II.

Mean background noise levels and their standard deviations (SD) for different frequency weighting and filter settings in the recordings from the two different rooms (1, audition room; 2, room in the broadcast station) measured from the head-mounted microphone signal. The overall levels were obtained as the average noise levels from all the subjects across both rooms. The background noise levels incorporate the external noise of the room as well as the internal noise of the measurement equipment (see footnote 1).

Frequency weighting  Room  Sound level type (no hp-filter)  Value [dB]  SD [dB]  Sound level type (with hp-filter)  Value [dB]  SD [dB]
A  1  LAeq  24.7  2.9  LhpAeq  24.3  3.0
A  2  LAeq  19.9  2.5  LhpAeq  19.8  2.6
A  Overall  LAeq  21.2  3.4  LhpAeq  21.1  3.4
C  1  LCeq  45.4  2.5  LhpCeq  30.9  2.5
C  2  LCeq  33.1  1.8  LhpCeq  23.5  2.9
C  Overall  LCeq  36.5  5.9  LhpCeq  25.5  4.4
Z  1  LZeq  51.9  2.5  LhpZeq  31.6  2.5
Z  2  LZeq  44.6  3.7  LhpZeq  25.0  2.5
Z  Overall  LZeq  46.6  4.8  LhpZeq  26.8  3.9

1. Fast-time-weighted sound levels

The measured softest fast-time-weighted sound levels averaged across females and males, as calculated with different frequency-weighting and filter settings, are indicated in Table III. As expected, the lowest sound levels were obtained with the A-weighting (since the A-filter attenuates the low-frequency spectral components of voice). The mean softest A-weighted levels were found at about 44 and 39 dB(A) for females and males, respectively. The Z-weighted and C-weighted sound levels were higher and showed approximately similar mean values for both males and females—around 51–53 dB. However, these Z- and C-weighted measures were found to be sensitive to the low-frequency background noise. Table III (last column) shows that 75% and 33% of the Z- and C-weighted measures, respectively, did not fulfill the 10 dB SNR condition in females, and 78% and 15% did not fulfill it in males. When an additional 70 Hz hp-filter was applied, the 10 dB SNR condition was fulfilled in 100% of the cases. The Z- and C-weighted levels decreased in that case (on average by 1.6–1.8 and 0.3 dB, respectively) and reached similar mean values (51.6 and 51.2–51.4 dB for females and males) as expected. The 70 Hz hp-filter showed only minimal influence on the A-weighted sound levels. The 10 dB SNR condition was fulfilled in all the A-weighted individual measures—both before and after application of the hp-filter.

TABLE III.

Descriptive statistics for the softest fast-time-weighted sound levels at 30 cm in females and males, as calculated with different frequency weighting and filter settings. The last column indicates the percentage of cases in which the softest sound level was less than 10 dB above the background noise level, suggesting that the measured sound levels were inflated by more than 0.5 dB in these cases (see the Appendix; see also footnote 1).

| Sound level type | Mean [dB] | SD [dB] | Median [dB] | Min. [dB] | 5% quantile [dB] | 95% quantile [dB] | Max. [dB] | SNR mean [dB] | SNR median [dB] | SNR < 10 dB [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| Females | | | | | | | | | | |
| LZF | 53.3 | 3.4 | 53.0 | 45.9 | 48.0 | 58.8 | 62.0 | 6.2 | 4.3 | 75 |
| LhpF | 51.6 | 4.1 | 51.4 | 39.3 | 46.3 | 58.4 | 61.7 | 23.1 | 23.2 | 0 |
| LCF | 51.9 | 4.1 | 51.9 | 39.6 | 46.7 | 58.5 | 61.8 | 13.9 | 14.6 | 33 |
| LhpCF | 51.6 | 4.2 | 51.3 | 39.2 | 46.3 | 58.4 | 61.7 | 24.3 | 24.3 | 0 |
| LAF | 43.8 | 4.6 | 43.8 | 29.6 | 37.6 | 49.7 | 57.1 | 21.1 | 20.8 | 0 |
| LhpAF | 43.8 | 4.6 | 43.8 | 29.6 | 37.6 | 49.7 | 57.1 | 21.3 | 20.9 | 0 |
| Males | | | | | | | | | | |
| LZF | 53.2 | 4.8 | 51.7 | 42.8 | 47.5 | 61.4 | 68.1 | 7.0 | 5.3 | 78 |
| LhpF | 51.4 | 5.3 | 50.3 | 42.1 | 44.0 | 61.0 | 66.3 | 26.2 | 25.2 | 0 |
| LCF | 51.5 | 5.3 | 50.5 | 42.3 | 44.4 | 61.0 | 66.3 | 16.6 | 15.5 | 15 |
| LhpCF | 51.2 | 5.3 | 50.0 | 41.9 | 44.0 | 61.0 | 66.1 | 27.5 | 26.3 | 0 |
| LAF | 38.8 | 5.9 | 37.6 | 30.1 | 30.9 | 50.7 | 52.0 | 19.1 | 18.3 | 0 |
| LhpAF | 38.7 | 5.4 | 37.6 | 29.9 | 30.9 | 50.7 | 52.0 | 19.1 | 18.4 | 0 |

The variability of the softest sound levels was also investigated. A Jarque-Bera test revealed that the data were not always normally distributed. Therefore, summary statistics are reported in Table III as measures of sound levels along the cumulative distribution function, showing the mean and median values, the extremes, as well as the 5% and 95% quantile levels.

2. One-second-equivalent sound levels

Observation of the measurements for the softest fast-weighted sound levels revealed that the minimal values were mostly detected in the transition parts of the signal—most often at the end of the voicing passages when the sound level was rapidly decreasing [recall Fig. 4(b)]. As such, these values may underestimate the representative values of the sustained soft phonations. In order to obtain a more representative measure of the softest sustained phonation, the one-second-equivalent sound pressure levels were measured here. The results are provided in Table IV.

TABLE IV.

Descriptive statistics for the softest one-second-equivalent sound levels (Leq1s) in females and males at the distance of 30 cm, as calculated with different frequency weighting (Z-, C-, or A-weighting) and filter settings (hp or no hp-filtering). The last column indicates the percentage of cases in which the softest sound level was less than 10 dB above the background noise level, suggesting that the measured sound levels were inflated by more than 0.5 dB in these cases (see the Appendix) (see footnote 1).

| Sound level type | Mean [dB] | SD [dB] | Median [dB] | Min. [dB] | 5% quantile [dB] | 95% quantile [dB] | Max. [dB] | SNR mean [dB] | SNR median [dB] | SNR < 10 dB [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| Females | | | | | | | | | | |
| LZeq1s | 56.3 | 3.4 | 56.2 | 47.4 | 51.8 | 60.8 | 65.7 | 9.2 | 7.8 | 54 |
| Lhpeq1s | 55.3 | 4.0 | 55.8 | 43.6 | 48.3 | 60.6 | 65.5 | 26.9 | 27.0 | 0 |
| LCeq1s | 55.4 | 4.5 | 56.0 | 43.8 | 48.4 | 60.6 | 65.5 | 17.6 | 18.6 | 13 |
| LhpCeq1s | 55.3 | 4.0 | 55.8 | 43.6 | 48.3 | 60.6 | 65.5 | 28.1 | 28.2 | 0 |
| LAeq1s | 47.5 | 4.5 | 47.6 | 32.9 | 41.2 | 52.7 | 60.9 | 24.8 | 24.4 | 0 |
| LhpAeq1s | 47.5 | 4.5 | 47.6 | 32.9 | 41.2 | 52.7 | 60.9 | 25.0 | 24.4 | 0 |
| Males | | | | | | | | | | |
| LZeq1s | 56.1 | 4.7 | 55.7 | 44.8 | 50.4 | 63.9 | 71.6 | 10.0 | 8.9 | 68 |
| Lhpeq1s | 55.1 | 4.8 | 54.0 | 44.1 | 49.0 | 63.7 | 69.4 | 29.9 | 29.6 | 0 |
| LCeq1s | 55.1 | 4.8 | 54.2 | 44.2 | 49.0 | 63.6 | 69.3 | 20.1 | 20.2 | 5 |
| LhpCeq1s | 54.9 | 4.8 | 53.9 | 44.0 | 48.9 | 63.5 | 69.3 | 31.2 | 31.2 | 0 |
| LAeq1s | 42.3 | 5.4 | 42.1 | 31.8 | 34.6 | 52.8 | 54.5 | 22.6 | 21.9 | 0 |
| LhpAeq1s | 42.3 | 5.4 | 42.0 | 31.8 | 34.6 | 52.8 | 54.5 | 22.6 | 22.0 | 0 |

As in the previous case (fast-time-weighted sound levels), the lowest sound levels were obtained with the A-weighting, as expected. Their mean values were around 47.5 and 42.3 dB(A) for females and males, respectively, while both the Z- and C-weighted values were around 55–56 dB for both males and females. The hp-filter was found to decrease the Z- and C-weighted levels, again indicating some influence of the background noise on the non-hp-filtered values. Without application of the hp-filter, the 10 dB SNR condition was not fulfilled in females in 54% and 13% of the Z-weighted and C-weighted cases, respectively, while in males it was not fulfilled in 68% and 5% of the Z-weighted and C-weighted cases. After hp-filtering the condition was fulfilled in 100% of the cases (Table IV, last column). The variation of the measured equivalent softest sound levels among subjects is also shown in Table IV.
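The percentage of cases failing the 10 dB SNR condition, as reported in the last column of Tables III and IV, can be illustrated with a minimal Python sketch. The function name and the example levels below are hypothetical and are not taken from the study data; the SNR is taken as the measured softest level minus the background noise level, as defined in the Appendix.

```python
def percent_below_snr(voice_levels_db, noise_levels_db, min_snr_db=10.0):
    """Return the percentage of recordings whose SNR is below min_snr_db.

    Each voice level is paired with the noise level of the same recording.
    SNR is computed as (measured voice level - background noise level).
    """
    pairs = list(zip(voice_levels_db, noise_levels_db))
    if not pairs:
        raise ValueError("at least one recording is required")
    failing = sum(1 for voice, noise in pairs if voice - noise < min_snr_db)
    return 100.0 * failing / len(pairs)


# Hypothetical example values (dB), for illustration only.
example_voice = [50.0, 55.0, 60.0, 45.0]
example_noise = [45.0, 44.0, 46.0, 30.0]
share = percent_below_snr(example_voice, example_noise)
```

In this made-up example only the first recording (SNR of 5 dB) misses the criterion.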

Theoretically, the ideal condition for sound level measurement would be in the absence of any background noise. However, such a condition is hard to find in practice. In common quiet rooms, the background noise level rarely goes below 40 dB(A) or 50 dB(C). Such noise levels may still prevent accurate measurements of the softest voice levels at the distance of 30 cm. In order to minimize the influence of the background noise and measure the softest voice levels as accurately as possible we applied three precautions here: (1) very silent rooms; (2) close microphone position (5–10 cm distance to the side from the mouth, which increases the voice SNR by 10–15 dB when compared to 30 cm distance, e.g., Švec and Granqvist, 2010); and (3) frequency weighting and hp-filtering of the microphone signals.

How much was the background noise influenced by the standard frequency weighting and by the additional hp-filtering? The overall A-weighted noise level in the recordings was on average around 21 dB(A) (Table II). The custom hp-filtering was found to be not necessary for the A-weighted levels: The standard A-weighting guaranteed a satisfactory SNR in all the cases of this study (Tables III and IV).

Despite the very quiet rooms used, the overall Z-weighted noise level in the recordings reached on average 47 dB (Table II), still making the average SNR insufficient to guarantee accurate measurements of the softest phonations in all cases (recall the SNR percentages in Tables III and IV). The C-weighting was able to filter out some of the low-frequency noise, yielding overall noise levels that were lower than the Z-weighted ones, but this approach was still insufficient in 13% and 5% of the female and male measurements of LCeq1s, respectively (recall Table IV). Careful hp-filtering was therefore applied in order to further reduce the background noise components at frequencies below the vocal f0. Tables III and IV show that after the hp-filtering the 10 dB SNR condition was fulfilled in 100% of the cases, guaranteeing that the noise inflated none of the measurements by more than 0.5 dB (see the Appendix). The hp-filtering was therefore found useful for reducing the background noise levels while minimally influencing the spectral properties of the voice signal.

While the standard Z- and C-weightings should not influence the sound levels of human voice, the standard A-weighting is known to decrease the phonatory sound levels, particularly due to its lowered sensitivity to the low frequencies (Gramming and Sundberg, 1988; Pedersen, 1997). The hp-attenuation of the A-weighting filter starts as high as 1000 Hz, reaching −3 and −6 dB attenuation at about 520 and 340 Hz, respectively, and about −11 and −19 dB around 200 and 100 Hz, respectively (IEC 61672-1, 2002) (see also footnote 1). The spectral components of human voice below 1000 Hz are therefore considerably attenuated. Generally, people with lower f0 are likely to show lower A-weighted sound levels than those with higher f0. Since the average fundamental frequency of human voice is around 100 Hz for males and around 200 Hz for females, male voices are generally attenuated more than female voices. This notion is well reflected in the results of this study: For females the A-weighted sound levels were approximately 8 dB and for males approximately 13 dB lower than the C- and Z-weighted levels (Table IV).
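The attenuation values quoted above can be checked against the analytic weighting curves. The following Python sketch assumes the commonly used closed-form expressions for the A- and C-weighting responses with pole frequencies of 20.6, 107.7, 737.9, and 12194 Hz; it is an illustration, not the filter implementation used in this study, and the function names are hypothetical.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (analytic curve, ~0 dB at 1 kHz)."""
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1000 Hz.
    return 20.0 * math.log10(ra) + 2.00


def c_weighting_db(f_hz):
    """C-weighting gain in dB at frequency f_hz (analytic curve, ~0 dB at 1 kHz)."""
    f2 = f_hz ** 2
    rc = (12194.0 ** 2 * f2) / ((f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2))
    # The +0.06 dB offset normalizes the curve to about 0 dB at 1000 Hz.
    return 20.0 * math.log10(rc) + 0.06


# Gains at typical male (100 Hz) and female (200 Hz) fundamental frequencies.
gains = {f: (a_weighting_db(f), c_weighting_db(f)) for f in (100.0, 200.0, 1000.0)}
```

Under these assumptions the A-weighting attenuates a 100 Hz component by roughly 19 dB and a 200 Hz component by roughly 11 dB, while the C-weighting leaves both nearly unchanged.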

The characteristics of the A-weighting filter coarsely match the sensitivity of human hearing to different frequencies at low loudness levels, represented by equal loudness curves (see, e.g., ANSI S1.4-1983, 1985; Howard and Angus, 2009; IEC 61672-1, 2002; ISO 226, 2003). The A-weighting strategy is therefore useful when an approximation of perceived loudness levels is desired, or when measurement in a room with considerable background noise is made. It is, however, apparent that sound level data acquired with the A-weighting method cannot be compared to equally valued data stemming from a C-weighting approach.

Measurement of the softest voice levels is a highly non-trivial task, and a couple of unresolved issues exist. Voicing detection usually relies on detecting the sound originating from periodic vocal fold vibrations (Boersma and Weenink, 2013). However, the voice has a tendency to be unstable at the lower intensity range (Lucero, 1999). Consequently, not all phonations at the extreme lower range are type I voice signals that are periodic or at least nearly periodic in nature (Titze, 1995). Due to the lack of standardization and for practical reasons, the periodicity assessment in this study has been delegated to the algorithm for voicing detection and fundamental frequency calculation, by means of the minimum correlation coefficient in the autocorrelation lag function (the “voicing threshold” in Praat). Praat's default value of 0.45 was used. Different “voicing threshold” values might arguably have resulted in more or fewer data points being considered for further analysis.

Different voice analysis programs use different time intervals for calculating the sound pressure level, based, for instance, on the minimal expected fundamental frequency, on duration of the detected glottal vibratory cycle, etc. Here, in order to minimize the measurement arbitrariness as well as the effect of unsteadiness of the softest phonations on the final values, we have used the standard fast time weighting LF (exponential decay time constant 125 ms) and the time averaging Leq1s (one-second equivalent sound levels). Comparing these two methods, the minimum LF values were overall about 3 dB lower than those of Leq1s. An inspection of the measured levels revealed that the LF values usually reached their minimum at the moment of termination of the sustained phonation, when the voice amplitude was rapidly decreasing (recall Fig. 4). As such, the detected minimum LF values did not really reflect the sustained part of the soft phonation but rather the transition state between the voiced and unvoiced part. The Leq1s parameter may therefore be considered to be more representative of the sustained soft phonation, since it averages the energy of the voiced signal over the interval of one second, reducing the influence of the unstable transition portion and of the voicing detection instant. Another possibility for reducing the instability of the levels would be to use the standard “slow” time weighting. But since this approach requires about 3 s worth of data (i.e., ca. three times the “slow” time constant of 1 s) before the sound level stabilizes after the phonation onset, thus in effect discarding the first three seconds of each phonation, the slow-time-weighting approach was not utilized here.
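The difference between the two time-weighting methods can be illustrated with a minimal Python sketch. It is not the signal processing chain used in this study: the exponential averager, the running one-second mean, and the synthetic test tone (steady at 1 Pa rms, then fading at the offset) are simplifying assumptions, and all function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa


def _level_db(mean_square):
    # Convert a mean-square pressure to a level in dB re 20 micropascals.
    return 10.0 * math.log10(max(mean_square, 1e-30) / P_REF ** 2)


def fast_levels(pressure, fs, tau=0.125):
    """Exponentially time-weighted levels; tau = 0.125 s is the 'fast' constant."""
    alpha = 1.0 - math.exp(-1.0 / (fs * tau))
    state = 0.0
    out = []
    for p in pressure:
        state += alpha * (p * p - state)  # first-order smoothing of p^2
        out.append(_level_db(state))
    return out


def leq_levels(pressure, fs, window_s=1.0):
    """Running equivalent levels: mean power over the preceding window_s seconds."""
    n = int(round(window_s * fs))
    squares = [p * p for p in pressure]
    if len(squares) < n:
        return []
    acc = sum(squares[:n])
    out = [_level_db(acc / n)]
    for i in range(n, len(squares)):
        acc += squares[i] - squares[i - n]  # slide the window by one sample
        out.append(_level_db(acc / n))
    return out


def demo_signal(fs=2000, f0=100.0, steady_s=2.0, fade_s=0.25):
    """Synthetic tone: steady at 1 Pa rms, then a linear fade to 10% amplitude."""
    n_steady = int(steady_s * fs)
    n_fade = int(fade_s * fs)
    out = []
    for i in range(n_steady + n_fade):
        gain = 1.0 if i < n_steady else 1.0 - 0.9 * (i - n_steady) / n_fade
        out.append(gain * math.sqrt(2.0) * math.sin(2.0 * math.pi * f0 * i / fs))
    return out


fs = 2000
signal = demo_signal(fs=fs)
lf = fast_levels(signal, fs)
leq1s = leq_levels(signal, fs)
# Skip the first second of the fast levels so the onset transient is excluded.
min_fast, min_leq = min(lf[fs:]), min(leq1s)
```

In this synthetic case the minimum fast-weighted level is pulled down by the fading offset, whereas the one-second average stays close to the steady level, which mirrors the behavior described above.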

In summary, Fig. 5 provides the best estimates of the one-second equivalent sound levels Leq1s of the softest sustained phonations of the vowel [a:] for the non-weighted (i.e., C- and Z-weighted cases, which do not influence the vocal spectrum and are expected to be the same under ideal noise-free circumstances, here approximated by hp-filtering) and the A-weighted cases. When not weighted, the mean softest voice levels are around 55 dB for both the females and males. These results are close to those reported by Leino (2008) for females and Gramming and Åkerlund (1988) for males. However, they are about 8 dB lower than those of Awan (1991), and 8 or 11 dB higher than those reported by Pabon et al. (2011) for females or Hallin et al. (2012) for males, respectively. The higher levels of Awan (1991) may be explained by his more stringent requirements on the produced phonations: the subjects were required to be steady in pitch, loudness and quality, while in this study the only requirement was continuous voicing. In contrast, Pabon et al. (2011) and Hallin et al. (2012) explored software programs in which the sound levels were detected from short portions of the signal (tens of milliseconds rather than the one second utilized here). Consequently, the levels measured in these studies could have been partly extracted from transition portions of the signal at vocal offsets, thus not representing sustained phonation levels and potentially allowing for much lower values than reported here.

FIG. 5.

The softest achievable one-second-equivalent sound levels Leq1s at 30 cm in vocally healthy human subjects when phonating on vowel [a:]. Descriptive statistics for (a) females and (b) males, showing separately within each panel the non-weighted (spectrally unchanged in the vocal range, i.e., LC, LZ, left—see text) and A-weighted sound levels (LA, right). The data are considered valid for recordings with the SNR of at least 10 dB. Each of the bars indicates (from bottom to top): the extreme minimum (bottom circle), 5% quantile (whiskers bottom), 25% quantile (box bottom), median (line), mean (rectangle), 75% quantile (box top), 95% quantile (whiskers top), extreme maximum (top circle).

As far as the normal variations of the non-weighted sound levels are concerned, at the 5% quantile the levels were around 48 dB for both males and females. Generally, this quantile level can be considered as the low limit of normal phonations; 95% of vocally healthy subjects are expected to stay above this value. The 95% quantile was found around 61 dB in females and 64 dB in males (Table IV). These levels can be considered to be the normal upper limits of the softest phonations in humans; 95% of vocally healthy subjects are expected to be able to produce softest phonations below these levels. These levels are expected to be different for voice disorders and may also vary across age groups; such factors were not in the scope of this study, however.

The A-weighted levels showed mean values around 47 and 42 dB(A) in females and males, respectively. These values are similar to those reported by Hacki (1999) and Šiupšinskiene (2003) for females and Sulter et al. (1995) for males. However, they are lower than most of the other previously published results (recall Fig. 1), except for those of Sulter et al. (1995), who report about 2 dB lower mean values for females. A possible cause for the increased sound levels in other studies may be the background noise levels, which were apparently not as strictly controlled as in this study.

When investigating the normal variability of the A-weighted sound levels, the low normal limit for 95% of the subjects (i.e., the 5% quantile) was around 41 dB(A) in females and around 35 dB(A) in males. The 95% quantile was found around 53 dB(A) both in females and males. The 95% quantile levels are very close to the threshold of 55 dB(A) suggested by Friedrich and Dejonckere (2005), to be used in clinical practice for evaluating the normality of soft phonation. Our data provide rather solid support for this recommendation.

To discriminate between sound level data of females and males, and between the A-weighted and non-weighted levels, statistical testing was performed (Table V). No statistically significant difference between females and males was found when measuring non-weighted levels. In contrast, when applying the A-weighting strategy, statistically significant differences were found for females vs males. The A-weighting resulted in significant changes of the sound levels of both the female and male subjects. This is in agreement with previous observations (Gramming and Sundberg, 1988; Pedersen, 1997) and again confirms that the A-weighted levels should be considered to be a different entity as compared to the non-weighted levels.

TABLE V.

Results of statistical tests for the female vs male and non-weighted vs A-weighted softest sound level (Leq1s) comparisons.

| Group | Test | p value | Significance |
|---|---|---|---|
| Females vs males (non-weighted) | Wilcoxon rank sum | 0.36 | Not significant |
| Females vs males (A-weighted) | Wilcoxon rank sum | 1.3 × 10⁻⁵ | Highly significant |
| Non-weighted vs A-weighted (females) | Wilcoxon signed rank | 5.3 × 10⁻⁸ | Highly significant |
| Non-weighted vs A-weighted (males) | Wilcoxon signed rank | 3.6 × 10⁻⁸ | Highly significant |

In 1983, the UEP recommended a maximum background noise level of 40 dB(A), as found in “living-room acoustics” (Schutte and Seidner, 1983), for the measurements of voice range profile, including the softest voice levels. However, the data from this study indicate that this level is too high for performing accurate measurements of the softest voice levels at the mouth-to-microphone distance of 30 cm in vocally healthy subjects. For accurate measurements it is crucial that the background noise level is considerably lower than the level of the softest phonations (Švec and Granqvist, 2010). An SNR of more than 10 dB may be utilized as a useful criterion, since the noise then influences the measurements by less than 0.5 dB (see the Appendix).

Taking into account that the sound levels at the 5% quantile were found to be around 35 dB (A-weighted) and 48 dB (non-weighted), the levels of 25 dB (A-weighted) and 38 dB (non-weighted) could be recommended as the maximum background noise levels for accurate measurements of the softest phonations in 95% of the normal population at the distance of 30 cm. These background noise levels are supposed to encompass the room, microphone as well as the recording equipment internal noise. However, when a head-mounted microphone is used, maximum background noise levels of about 35–40 dB (A-weighted) and about 48–53 dB (non-weighted) may be considered acceptable for accurate soft voice measurements, since in a head-mounted setup the voice signal is about 10–15 dB stronger than at 30 cm distance (Švec and Granqvist, 2010). In any case, the background noise levels (complemented by information about the applied frequency and time weighting methods) should always be measured and reported together with the actual voice sound level data.
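The arithmetic behind these recommended limits can be written out as a small Python sketch. The function name and parameters are hypothetical; the sketch simply subtracts the required SNR from the 5% quantile of the softest levels and, optionally, adds the level gain of a closer microphone position.

```python
def max_background_noise_db(softest_level_5pct_db, min_snr_db=10.0,
                            proximity_gain_db=0.0):
    """Highest background noise level that still leaves min_snr_db of SNR.

    softest_level_5pct_db: 5% quantile of the softest voice levels at 30 cm.
    proximity_gain_db: extra voice level gained by a closer microphone
        (about 10 to 15 dB for a head-mounted microphone, per the text).
    """
    return softest_level_5pct_db - min_snr_db + proximity_gain_db


# Limits at 30 cm for the A-weighted and non-weighted 5% quantiles.
limit_a = max_background_noise_db(35.0)
limit_nonweighted = max_background_noise_db(48.0)
# Range of limits for a head-mounted microphone, A-weighted.
limit_a_head = (max_background_noise_db(35.0, proximity_gain_db=10.0),
                max_background_noise_db(35.0, proximity_gain_db=15.0))
```

With the 5% quantiles of 35 dB(A) and 48 dB this reproduces the limits of 25 dB(A) and 38 dB at 30 cm, and 35 to 40 dB(A) for the head-mounted setup.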

Softest achievable sound levels: The one-second-equivalent softest sound levels of sustained phonations for vowel [a:] were in the range of 48–61 dB and 49–64 dB, in females and males, respectively (5% to 95% quantile range) at 30 cm distance from the mouth. When A-weighted, the corresponding female and male sound levels were lower: 41–53 dB(A) and 35–53 dB(A), respectively. These ranges can be utilized as reference data in evaluating vocal normality.

Z-weighting: The Z-weighted, i.e., spectrally unmodified voice signals are suitable for approximating the radiated vocal power. They may not allow accurate measurements of the softest phonations when background noise is present, however. An additional hp-filter with the cutoff frequency below the vocal f0 can help improve the SNR and thus the measurement accuracy.

C-weighting: The C-weighting strategy filters out the sound components below 31.7 Hz and therefore facilitates achieving better SNR than the Z-weighting under usual circumstances. C-weighting does not considerably influence the vocal spectrum and is suitable for approximating the radiated vocal power. When background noise is present, however, the C-weighting may not sufficiently attenuate the background noise to allow accurate measurements of the softest phonations. An additional hp-filter can help improve the SNR and the measurement accuracy. Under noise-free conditions, the C-weighted sound levels of the softest phonations are expected to be the same as the Z-weighted ones.

A-weighting: The A-weighting standard produces greatly decreased sound levels in comparison to the C- and Z-weighting standards, particularly when analyzing phonations at lower fundamental frequencies. Yet, the A-weighting standard also decreases the background noise levels making the voice measurements less problematic in sub-optimal ambient conditions. The A-weighted sound levels may be perceptually relevant since they approximate the perceived loudness levels of soft sounds. They should however not be used for assessing the truly radiated vocal power.

Time-weighting and averaging: The one-second-equivalent (i.e., one-second-running-average) method yielded more representative results for automatic detection of minimum sound levels of the sustained phonations than the fast-time-weighting method. Averaging the power of the voiced signal over the interval of one second resulted in smaller susceptibility to the influence of unstable signal portions at voice onsets and offsets, which could otherwise show considerably lower sound levels.

Background noise levels: To assure that the inflation of the measured sound levels is less than 0.5 dB, an SNR of at least 10 dB is needed. To assure such an SNR for more than 95% of vocally healthy subjects, the background noise levels should be below 25 dB(A) and 38 dB(C or Z) for sustained softest phonation measurements at 30 cm mouth-to-microphone distance. The 25 dB(A) background noise level is 15 dB lower than the background noise level recommended by UEP.

Background noise control: Sufficient SNR may be achieved by using very quiet rooms, but also by placing the microphone closer to the mouth, by proper selection and setup of the microphone and of the recording equipment (Švec and Granqvist, 2010), and by careful hp-filtering of the signals. To assure data accuracy, background noise levels (i.e., levels at the instances when the measured subjects are silent) should always be measured and reported when presenting sound level data.

The research has been supported in the Czech Republic by the yearly Palacký University student's projects PrF_2012_026, PrF_2013_017 and PrF_2014_029 (H.Š.), and by the European Social Fund Projects OP VK CZ.1.07/2.3.00/20.0057 (J.G.Š.), 1.07/2.3.00/30.0004 “POST-UP” (C.T.H., J.G.Š.), and 1.07/2.4.00/17.0009 (H.Š., J.G.Š.). Contribution of the authors: H.Š. has done all the recordings, data collection, data analysis, and wrote the first version of the manuscript; S.G. designed the procedures for signal processing, participated in the study preparations and in writing the final manuscript; C.T.H. critically analyzed the results, initially derived the formulas in the Appendix and wrote the final version of the manuscript; J.G.Š. designed and supervised the study, participated in the subject recordings, data collection and analysis, and wrote the final version of the manuscript.

When adding two uncorrelated signals, such as voice and background noise, their powers sum up. The power of a signal is proportional to the square of the signal, which results in the following relation between the measured sound pressure pM, the source sound pressure (i.e., voice sound pressure) pS, and the noise sound pressure pN:

$$p_M^2 = p_S^2 + p_N^2 \tag{A1}$$

or

$$p_S^2 = p_M^2 - p_N^2. \tag{A2}$$

The sound pressure p can be derived from the sound pressure level L as

$$p = p_0 \, 10^{L/20}, \tag{A3}$$

where p0 is the reference sound pressure. When inserting this relationship in Eq. (A2) we obtain

$$\left(p_0 \, 10^{L_S/20}\right)^2 = \left(p_0 \, 10^{L_M/20}\right)^2 - \left(p_0 \, 10^{L_N/20}\right)^2 \tag{A4}$$

or

$$10^{L_S/10} = 10^{L_M/10} - 10^{L_N/10}, \tag{A5}$$

where symbols LM, LS, LN denote the measured sound level, source sound level and the noise sound level, respectively. The source sound level LS is the (voice) “true level” without the influence of the background noise, i.e., the error-free quantity which we are interested in. Only the measured sound level LM (constituted by the voice signal and the background noise) and the noise sound level LN are known, provided that the background noise is actually measured. The source sound level LS can be derived from Eq. (A5) as

$$L_S = 10 \log_{10}\left(10^{L_M/10} - 10^{L_N/10}\right). \tag{A6}$$

Defining the SNR as

$$\mathrm{SNR} = L_M - L_N \tag{A7}$$

and using the identity $x^a - x^b = x^a \left[1 - x^{(b-a)}\right]$, Eq. (A6) can be further simplified and expressed as a function of the SNR:

$$L_S = 10 \log_{10}\left[10^{L_M/10}\left(1 - 10^{(L_N - L_M)/10}\right)\right] = L_M + 10 \log_{10}\left(1 - 10^{-\mathrm{SNR}/10}\right). \tag{A8}$$

When calculating the correction value (ΔL) for converting the measured sound level (LM) to true source sound level (LS), the following formula applies:

$$L_S = L_M - \Delta L, \tag{A9}$$

where

$$\Delta L = -10 \log_{10}\left(1 - 10^{-\mathrm{SNR}/10}\right). \tag{A10}$$

A graph derived from Eq. (A10) is shown in Fig. 6. The graph shows that when SNR is 10 dB, i.e., the background noise level is 10 dB below the total measured sound level (source and background noise together), the respective sound level reading should be reduced by approximately 0.5 dB in order to eliminate the influence of the background noise and to obtain the true source sound level. At 15 dB SNR the correction of only 0.14 dB is required. At 7 dB SNR the correction factor is ca. 1 dB and it rapidly rises above 2 dB when the SNR is below 4 dB and the measurements thus become highly inaccurate.
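The correction values quoted above can be reproduced with a short Python sketch. It assumes that the correction is defined as a positive quantity subtracted from the measured level, which is consistent with Eq. (A8) and with the numerical values given in the text; the function names are hypothetical.

```python
import math


def noise_correction_db(snr_db):
    """Correction (dB) to subtract from the measured level for a given SNR.

    SNR is the measured level (voice plus noise) minus the noise level,
    so it must be positive for the source level to be defined.
    """
    if snr_db <= 0.0:
        raise ValueError("SNR must be positive to recover the source level")
    return -10.0 * math.log10(1.0 - 10.0 ** (-snr_db / 10.0))


def source_level_db(measured_db, noise_db):
    """Estimate the noise-free source level from measured and noise levels."""
    return measured_db - noise_correction_db(measured_db - noise_db)


# Corrections at the SNR values discussed in the text.
corrections = {snr: noise_correction_db(snr) for snr in (4.0, 7.0, 10.0, 15.0)}
```

For example, a reading of 50 dB taken over a 40 dB noise floor corresponds to a source level of about 49.5 dB under these assumptions.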

1

See supplemental material http://biofyzika.upol.cz/en/vyzkum-publikace-detail?pi=312 for Fig. S1 showing the frequency response curves for the A-weighting, C-weighting, Z-weighting, and hp-filtering applied here, and for Figs. S2 and S3 graphically expressing the results from Tables II, III, and IV.

1.
Aichinger
,
P.
,
Feichter
,
F.
,
Aichstill
,
B.
,
Bigenzahn
,
W.
, and
Schneider-Stickler
,
B.
(
2012
). “
Inter-device reliability of DSI measurement
,”
Logoped. Phoniatr. Vocol.
37
,
167
173
.
2.
American National Standards Institute (
1985
). ANSI S1.4-1983. “
American National Standard: Specification for sound level meters
” ( Acoustical Society of America, Melville, NY), pp.
1
18
.
3.
Awan
,
S. N.
(
1991
). “
Phonetographic profiles and F0-SPL characteristics of untrained versus trained vocal groups
,”
J. Voice
5
,
41
50
.
5.
Behrman
,
A.
,
Agresti
,
C. J.
,
Blumstein
,
E.
, and
Sharma
,
G.
(
1996
). “
Meaningful features of voice range profiles from patients with organic vocal fold pathology: A preliminary study
,”
J. Voice
10
,
269
283
.
6.
Boersma
,
P.
, and
Weenink
,
D.
(
2013
). “
Praat: Doing phonetics by computer
” (
Institute of Phonetic Sciences, University of Amsterdam
,
Amsterdam, The Netherlands
), http://www.fon.hum.uva.nl/praat/ (Last viewed March 17, 2014).
7.
Brüel & Kjaer
(
1984
).
Measuring Sound
(
Brüel & Kjaer
,
Naerum, Denmark
), pp.
28
30
.
8.
Coleman
,
R. F.
(
1993
). “
Sources of variation in phonetograms
,”
J. Voice
7
,
1
14
.
9.
Couvreur
,
C.
(
1997
). “
Octave
,” in
Matlab Central
(
The MathWorks, Inc.
). http://www.mathworks.com/matlabcentral/fileexchange/69-octave (Last viewed March 17, 2014).
10.
Damsté
,
P. H.
(
1970
). “
The phonetogram
,”
Pract. Otorhinolaryngol. (Basel)
32
,
185
187
.
11.
Deliyski
,
D. D.
,
Shaw
,
H. S.
,
Evans
,
M. K.
, and
Vesselinov
,
R.
(
2006
). “
Regression tree approach to studying factors influencing acoustic voice analysis
,”
Folia Phoniatr. Logop.
58
,
274
288
.
12. Friedrich, G., and Dejonckere, P. H. (2005). “The voice evaluation protocol of the European Laryngological Society (ELS) - First results of a multicenter study,” Laryngo-Rhino-Otol. 84, 744–752 (in German).
13. Gramming, P. (1991). “Vocal loudness and frequency capabilities of the voice,” J. Voice 5, 144–157.
14. Gramming, P., and Åkerlund, L. (1988). “Non-organic dysphonia. Phonetograms for normal and pathological voices,” Acta Otolaryngol. (Stockh.) 106, 468–476.
15. Gramming, P., and Sundberg, J. (1988). “Spectrum factors relevant to phonetogram measurement,” J. Acoust. Soc. Am. 83, 2352–2360.
16. Hacki, T. (1999). “Vocal capabilities of nonprofessional singers evaluated by measurement and superimposition of their speaking, shouting and singing voice range profiles,” HNO 47, 809–815 (in German).
17. Hakkesteegt, M. M., Brocaar, M. P., Wieringa, M. H., and Feenstra, L. (2006). “Influence of age and gender on the dysphonia severity index. A study of normative values,” Folia Phoniatr. Logop. 58, 264–273.
18. Hallin, A. E., Frost, K., Holmberg, E. B., and Södersten, M. (2012). “Voice and speech range profiles and Voice Handicap Index for males–methodological issues and data,” Logoped. Phoniatr. Vocol. 37, 47–61.
21. Heylen, L., Wuyts, F. L., Mertens, F., De Bodt, M., and Van de Heyning, P. H. (2002). “Normative voice range profiles of male and female professional voice users,” J. Voice 16, 1–7.
22. Howard, D. M., and Angus, J. A. S. (2009). Acoustics and Psychoacoustics (Oxford University Press, Oxford, UK), Chap. 2.
23. IEC 61672-1 (2002). “Sound level meters – Part 1: Specification,” in Electroacoustics (International Electrotechnical Commission, Geneva, Switzerland), Chaps. 1–5.
24. ISO 226 (2003). “Acoustics – Normal equal-loudness-level contours” (International Organization for Standardization, Geneva, Switzerland), pp. 1–18.
25. Jacobson, B., Johnson, A., Grywalski, C., Silbergleit, A. K., Jacobson, G. P., and Benninger, M. S. (1997). “The voice handicap index (VHI): Development and validation,” J. Speech-Lang. Path. 6, 66–70.
26. Leino, T., Laukkanen, A. M., Ilomäki, I., and Mäki, E. (2008). “Assessment of vocal capacity of Finnish university students,” Folia Phoniatr. Logop. 60, 199–209.
27. Lucero, J. C. (1999). “A theoretical study of the hysteresis phenomenon at vocal fold oscillation onset–offset,” J. Acoust. Soc. Am. 105, 423–431.
28. Ma, E., Robertson, J., Radford, C., Vagne, S., El-Halabi, R., and Yiu, E. (2007). “Reliability of speaking and maximum voice range measures in screening for dysphonia,” J. Voice 21, 397–406.
29. Pabon, P., Ternström, S., and Lamarche, A. (2011). “Fourier descriptor analysis and unification of voice range profile contours: Method and applications,” J. Speech Lang. Hear. Res. 54, 755–776.
30. Pedersen, M. F. (1997). “Biological development and the normal voice in puberty,” Ph.D. dissertation, University of Oulu, Oulu, Finland, Appendix 1.
31. Perry, C., Ingrisano, D. R., Palmer, M. A., and McDonald, E. J. (2000). “Effects of environmental noise on computer-derived voice estimates from female speakers,” J. Voice 14, 146–153.
32. Rosen, C. A., Lee, A. S., Osborne, J., Zullo, T., and Murry, T. (2004). “Development and validation of the voice handicap index-10,” Laryngoscope 114, 1549–1556.
33. Sanchez, K., Oates, J., Dacakis, G., and Holmberg, E. B. (2013). “Speech and voice range profiles of adults with untrained normal voices: Methodological implications,” Logoped. Phoniatr. Vocol. 39, 62–71.
34. Schneider, B., and Bigenzahn, W. (2003). “Influence of glottal closure configuration on vocal efficacy in young normal-speaking women,” J. Voice 17, 468–480.
35. Schneider, B., and Bigenzahn, W. (2005). “Vocal risk factors for occupational voice disorders in female teaching students,” Eur. Arch. Otorhinolaryngol. 262, 272–276.
36. Schultz-Coulon, H. J., and Asche, S. (1988). “Das ‘Normstimmfeld’ - ein Vorschlag” (“The ‘standard voice range profile’: A proposal”), Sprache - Stimme - Gehör 12, 5–8.
37. Schutte, H. K., and Seidner, W. (1983). “Recommendation by the Union of European Phoniatricians (UEP): Standardizing voice area measurement/phonetography,” Folia Phoniatr. 35, 286–288.
38. Sihvo, M., and Sala, E. (1996). “Sound level variation findings for pianissimo and fortissimo phonations in repeated measurements,” J. Voice 10, 262–268.
39. Šiupšinskiene, N. (2003). “Quantitative analysis of professionally trained versus untrained voices,” Medicina (Kaunas) 39, 36–46.
40. Speyer, R., Wieneke, G. H., van Wijck-Warnaar, I., and Dejonckere, P. H. (2003). “Effects of voice therapy on the voice range profiles of dysphonic patients,” J. Voice 17, 544–556.
41. Sulter, A. M., Schutte, H. K., and Miller, D. G. (1995). “Differences in phonetogram features between male and female subjects with and without vocal training,” J. Voice 9, 363–377.
42. Švec, J. G., and Granqvist, S. (2010). “Guidelines for selecting microphones for human voice production research,” Am. J. Speech Lang. Pathol. 19, 356–368.
43. Švec, J. G., Lejska, M., Frostová, J., Zábrodský, M., Dršata, J., and Král, P. (2009). “Czech version of the Voice Handicap Index questionnaire for quantitative evaluation of voice problems perceived by patients,” Otorinolaryng. Foniat. 58, 132–139 (in Czech).
44. Švec, J. G., Popolo, P. S., and Titze, I. R. (2003). “Measurement of vocal doses in speech: Experimental procedure and signal processing,” Logoped. Phoniatr. Vocol. 28, 181–192.
45. Timmermans, B., De Bodt, M. S., Wuyts, F. L., Boudewijns, A., Clement, G., Peeters, A., and Van de Heyning, P. H. (2002). “Poor voice quality in future elite vocal performers and professional voice users,” J. Voice 16, 372–382.
46. Titze, I. R. (1995). Workshop on acoustic voice analysis. Summary statement (National Center for Voice and Speech, Denver, CO), pp. 1–36.
47. Wuyts, F. L., De Bodt, M. S., Molenberghs, G., Remacle, M., Heylen, L., Millet, B., Van Lierde, K., Raes, J., and Van de Heyning, P. H. (2000). “The dysphonia severity index: An objective measure of vocal quality based on a multiparameter approach,” J. Speech Lang. Hear. Res. 43, 796–809.