This work estimates the uncertainty contributions of speech level parameters measured with a contact-sensor-based device and a headworn microphone. Four contributions are considered: (1) instrumental uncertainty, related to device calibration; (2) method repeatability and (3) method reproducibility, estimated through repeated measurements without and with device repositioning, respectively; and (4) source reproducibility, due to the variability of human speech. To ascertain changes in speech production, differences between measures should exceed the expanded uncertainty. When the device is repositioned, the expanded uncertainty combines contributions (1), (3), and (4); when the device is not repositioned, it combines contributions (2) and (4).

Speech sound pressure level (SPL) is measured for many purposes, such as clinical recordings, the assessment of vocal effort in occupational voice users, and speech intelligibility studies. Researchers are also often interested in speech modifications due to different room acoustics or noise conditions (Astolfi et al., 2015), where differences in speech level are mainly investigated.

Given the high variability of speech and the different measurement procedures and devices used to monitor it, both researchers and practitioners need to know the uncertainty associated with speech SPL parameters. This uncertainty can refer to absolute measures, e.g., the mean speech SPL in the clinical evaluation of a patient's vocal health, or to differences between parameters, e.g., the difference in mean speech SPL before and after speech therapy or in two acoustically different environments. Since speech SPL uncertainty also depends on the measurement instrument employed, it is desirable to assess it for the most widely used devices.

In most studies concerning speech level measurements, voice is detected either with microphones in air, typically placed 15 to 30 cm horizontally in front of the speaker's mouth, or with headworn microphones placed off axis at a mouth-to-microphone distance of 5 to 10 cm (Sramkova et al., 2015). Headworn microphones, being closer to the mouth, are preferable to microphones in air mounted in front of the speaker, since they ensure a higher signal-to-noise ratio and do not hinder the speaker's activities. Another approach, investigated in recent studies (Švec et al., 2005; Carullo et al., 2015b), estimates speech SPL from the skin vibration detected by a contact sensor fixed at the jugular notch of the speaker's neck. Devices that embed contact sensors have mainly been developed for assessing occupational voice use (Calosso et al., 2017), since they have negligible sensitivity to background noise (Carullo et al., 2015a).

In the existing literature, the uncertainty of speech SPL has been investigated more for contact-sensor-based devices than for microphones in air. Some studies, such as Hillman et al. (2006) and Carullo et al. (2015b), estimated the uncertainty of contact-sensor-based devices for a single speech frame, i.e., for the instantaneous speech SPL (SPLi). Other studies have investigated the accuracy of speech SPL parameters estimated by these devices. In particular, Searl and Dietsch (2014) found that, during text reading, the difference between the mean SPL measured with a contact-sensor-based dosimeter and with a reference microphone was in the range of 1.5 to 2.4 dB. An in-depth study was carried out by Švec et al. (2005), who estimated the accuracy of speech SPL parameters obtained by an accelerometer-based device, taking into account contributions related to the calibration function and to speech variability. Mean and equivalent speech SPL were estimated with an accuracy better than 4.3 and 2.5 dB, respectively, in 95% of the cases over “normal” to “loud” speech (ANSI, 1997).

Even though the abovementioned studies characterized contact-sensor-based devices from a metrological point of view, they did not properly separate the individual uncertainty contributions of speech SPL parameters. This is because human subjects wore the contact device during the uncertainty evaluation experiments, so that the variability of speech reproduction was included in the estimation of the other uncertainty contributions.

This paper introduces new methods and tools for estimating the uncertainty contributions of speech SPL parameters measured with a contact-sensor-based device and with a headworn microphone. These contributions are appropriately combined to provide, for both devices, the expanded uncertainty of absolute measures and of differences of equivalent, mean, and mode speech sound pressure levels.

Voice Care (PR.O.VOICE, Turin, Italy) was used in this work as the contact-sensor-based device. It consists of a data logger equipped with an encapsulated electret condenser microphone (ECM AE38, Alan Electronics GmbH, Dreieich, Germany) fixed at the jugular notch. The ECM converts into a voltage the changes in acoustic pressure at the surface of the neck caused by vocal-fold activity, and it exhibits a low sensitivity to background noise. Possible artefacts due to body movements are made negligible by means of a high-pass digital filter (Carullo et al., 2015b). The signal is subdivided into non-overlapping 30 ms frames, which allows voiced and unvoiced portions of speech to be detected effectively down to the phonemic segmental level (Titze et al., 2007), and an appropriate root mean square (rms) voltage threshold classifies each frame as voiced or unvoiced. After calibration against a reference microphone, the device provides the voiced sound pressure levels, SPLi, at a fixed distance from the speaker's mouth. The calibration consists of estimating the best-fit regression function between the rms values of the skin-vibration signal and the SPLi measured by the reference microphone.
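The frame-based processing described above can be sketched in Python as follows. This is a minimal illustration, not the Voice Care firmware: the function names, the threshold value, and the linear-in-dB form of the calibration function (slope and intercept) are hypothetical assumptions, the input is assumed to be a voltage signal in volts, and the high-pass filtering step is omitted.

```python
import numpy as np

FRAME_S = 0.030  # 30 ms non-overlapping frames, as stated in the text


def frame_rms(signal, fs, frame_s=FRAME_S):
    """Split a signal into non-overlapping frames and return the rms of each frame."""
    x = np.asarray(signal, dtype=float)
    n = int(round(frame_s * fs))          # samples per frame
    n_frames = len(x) // n                # an incomplete trailing frame is discarded
    frames = x[: n_frames * n].reshape(n_frames, n)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def voiced_spl(signal, fs, rms_threshold, slope, intercept):
    """Classify frames with an rms threshold and convert voiced frames to SPLi.

    Hypothetical calibration, linear in dB: SPLi = slope * V_dB + intercept,
    where V_dB is the frame rms voltage in dB re 1 mV (signal assumed in volts).
    """
    rms = frame_rms(signal, fs)
    n_i = (rms > rms_threshold).astype(int)        # 1 = voiced, 0 = unvoiced
    v_db = 20.0 * np.log10(rms[n_i == 1] / 1e-3)   # only voiced frames are converted
    return slope * v_db + intercept, n_i
```

The 0/1 flags returned alongside the levels correspond to the n_i used later when propagating frame uncertainties to SPLm.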

The Mipro MU-55HN (Chiayi, Taiwan) omnidirectional headworn microphone was used in this study. It is usually placed about 2.5 cm from the edge of the talker's lips, slightly to the side of the mouth. The microphone exhibits a flatness of ±3 dB from 40 Hz to 20 kHz. It is connected to an ACT-30T bodypack transmitter, which transmits to a Mipro ACT 311 wireless microphone system. WAV files were stored on a ZOOM H1 handy recorder (Zoom Corp., Tokyo, Japan) in 16 bit/44.1 kHz format and later processed with an ad hoc MATLAB® script. In post-processing, SPLi values were estimated over 1 s logging intervals, a common choice in measurements with such devices, and no threshold was applied to separate voiced and unvoiced frames. The absence of SPL compression effects was verified with the transmitter's automatic gain control disabled.
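As a rough stand-in for the post-processing (the actual MATLAB® script is not described in detail), the sketch below computes SPLi over 1 s intervals with no voicing threshold and then the equivalent, mean, and mode levels. It assumes the recording has already been converted to pascals with a known calibration, and it takes SPLmode as the centre of the most populated 1 dB histogram bin; the bin width and all function names are hypothetical choices.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure, 20 µPa


def spl_1s(pressure_pa, fs, interval_s=1.0):
    """SPLi in dB re 20 µPa over non-overlapping intervals (no voicing threshold)."""
    x = np.asarray(pressure_pa, dtype=float)
    n = int(round(interval_s * fs))
    k = len(x) // n
    blocks = x[: k * n].reshape(k, n)
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    return 20.0 * np.log10(rms / P_REF)


def spl_parameters(spl_i, bin_db=1.0):
    """Return (SPLeq, SPLm, SPLmode) for a set of SPLi values in dB."""
    spl_i = np.asarray(spl_i, dtype=float)
    spl_eq = 10.0 * np.log10(np.mean(10.0 ** (spl_i / 10.0)))  # energy average
    spl_m = float(np.mean(spl_i))                               # arithmetic mean of dB values
    lo = np.floor(spl_i.min())
    n_bins = int((spl_i.max() - lo) // bin_db) + 1
    edges = lo + bin_db * np.arange(n_bins + 1)
    counts, _ = np.histogram(spl_i, bins=edges)
    j = int(np.argmax(counts))                                  # most populated bin
    spl_mode = 0.5 * (edges[j] + edges[j + 1])
    return float(spl_eq), spl_m, float(spl_mode)
```

Because SPLeq is an energy average, it is always at least as high as SPLm for the same set of SPLi values.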

Speech monitoring sessions are usually characterized in terms of speech SPL parameters, SPLpar, namely, the equivalent, SPLeq, mean, SPLm, and mode, SPLmode, levels. The uncertainty of SPLpar values is obtained by combining the different uncertainty contributions that affect the measurement procedure according to the ISO/IEC Guide 98-3 (2008). In this study, four main contributions were considered for both Voice Care and the Mipro MU-55HN:

  • instrumental uncertainty, related to the device calibration;

  • method repeatability, estimated through repeated measurements without the repositioning of the device;

  • method reproducibility, estimated through repeated measurements with the repositioning of the device;

  • source reproducibility, due to the variability of the human speech.

To estimate the expanded uncertainty, the first step is to estimate the standard uncertainty of speech SPLpar, u(SPLpar). For absolute measures, it is obtained by combining the instrumental uncertainty, u(SPLi,inst), the method reproducibility, u(SPLpar,repr), and the source reproducibility, u(SPLpar,reps), as follows:

$$u(\mathrm{SPL}_{par}) = \sqrt{u^2(\mathrm{SPL}_{i,inst}) + u^2(\mathrm{SPL}_{par,repr}) + u^2(\mathrm{SPL}_{par,reps})} \tag{1}$$

In the case of differences between measures, only the contributions related to method repeatability, u(SPLpar,repe), and source reproducibility, u(SPLpar,reps), are taken into account,

$$u(\mathrm{SPL}_{par}) = \sqrt{u^2(\mathrm{SPL}_{par,repe}) + u^2(\mathrm{SPL}_{par,reps})} \tag{2}$$

In this case, the instrumental uncertainty contribution is assumed to be negligible, provided that the two quantities are measured with the same instrument, within a short time interval, and under similar conditions for the influence quantities.

The expanded uncertainty of the speech sound pressure level parameters, U(SPLpar), is finally obtained by multiplying the standard uncertainty, u(SPLpar), by a coverage factor k = 2, which yields a two-standard-deviation interval.
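Eqs. (1) and (2) together with k = 2 amount to a root-sum-of-squares followed by a scaling. The short sketch below (function names are illustrative) reproduces the Voice Care SPLmode entries of Table 1 from their standard uncertainty contributions.

```python
import math


def combined(*contributions):
    """Root-sum-of-squares combination of standard uncertainties, as in Eqs. (1) and (2)."""
    return math.sqrt(sum(u ** 2 for u in contributions))


def expanded(*contributions, k=2.0):
    """Expanded uncertainty U = k * u with coverage factor k = 2."""
    return k * combined(*contributions)


# Voice Care, SPLmode: standard uncertainty contributions from Table 1 (dB)
u_inst, u_repr, u_repe, u_reps = 1.2, 0.5, 0.3, 1.5
U_absolute = expanded(u_inst, u_repr, u_reps)   # Eq. (1): repositioned device
U_difference = expanded(u_repe, u_reps)         # Eq. (2): device not repositioned
print(round(U_absolute, 1), round(U_difference, 1))
```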

For Voice Care, the SPLi instrumental uncertainty, u(SPLi,inst), was estimated by combining, in the same way as in Eq. (1), two contributions: the standard uncertainty due to the calibration of the Behringer ECM8000 reference microphone with the B&K 4230 calibrator, obtained as described by Eq. (6) in Carullo et al. (2015b), and the uncertainty related to the estimation of the calibration function, also called the “model error.” To quantify the model error, a phonatory system simulator was used as the speech source (Casassa et al., 2017). The simulator was driven by an electroglottographic (EGG) signal recorded in vivo during the production of a vowel /a/ at increasing intensity. The SPL range (76–91) dB @ 13 cm, i.e., (58–73) dB @ 1 m in free field, was considered; according to ANSI S3.5 (1997), this range spans “normal” to “loud” vocal effort. Three calibration sessions, each including five repetitions, were performed in a quiet dead room with a background noise level lower than 35 dB(A) LAeq. The measurement setup was repositioned for each session; in particular, the centre of the contact sensor was moved by ±5 mm around the centre of an area of about 16 cm² corresponding to the jugular notch. The model error was calculated as the maximum value, over the 15 calibrations, of the rms difference between the SPLi,ref values measured by the reference microphone and the SPLi values estimated by the calibration function.
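The model-error procedure can be sketched as follows. As an illustration only (the text does not state the form of the regression), a first-order least-squares fit between Vi,ECM in dB and SPLi,ref is assumed for each calibration; the rms residual of each fit is computed, the maximum over calibrations is kept, and it is then combined with the reference-microphone uncertainty by root-sum-of-squares. Function names are hypothetical.

```python
import numpy as np


def model_error(calibrations):
    """Maximum, over calibrations, of the rms deviation between SPLi,ref and the fit.

    calibrations: list of (v_db, spl_ref) array pairs, one pair per calibration.
    A linear calibration function SPLi = slope * v_db + intercept is assumed.
    """
    errors = []
    for v_db, spl_ref in calibrations:
        slope, intercept = np.polyfit(v_db, spl_ref, 1)   # least-squares line
        residual = spl_ref - (slope * v_db + intercept)
        errors.append(np.sqrt(np.mean(residual ** 2)))     # rms of the differences
    return float(max(errors))


def instrumental_uncertainty(u_ref_mic, u_model):
    """Root-sum-of-squares combination, as in Eq. (1)."""
    return float(np.hypot(u_ref_mic, u_model))


# With the values reported in the results (0.6 dB and 1 dB), u(SPLi,inst) ≈ 1.2 dB
print(round(instrumental_uncertainty(0.6, 1.0), 1))
```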

For the headworn microphone, the SPLi instrumental uncertainty was estimated by combining the standard uncertainty due to the calibration of the reference class 1 sound level meter (SLM; XL2, NTi Audio, Schaan, Liechtenstein) with the error between the SPLi provided by the headworn microphone and by the SLM. The former was taken from the calibration certificate of the SLM. To estimate the latter, the linearity of the Mipro MU-55HN and the absence of compression effects were first checked with the automatic gain control disabled. Then, in an anechoic room, both microphones were positioned in front of a B&K 4128 Head and Torso Simulator (HATS) (B&K, Nærum, Denmark) that emitted 20-s-long ICRA noise samples (Dreschler et al., 2001), equally distributed over the range (85–102) dB @ 2.5 cm, i.e., (53–70) dB @ 1 m in free field. The measurement error was estimated from four differences between the SPLeq,ICRA values provided by the headworn microphone and by the SLM. SPLeq was used in place of SPLi since it is the most appropriate indicator for stationary signals such as ICRA noise. An offset, equal to the average of the four SPLeq,ICRA differences, was used as a correction for each SPLeq,ICRA value provided by the headworn microphone. Each offset-corrected difference was characterized by a uniform probability density function, so that its standard deviation was estimated as σ = |ΔSPLeq,ICRA|/√3, where ΔSPLeq,ICRA is the corrected difference. Finally, the measurement error was taken as the maximum σ over the four SPLeq,ICRA differences.
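The offset correction and the uniform-distribution step can be sketched as below. The function name and the input list are hypothetical; the inputs would be the four headworn-minus-SLM SPLeq,ICRA differences. The last lines combine the result with the SLM certificate uncertainty by root-sum-of-squares, using the values reported in the results.

```python
import math


def measure_error(differences):
    """Return (offset, maximum sigma) for headworn-minus-SLM SPLeq differences (dB).

    The offset is the mean difference; each offset-corrected difference is taken
    as the half-width of a uniform distribution, so sigma = |d - offset| / sqrt(3).
    """
    offset = sum(differences) / len(differences)
    sigmas = [abs(d - offset) / math.sqrt(3) for d in differences]
    return offset, max(sigmas)


# Combination with the SLM calibration uncertainty, as in Eq. (1) (values in dB)
u_slm, u_meas = 0.55, 0.50
u_inst = math.hypot(u_slm, u_meas)
print(round(u_inst, 2))
```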

For both devices, the standard uncertainty of SPLm due to the random contributions of method reproducibility, u(SPLm,repr), and method repeatability, u(SPLm,repe), is obtained as follows (written for reproducibility; the repeatability expression is analogous):

$$u(\mathrm{SPL}_{m,repr}) = \frac{u(\mathrm{SPL}_{i,repr})}{\sqrt{\sum_{i=1}^{N} n_i}}, \tag{3}$$

where N is the total number of frames in the analyzed speech and n_i is 0 for unvoiced frames and 1 for voiced frames in the case of Voice Care, whereas it is 1 for all frames in the case of the headworn microphone.

For SPLeq, the standard uncertainty due to method reproducibility, u(SPLeq,repr), and to method repeatability, u(SPLeq,repe), is obtained as

$$u(\mathrm{SPL}_{eq,repr}) = \frac{u(\mathrm{SPL}_{i,repr})}{\sqrt{N}}. \tag{4}$$

The reproducibility and repeatability standard uncertainties of SPLmode are equal to the respective SPLi uncertainties, since SPLmode is the most frequently occurring value in the SPLi distribution.

For Voice Care, u(SPLi,repr) was estimated from the maximum spread among all 15 calibration functions identified during the three sessions performed with the phonatory system simulator. The peak-to-peak value among the 15 functions, ΔSPLi,peak, was used to estimate the corresponding standard uncertainty under the assumption of a uniform probability distribution, as u(SPLi,repr) = ΔSPLi,peak/(2√3). The repeatability, u(SPLi,repe), was instead estimated by applying the same equation to the highest ΔSPLi,peak value obtained within a single session of five calibrations (e.g., the dashed black lines in Fig. 1).
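The peak-to-peak approach can be sketched as follows, assuming (for illustration) that each calibration function is linear in dB and represented by a (slope, intercept) pair, and that the spread is evaluated on a grid of Vi,ECM values covering the calibrated range. The grid and the function names are hypothetical.

```python
import numpy as np


def peak_uncertainty(functions, v_db):
    """Standard uncertainty from the peak-to-peak spread of calibration functions.

    functions: list of (slope, intercept) pairs, SPLi = slope * V + intercept.
    The spread is treated as the full width of a uniform distribution,
    so u = spread / (2 * sqrt(3)).
    """
    v = np.asarray(v_db, dtype=float)
    curves = np.array([slope * v + intercept for slope, intercept in functions])
    spread = np.max(curves.max(axis=0) - curves.min(axis=0))   # ΔSPLi,peak
    return float(spread / (2.0 * np.sqrt(3.0)))


def voice_care_repr_repe(sessions, v_db):
    """u(SPLi,repr) over all functions; u(SPLi,repe) from the worst single session."""
    all_functions = [f for session in sessions for f in session]
    u_repr = peak_uncertainty(all_functions, v_db)
    u_repe = max(peak_uncertainty(session, v_db) for session in sessions)
    return u_repr, u_repe
```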

Fig. 1.

(Color online) Calibration functions obtained in three calibration sessions (continuous, dashed black, and continuous grey lines) with Voice Care, each including five repetitions. The measurement setup was repositioned for each session. SPLi,ref refers to 13 cm from the mouth of the phonatory system simulator; Vi,ECM is the rms value of the voltage at the output of the ECM-based chain, expressed in decibels relative to 1 mV. Circles, stars, and diamonds represent experimental values for the acquired frames; only one in every ten is plotted for clarity.

For the headworn microphone, u(SPLi,repr) was estimated in a semi-anechoic room using the HATS as the speech source. The HATS emitted ICRA noise at a fixed gain, with the headworn microphone about 2.5 cm from the HATS mouth, slightly to the side. Three sessions, each including four repetitions, were performed, and the measurement setup was repositioned for each session. The reproducibility contribution was evaluated as the standard deviation of SPLeq,ICRA over the 12 repetitions, and the repeatability contribution as the standard deviation of the four repetitions within each session.
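Both headworn-microphone contributions are standard deviations of repeated SPLeq,ICRA values. The sketch below assumes the sample (n - 1) estimator and, because the text does not say how the three per-session values are summarized, takes their maximum for repeatability by analogy with the Voice Care procedure; both choices and the function name are assumptions.

```python
import statistics


def headworn_repr_repe(sessions):
    """sessions: one list of SPLeq,ICRA values (dB) per positioning of the microphone."""
    pooled = [x for session in sessions for x in session]
    u_repr = statistics.stdev(pooled)   # reproducibility: all 12 repetitions pooled
    # repeatability: within-session standard deviation, worst session (assumption)
    u_repe = max(statistics.stdev(session) for session in sessions)
    return u_repr, u_repe
```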

Castellana et al. (2017) investigated source reproducibility using both Voice Care and the headworn microphone. The intra-speaker variability, which represents the uncertainty contribution related to the source reproducibility of a single speaker, was quantified as the average, over 17 subjects, of each subject's SPLpar standard deviation across four repeated readings. To quantify the inter-speaker variability, i.e., the uncertainty contribution for a group of N speakers, researchers may refer to the standard deviation of the individual standard deviations, i.e., s(g) in Tables II and III of Castellana et al. (2017), divided by √N.
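The intra- and inter-speaker quantities can be computed from repeated readings as sketched below. The input layout and function names are hypothetical, and the √N divisor for the group contribution is the usual standard error of a mean, stated here as an assumption.

```python
import math
import statistics


def speaker_variability(readings_by_subject):
    """Intra-speaker variability: mean over subjects of each subject's standard deviation.

    readings_by_subject: one list per subject of SPLpar values from repeated readings.
    """
    per_subject_sd = [statistics.stdev(r) for r in readings_by_subject]
    return statistics.mean(per_subject_sd), per_subject_sd


def group_uncertainty(per_subject_sd, n_speakers):
    """Standard deviation of the individual standard deviations divided by sqrt(N) (assumed)."""
    return statistics.stdev(per_subject_sd) / math.sqrt(n_speakers)
```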

Table 1 reports the standard uncertainty contributions and the expanded uncertainty of absolute measures and of differences between measures of SPLeq, SPLm, and SPLmode, for voice monitoring of a single speaker with the two devices.

Table 1.

Instrumental, u(SPLi,inst), method repeatability, u(SPLpar,repe), method reproducibility, u(SPLpar,repr), and source reproducibility, u(SPLpar,reps), uncertainty contributions, and expanded uncertainty, U(SPLpar), for absolute measures and differences between measures of equivalent, mean, and mode sound pressure level (dB), detected by the Voice Care contact-sensor-based device and by the Mipro MU-55HN headworn microphone.

| Contribution (dB) | Voice Care SPLeq | Voice Care SPLm | Voice Care SPLmode | Mipro MU-55HN SPLeq | Mipro MU-55HN SPLm | Mipro MU-55HN SPLmode |
|---|---|---|---|---|---|---|
| Absolute measures | | | | | | |
| u(SPLi,inst) | 1.2 | 1.2 | 1.2 | 0.7 | 0.7 | 0.7 |
| u(SPLpar,repr) | <0.01 | 0.01 | 0.5 | <0.01 | <0.01 | 0.1 |
| u(SPLpar,reps) | 0.8 | 0.6 | 1.5 | 0.5 | 0.6 | 1.1 |
| U(SPLpar) | 2.9 | 2.7 | 4.0 | 1.7 | 1.8 | 2.6 |
| Differences between measures | | | | | | |
| u(SPLpar,repe) | <0.01 | <0.01 | 0.3 | <0.01 | <0.01 | 0.01 |
| u(SPLpar,reps) | 0.8 | 0.6 | 1.5 | 0.5 | 0.6 | 1.1 |
| U(SPLpar) | 1.6 | 1.2 | 3.1 | 1.0 | 1.2 | 2.1 |

For Voice Care, the uncertainty due to the calibration of the reference microphone in the SPL range (76–91) dB @ 13 cm ranges from 0.6 to 0.3 dB. The model error was obtained from the data in Fig. 1, which shows the calibration functions from the three sessions, each including five repetitions. The maximum rms difference between the SPLi,ref measured by the reference microphone and the SPLi estimated by the calibration function is 1 dB. Finally, combining the maximum values of the two contributions (0.6 dB for the reference microphone and 1 dB for the model error) gives u(SPLi,inst) = 1.2 dB.

For the Mipro MU-55HN, a standard uncertainty of 0.55 dB due to the calibration of the reference SLM was taken from the calibration certificate, and a maximum standard deviation of 0.50 dB in the investigated SPL range (85–102) dB @ 2.5 cm was found from the four differences between the SPLeq,ICRA values provided by the headworn microphone and by the reference SLM. Combining the two uncertainties gives u(SPLi,inst) = 0.74 dB.

For Voice Care, u(SPLi,repr) and u(SPLi,repe) are 0.5 and 0.3 dB, respectively, while for the Mipro MU-55HN u(SPLi,repr) is 0.1 dB and u(SPLi,repe) is negligible. These values also correspond to u(SPLmode,repr) and u(SPLmode,repe) for the two devices. For a typical 2-min speech monitoring with a phonation time percentage of 30% (Puglisi et al., 2017), the reproducibility and repeatability contributions of SPLm and SPLeq are negligible for both devices according to Eqs. (3) and (4). For Voice Care, for instance, 2 min correspond to N = 4000 frames of 30 ms, of which about 1200 are voiced, so that u(SPLm,repr) ≈ 0.5/√1200 ≈ 0.01 dB.
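The arithmetic behind this statement can be checked with Eqs. (3) and (4) as sketched below. The helper is hypothetical; for the headworn microphone every 1 s frame counts as voiced (n_i = 1), so the phonation fraction is set to 1.

```python
import math


def frame_uncertainties(u_i, duration_s, frame_s, phonation_fraction):
    """Return (u(SPLm), u(SPLeq)) from the per-frame uncertainty u_i.

    u(SPLm)  = u_i / sqrt(number of voiced frames)    (Eq. (3))
    u(SPLeq) = u_i / sqrt(total number of frames N)   (Eq. (4))
    """
    n_total = round(duration_s / frame_s)
    n_voiced = round(phonation_fraction * n_total)
    return u_i / math.sqrt(n_voiced), u_i / math.sqrt(n_total)


# Voice Care: 2 min, 30 ms frames, 30% phonation time, u(SPLi,repr) = 0.5 dB
u_m, u_eq = frame_uncertainties(0.5, 120.0, 0.030, 0.30)
print(f"Voice Care: u(SPLm,repr) = {u_m:.3f} dB, u(SPLeq,repr) = {u_eq:.3f} dB")

# Mipro MU-55HN: 1 s intervals, all n_i = 1, u(SPLi,repr) = 0.1 dB
u_m_h, u_eq_h = frame_uncertainties(0.1, 120.0, 1.0, 1.0)
print(f"Mipro MU-55HN: u(SPLm,repr) = u(SPLeq,repr) = {u_m_h:.3f} dB")
```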

The intra-speaker variability of SPLeq, SPLm, and SPLmode, u(SPLpar,reps), estimated with Voice Care, is equal to 0.8, 0.6, and 1.5 dB, respectively. For the headworn microphone u(SPLpar,reps) is equal to 0.5, 0.6, and 1.1 dB, respectively.

For absolute measures of SPLeq, SPLm, and SPLmode, U(SPLpar) is 2.9, 2.7, and 4.0 dB, respectively, for Voice Care, and 1.7, 1.8, and 2.6 dB, respectively, for the Mipro MU-55HN. For differences between measures, U(SPLpar) is 1.6, 1.2, and 3.1 dB for Voice Care and 1.0, 1.2, and 2.1 dB for the Mipro MU-55HN.

The expanded uncertainty of speech SPLeq, SPLm, and SPLmode is higher for the contact-sensor-based device than for the headworn microphone. The instrumental uncertainty is the most influential contribution for absolute measures, while source reproducibility is the most significant contribution for differences between measures.

As shown in Fig. 9 of Švec et al. (2005), the expanded uncertainty of SPLeq and SPLm obtained from a contact-sensor-based device was about 3 and 2 dB, respectively, over the same SPL range considered in this study. These values are broadly comparable to those obtained here with Voice Care, i.e., 2.9 and 2.7 dB for SPLeq and SPLm, respectively. However, Švec et al. (2005) accounted for human speech variability in the uncertainty estimation but did not include the contribution of method reproducibility.

The results of this study have practical applications not only for researchers but also for practitioners in the field of vocal health, such as speech therapists, ENT doctors, and phoniatricians, and in the field of applied acoustics, such as acousticians and audio engineers. Whenever two absolute speech SPLpar values that imply the repositioning of Voice Care or the Mipro MU-55HN have to be compared under changed conditions (e.g., a different period of time, room acoustics, subject, illness, or age), their difference should exceed the U(SPLpar) values for absolute measures in Table 1 before concluding that the changed condition significantly affects speech production. Conversely, when two SPLpar values from the same speaker are compared without removing the device, their difference is reliable if it exceeds the U(SPLpar) values for differences between measures in Table 1.
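This decision rule amounts to a table lookup followed by a comparison. A minimal sketch using the Table 1 values is given below; the dictionary layout and the function name are illustrative.

```python
# Expanded uncertainties U(SPLpar) in dB, from Table 1
U_TABLE = {
    ("Voice Care", "absolute"): {"SPLeq": 2.9, "SPLm": 2.7, "SPLmode": 4.0},
    ("Voice Care", "difference"): {"SPLeq": 1.6, "SPLm": 1.2, "SPLmode": 3.1},
    ("Mipro MU-55HN", "absolute"): {"SPLeq": 1.7, "SPLm": 1.8, "SPLmode": 2.6},
    ("Mipro MU-55HN", "difference"): {"SPLeq": 1.0, "SPLm": 1.2, "SPLmode": 2.1},
}


def change_is_significant(level_a, level_b, device, parameter, repositioned):
    """True if |level_a - level_b| exceeds the applicable expanded uncertainty.

    repositioned=True  -> the device was removed between measurements (absolute U)
    repositioned=False -> same session, device not removed (U for differences)
    """
    case = "absolute" if repositioned else "difference"
    return abs(level_a - level_b) > U_TABLE[(device, case)][parameter]


# A 2 dB change in SPLm measured with Voice Care
print(change_is_significant(72.0, 70.0, "Voice Care", "SPLm", repositioned=True))
print(change_is_significant(72.0, 70.0, "Voice Care", "SPLm", repositioned=False))
```

The same 2 dB change is therefore not conclusive when the device has been repositioned, but it is when the device stayed in place.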

To conclude, for the most used parameters, SPLm and SPLeq, the headworn microphone provided a lower expanded uncertainty (≈2 dB) than the contact-sensor-based device (≈3 dB). However, when a microphone in air is not suitable (e.g., in high background noise or for long-term voice monitoring), the advantage of using a contact microphone is clearly evident despite its higher uncertainty. On the other hand, comparable uncertainties of ≈1 dB were found for the two devices when differences in SPLm between two speech samples have to be assessed without device repositioning, making the devices interchangeable for this kind of speech monitoring. For differences in SPLeq under the same conditions, the contact-sensor-based device showed an uncertainty of ≈1.5 dB, higher than the ≈1 dB obtained for the headworn microphone.

Future studies will apply the methodology described in this work to assess the uncertainty contributions of similar devices that have become customary in speech monitoring practice, allowing a more objective comparison between them.

1. ANSI (1997). ANSI S3.5: Methods for Calculation of the Speech Intelligibility Index (Acoustical Society of America, New York).
2. Astolfi, A., Carullo, A., Pavese, L., and Puglisi, G. E. (2015). “Duration of voicing and silence periods of continuous speech in different acoustic environments,” J. Acoust. Soc. Am. 137(2), 565–579.
3. Calosso, G., Puglisi, G. E., Astolfi, A., Castellana, A., Carullo, A., and Pellerey, F. (2017). “A one-school year longitudinal study of secondary school teachers' voice parameters and the influence of classroom acoustics,” J. Acoust. Soc. Am. 142(2), 1055–1066.
4. Carullo, A., Astolfi, A., Castellana, A., Puglisi, G. E., Casassa, F., and Pavese, L. (2015a). “Performance comparison of different contact microphones used for voice monitoring,” in Proceedings of the International Congress on Sound and Vibration 22, July 12–16, Florence, Italy.
5. Carullo, A., Vallan, A., Astolfi, A., Pavese, L., and Puglisi, G. E. (2015b). “Validation of calibration procedures and uncertainty estimation of contact microphone based vocal analyzers,” Measurement 74, 130–142.
6. Casassa, F., Schiavi, A., and Troia, A. (2017). “Development of a test system for voice monitoring contact sensor: Phonatory system simulator,” in Proceedings of the International Congress on Sound and Vibration 24, July 23–27, London, United Kingdom.
7. Castellana, A., Carullo, A., Astolfi, A., Puglisi, G. E., and Fugiglando, U. (2017). “Intra-speaker and inter-speaker variability in speech sound pressure level across repeated readings,” J. Acoust. Soc. Am. 141(4), 2353–2363.
8. Dreschler, W. A., Verschuure, H., Ludvigsen, C., and Westermann, S. (2001). “ICRA noises: Artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment,” Audiology 40(3), 148–157.
9. Hillman, R. E., Heaton, J. T., Masaki, A., Zeitels, S. M., and Cheyne, H. A. (2006). “Ambulatory monitoring of disordered voices,” Ann. Otol. Rhinol. Laryngol. 115(11), 795–801.
10. ISO/IEC (2008). ISO/IEC Guide 98-3: Uncertainty of Measurement, Part 3: Guide to the Expression of Uncertainty in Measurement (GUM:1995) (International Organization for Standardization, Switzerland).
11. Puglisi, G. E., Astolfi, A., Cantor Cutiva, L. C., and Carullo, A. (2017). “Four-day-follow-up study on the voice monitoring of primary school teachers: Relationships with conversational task and classroom acoustics,” J. Acoust. Soc. Am. 141(1), 441–452.
12. Searl, J., and Dietsch, A. (2014). “Testing of the VocaLog2 vocal monitor,” J. Voice 28(4), 523.e27–523.e37.
13. Sramkova, H., Granqvist, S., Herbst, C. T., and Švec, J. G. (2015). “The softest sound level of the human voice in normal subjects,” J. Acoust. Soc. Am. 137(1), 407–418.
14. Švec, J. G., Titze, I. R., and Popolo, P. S. (2005). “Estimation of sound pressure levels of voiced speech from skin vibration of the neck,” J. Acoust. Soc. Am. 117(3), 1386–1394.
15. Titze, I. R., Hunter, E. J., and Švec, J. G. (2007). “Voicing and silence periods in daily and weekly vocalizations of teachers,” J. Acoust. Soc. Am. 121(1), 469–478.