Much previous research has demonstrated that listeners do not agree well when using traditional rating scales to measure pathological voice quality. Although these findings may indicate that listeners are inherently unable to agree in their perception of such complex auditory stimuli, another explanation implicates the particular measurement method (rating scale judgments) as the culprit. An alternative method of assessing quality, listener-mediated analysis-synthesis, was devised to assess this possibility. In this approach, listeners explicitly compare synthetic and natural voice samples and adjust speech synthesizer parameters to create auditory matches to voice stimuli. The method is designed to replace unstable internal standards for qualities like breathiness and roughness with externally presented stimuli, and thereby to overcome major hypothesized sources of disagreement in rating scale judgments. In a preliminary test of the reliability of this method, listeners were asked to adjust the signal-to-noise ratio for 12 synthetic pathological voices so that the resulting stimuli matched the natural target voices as well as possible. For comparison with the synthesis judgments, listeners also judged the noisiness of the natural stimuli in a separate task using a traditional visual-analog rating scale. For 9 of the 12 voices, agreement among listeners was significantly (and substantially) greater for the synthesis task than for the rating scale task. Response variances for the two tasks did not differ for the remaining three voices. However, a second experiment showed that the synthesis settings listeners selected for these three voices fell within a difference limen, so the observed differences were perceptually insignificant. These results indicate that listeners can in fact agree in their perceptual assessments of voice quality, and that analysis-synthesis can measure perception reliably.
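The signal-to-noise adjustment described above can be illustrated with a minimal sketch. It assumes the simplest possible setup: white Gaussian noise is scaled and added to a periodic component so that their mean-power ratio equals a target value in decibels. The synthesizer actually used in the study is not described in this abstract, so the function names (`mix_at_snr`, `measured_snr_db`), the sine-wave stand-in for the voiced source, and the 15 dB setting are all hypothetical and serve only to show the arithmetic.

```python
import math
import random


def mean_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def measured_snr_db(harmonic, noise):
    """Ratio of harmonic power to noise power, in decibels."""
    return 10 * math.log10(mean_power(harmonic) / mean_power(noise))


def mix_at_snr(harmonic, noise, target_db):
    """Scale `noise` so the harmonic-to-noise power ratio is `target_db`.

    Returns the mixed signal and the scaled noise component.
    """
    if len(harmonic) != len(noise):
        raise ValueError("harmonic and noise must have the same length")
    # Solve 10*log10(P_h / (g^2 * P_n)) = target_db for the noise gain g.
    gain = math.sqrt(
        mean_power(harmonic) / (mean_power(noise) * 10 ** (target_db / 10))
    )
    scaled = [gain * n for n in noise]
    mixed = [h + n for h, n in zip(harmonic, scaled)]
    return mixed, scaled


# Hypothetical stimulus: a 120 Hz sine as a stand-in for the periodic source.
rng = random.Random(0)
sample_rate = 10000
f0 = 120
harmonic = [math.sin(2 * math.pi * f0 * t / sample_rate) for t in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in harmonic]

# A listener's "setting" in this sketch is just the target value in dB.
mixed, scaled = mix_at_snr(harmonic, noise, 15.0)
print(round(measured_snr_db(harmonic, scaled), 6))
```

In this sketch a listener's response reduces to a single number, the chosen ratio in dB, which is what makes settings from different listeners directly comparable.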
