Perceptual sensitivity to temporal modification in two consecutive speech segments was measured in word contexts to explore two questions: (1) whether there is an interaction between multiple segmental durations, and (2) what aspect of the stimulus context determines the perceptually salient temporal markers. Experiment 1 obtained acceptability ratings for words with temporal modifications. The results showed that a compensatory change in the durations of a vowel (V) and its adjacent consonant (C) is not as perceptually salient as would be expected for simultaneous modifications of the two segments. This finding suggests the presence of a time perception range wider than a single segment (V or C). The results of experiment 1 also showed that rating scores for compensatory modification between V and C depend not on the temporal order of the modified pair (VC or CV) but on the loudness difference between V and C: acceptability decreased when the loudness difference between V and C was large. This suggests that perceptually salient markers are located around major jumps in loudness. The second finding, the dependence on the loudness jump, was replicated in experiment 2, which used a detection task for temporal modifications of nonspeech stimuli that modeled the time-loudness features of the speech stimuli. Experiment 3 further investigated the influence of the temporal order of V and C by using the detection task, instead of acceptability ratings, with the speech stimuli.
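
To make the term "compensatory" concrete, the following is a minimal sketch of how a vowel-consonant pair might be modified so that the combined duration either stays constant or changes. The function name, the millisecond units, and the example durations are illustrative assumptions and are not taken from the study.

```python
def modify_pair(v_ms, c_ms, delta_ms, compensatory=True):
    """Lengthen the vowel by delta_ms and adjust the adjacent consonant.

    Hypothetical helper for illustration only.
    compensatory=True  -> consonant is shortened by delta_ms, so V + C is unchanged.
    compensatory=False -> consonant is lengthened by delta_ms, so V + C grows.
    """
    new_v = v_ms + delta_ms
    new_c = c_ms - delta_ms if compensatory else c_ms + delta_ms
    if new_v <= 0 or new_c <= 0:
        raise ValueError("modification would remove a segment entirely")
    return new_v, new_c


# Assumed example values: a 100 ms vowel, an 80 ms consonant, a 30 ms change.
comp = modify_pair(100, 80, 30, compensatory=True)
same_dir = modify_pair(100, 80, 30, compensatory=False)
print(comp, sum(comp))          # pair total is preserved
print(same_dir, sum(same_dir))  # pair total increases by 2 * delta
```

In the compensatory case the boundary between the two segments shifts while the onset and offset of the pair stay in place, which is the kind of modification whose acceptability the abstract reports as depending on the loudness difference between V and C.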
