The accent advantage effect in phoneme monitoring—faster responses to a target phoneme at the beginning of an L + H*-accented word than to a target phoneme at the beginning of an unaccented word—is viewed as a product of listeners' predictive capabilities [Cutler (1976). Percept. Psychophys. 20(1), 55–60]. However, previous studies have not established what information listeners use to form these predictions [Cutler (1987). Proceedings of the International Congress of Phonetic Sciences, pp. 84–87; Cutler and Darwin (1981). Percept. Psychophys. 29(3), 217–224]. This article presents evidence that at least the information in the syllable immediately preceding a target phoneme is necessary to cue the predictive attention allocation that underlies the accent advantage effect.

In phoneme monitoring, a target phoneme beginning a word is detected faster when the word bears a contrastive (L + H*) pitch accent than when it is unaccented (Cutler, 1976). This is referred to as the accent advantage effect. For instance, the d in “dirt” is detected faster in (1a) than in (1b). However, Cutler (1976) found a similar advantage when the word “dirt” was replaced via splicing with a neutral version recorded in a sentence like (1c). Since the lexical items in the sentence up to the target word (the “preamble”) were the same in all cases, some acoustic attributes of the preamble of the accented (1a) had to be responsible for the faster phoneme detection time, presumably by causing listeners to allocate attention to the target word's location.

  • (1a) Accented: She managed to remove the DIRT from the rug, but not the berry stains.

  • (1b) Unaccented: She managed to remove the dirt from the RUG, but not the clothes.

  • (1c) Neutral: She managed to remove the dirt from the rug.

Subsequent research has investigated what these acoustic attributes might be, exploring aspects of the entire preamble's prosody, including its rhythm, intensity, and F0 contour. This research has not identified an acoustic factor that is clearly required in order for the accent advantage to be observed (Cutler, 1987; Cutler and Darwin, 1981). However, Cutler and McQueen (2014) found the advantage to persist even in sentences where the preamble was presented with a delexicalized melody, indicating that the relevant attributes are prosodic rather than segmental in nature.

While Cutler (1976) and others have therefore noted that the relevant information must be somewhere in the preamble, it remains unclear just where in the preamble it lies. On the one hand, the accent advantage may result from listeners utilizing more distant or global prosodic information, perhaps the full prosodic contour. On the other hand, it might reflect listeners' use of quite local information, such as that found in the syllable preceding the target-bearing word (and thus the very end of the preamble). The question of whether the accent advantage depends broadly on distal or local cues in the preamble is the one we asked in the experiments presented below.

Stimulus sound files, data, and R analysis scripts for this paper can be found online (Foster and Deardorff, 2017; Rysling et al., 2020).

One-hundred six self-reported native English listeners with no hearing or speaking disorders participated in experiment 1, while 83 participated in experiment 2. They were recruited from undergraduate populations at either the University of Massachusetts, Amherst, or the City University of New York, Staten Island, and received either course credit or modest monetary compensation for their time.

Twenty sentence triplets like those in (1a)–(1c), above, were recorded by a female native speaker of American English. Of these 20 triplets, 18 came directly from the materials of Cutler (1976) and two were newly constructed. Within each triplet, the preambles (the portions of the sentences up to the target word) were length-normalized across conditions to the mean of the two conditions' natural lengths using the “Lengthen” function in Praat (Boersma and Weenink, 2019). Additionally, 44 filler sentences (half containing target phonemes and half not) were produced by the same speaker as the experimental items and were also presented to listeners. Stimuli were judged to be natural-sounding by the authors.
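The length normalization described above amounts to simple arithmetic, sketched here under stated assumptions: the function name and the example durations are hypothetical, and the lengthening tool is assumed to accept a multiplicative duration factor. This is an illustration of the computation, not the authors' script.

```python
def normalization_factors(accented_s: float, unaccented_s: float) -> tuple[float, float, float]:
    """Return (target duration, factor for accented, factor for unaccented).

    The target is the mean of the two natural preamble durations; each factor
    is the multiplier that brings one preamble to that target.
    """
    if accented_s <= 0 or unaccented_s <= 0:
        raise ValueError("durations must be positive")
    target = (accented_s + unaccented_s) / 2.0
    return target, target / accented_s, target / unaccented_s


# Hypothetical preamble durations in seconds (not measurements from the study).
target, f_acc, f_unacc = normalization_factors(1.20, 1.00)
print(f"target = {target:.3f} s; scale accented by {f_acc:.3f}, unaccented by {f_unacc:.3f}")
```

A factor below 1 shortens the longer preamble and a factor above 1 lengthens the shorter one, so both end at the same duration.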

Each experimental item was created by splicing the target word out of the sentence in which it was realized as either accented or unaccented [(1a) or (1b)] and replacing it with the version from the neutral condition (1c), again using Praat.

Crucially, the initial splicing boundary of the stimuli was the only difference between experiments 1 and 2. In experiment 1, just as in all previous investigations of the accent advantage effect, the target word was spliced from the beginning of the target phoneme to the end of the target-bearing word. In experiment 2, approximately the first syllable immediately preceding the target phoneme was included in the excised part in all cases and replaced with the corresponding material from the neutral sentence. This extra, pre-target phoneme material was found by starting from the target phoneme and backtracking in the acoustic signal to a natural acoustic boundary. For most items, this meant that a pre-target determiner was spliced out along with the target-bearing word.1
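The difference between the two splicing schemes can be sketched with labeled segments standing in for stretches of audio. The segment labels, the helper name, and the index values are all hypothetical; real stimuli would be spliced at sample positions in the waveform.

```python
def splice(carrier, donor, start, end):
    """Replace carrier[start:end] with the matching span donor[start:end]."""
    return carrier[:start] + donor[start:end] + carrier[end:]


# Each list is one sentence cut into four labeled stretches:
# earlier preamble, pre-target syllable, target word, rest of the sentence.
accented = ["preamble_A", "the_A", "dirt_A", "rest_A"]
neutral = ["preamble_N", "the_N", "dirt_N", "rest_N"]

# Experiment 1: only the target word comes from the neutral recording.
exp1 = splice(accented, neutral, 2, 3)

# Experiment 2: the pre-target syllable is replaced as well.
exp2 = splice(accented, neutral, 1, 3)

print(exp1)
print(exp2)
```

In the experiment 1 version the pre-target syllable still carries the prosody of the accented recording, while in the experiment 2 version it does not.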

Listeners participated in a phoneme monitoring experiment similar to that reported in Akker and Cutler (2003) and Cutler (1976), conducted in laboratories at the University of Massachusetts, Amherst, and the City University of New York, Staten Island. Participants were tested either one or two at a time. They were given oral and written instructions telling them to listen for the speech sound indicated before each trial, and to press the space bar on a computer keyboard in front of them when they identified the target sound. Following previous practice, stimulus presentation was counterbalanced so that every participant heard only either the accented or unaccented condition of each item, e.g., the accented condition of item 1 and unaccented condition of item 2 if in the first list, or the unaccented condition of item 1 and accented condition of item 2 if in the second list. This resulted in ten observations per participant per condition before trial rejections (detailed below). Sentences were presented binaurally through headphones. Participants manually advanced from one trial to the next and were given the option of resting in the interim. No subject took longer than 25 min to complete the study.
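A minimal sketch of the two-list counterbalancing follows. It assumes that conditions simply alternate by item number, which matches the example given for items 1 and 2 but is not confirmed for the remaining items; the function name is hypothetical.

```python
def build_lists(n_items: int = 20) -> dict[int, dict[int, str]]:
    """Assign each item to one condition per list, alternating by item number."""
    conditions = ("accented", "unaccented")
    lists: dict[int, dict[int, str]] = {1: {}, 2: {}}
    for item in range(1, n_items + 1):
        # List 1 gets accented on odd items; list 2 gets the opposite condition.
        lists[1][item] = conditions[(item - 1) % 2]
        lists[2][item] = conditions[item % 2]
    return lists


lists = build_lists()
for list_id, assignment in lists.items():
    n_accented = sum(1 for c in assignment.values() if c == "accented")
    print(f"list {list_id}: {n_accented} accented, {len(assignment) - n_accented} unaccented")
```

With 20 items, each list contains ten items per condition, and every item appears in both conditions across the two lists.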

As planned, we followed Akker and Cutler (2003) and Cutler (1976) in rejecting all responses with a response time (RT) slower than 1500 ms or faster than 100 ms. This resulted in considerable data loss for some participants. To maintain comparability with earlier studies, we inferred participant rejection criteria from Akker and Cutler's first experiment. They found that no more than 22% of the data for any of their participants fell outside the 100–1500 ms window. We chose to reject all participants who exceeded these RT limits (or who failed to detect a target phoneme altogether) on more than 25% (5 of 20) of experimental trials. For experiment 1, this resulted in 88 usable participants and, for experiment 2, 77 usable participants.
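The trial-level and participant-level criteria can be sketched as follows. The function names and example data are hypothetical, a missed target is represented as `None`, and a participant with exactly 25% bad trials is assumed to be kept (one reading of "more than 25%").

```python
from typing import Optional, Sequence

MIN_RT_MS = 100
MAX_RT_MS = 1500


def is_valid(rt_ms: Optional[float]) -> bool:
    """A trial counts if the target was detected within the 100-1500 ms window."""
    return rt_ms is not None and MIN_RT_MS <= rt_ms <= MAX_RT_MS


def keep_participant(rts: Sequence[Optional[float]], max_bad_fraction: float = 0.25) -> bool:
    """Keep a participant unless more than max_bad_fraction of trials are invalid."""
    bad = sum(1 for rt in rts if not is_valid(rt))
    return bad / len(rts) <= max_bad_fraction


# Hypothetical participants with 20 experimental trials each.
five_bad = [550.0] * 15 + [None, None, 80.0, 1700.0, 2000.0]
six_bad = [550.0] * 14 + [None, None, 80.0, 1700.0, 2000.0, 90.0]
print(keep_participant(five_bad), keep_participant(six_bad))
```

Only the valid trials of retained participants would then enter the analysis.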

Summary statistics for all RTs included in our analyses are given in Table 1. The mean RTs for experiment 1 were 567 ms for the accented condition and 588 ms for the unaccented condition. The corresponding values for experiment 2 were 560 and 543 ms. Figure 1 presents histograms of individual participant effects for the two experiments, giving an idea of how consistent the effects were across listeners.

Table 1.

Response time summary statistics (ms).

                              Experiment 1              Experiment 2
                              Accented    Unaccented    Accented    Unaccented
Mean                          567         588           560         543
Standard error of the mean    20          21            22          20
1st quartile                  445         463           440         435
Median                        526         551           521         507
3rd quartile                  636         653           645         612
Fig. 1.

(Color online) By-participant differences across conditions in mean response times (ms), experiments 1 (left) and 2 (right).
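The by-participant differences plotted in a figure of this kind could be computed as sketched below. The data layout, the function name, and the direction of subtraction (unaccented minus accented, so that positive values indicate an accent advantage) are assumptions, and the example trials are invented.

```python
from collections import defaultdict
from statistics import mean


def participant_effects(trials):
    """Map each participant to mean unaccented RT minus mean accented RT (ms).

    trials: iterable of (participant, condition, rt_ms) tuples.
    """
    by_cell = defaultdict(list)
    for participant, condition, rt_ms in trials:
        by_cell[(participant, condition)].append(rt_ms)
    participants = sorted({p for p, _ in by_cell})
    return {
        p: mean(by_cell[(p, "unaccented")]) - mean(by_cell[(p, "accented")])
        for p in participants
        # Skip participants who lack valid trials in either condition.
        if (p, "accented") in by_cell and (p, "unaccented") in by_cell
    }


# Invented example trials (not data from the study).
trials = [
    ("p1", "accented", 500), ("p1", "accented", 540),
    ("p1", "unaccented", 560), ("p1", "unaccented", 580),
    ("p2", "accented", 600), ("p2", "unaccented", 590),
]
effects = participant_effects(trials)
print(effects)
```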


The data were analyzed with Bayesian mixed-effects linear regression models, using brm (Bürkner, 2017) fitted to lognormal RTs. The default vague (or, in the case of the population coefficient, uninformative flat) priors were used. Because these studies were not originally planned as an omnibus study, we first present analyses of them as separate experiments. Fixed (population-level) effects estimates for each study separately, as well as for a combined analysis using analysis-of-variance-style contrasts for the fixed effects, are given in Table 2. Models of the individual studies included random slopes and intercepts by participants and items (as group-level effects). The combined analysis included participant and item intercepts and by-item slopes for both accent condition and experiment (splicing), but a by-participant slope only for accent condition, because different participants heard each experiment's splicing.

For experiment 1, the 95% credible interval of the effect of accent condition is lower-bounded by zero, consistent with the original report of Cutler (1976) that hearing an unaccented preamble slowed RTs. For experiment 2, however, the 95% credible interval of the effect of accent condition overlaps zero, with a negative lower bound and a positive upper bound, so hearing an unaccented preamble is not reliably estimated to either speed or slow responses. For the combined analysis, the credible intervals for the effects of both accent condition and type of splicing (experiment 1 vs 2) overlapped zero, so neither main effect can be reliably distinguished from zero. However, the credible interval for the interaction between the two factors did not overlap zero, indicating that the experiments differed in the effect of accent.
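The reasoning about credible intervals can be made concrete with a small sketch. The interval bounds are copied from the complete rows of Table 2, and the helper name is hypothetical. The back-transformation of the intercept assumes that it sits on the log-millisecond scale, so that exponentiating it gives an approximate median RT; how the intercept relates to the two conditions depends on the contrast coding, which is not stated.

```python
import math


def excludes_zero(lower: float, upper: float) -> bool:
    """True when a credible interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# 2.5% and 97.5% bounds taken from rows of Table 2 that list all four values.
intervals = {
    "Experiment 2: accent condition": (-0.09, 0.02),
    "Combined: splicing": (-0.05, 0.01),
    "Combined: accent condition x splicing": (0.01, 0.03),
}
for name, (lower, upper) in intervals.items():
    verdict = "excludes zero" if excludes_zero(lower, upper) else "overlaps zero"
    print(f"{name}: [{lower}, {upper}] {verdict}")

# Under a lognormal model, exp(intercept) approximates a median RT in ms.
exp1_median_ms = math.exp(6.26)
print(f"Experiment 1 intercept back-transformed: {exp1_median_ms:.0f} ms")
```

The back-transformed experiment 1 intercept comes out at roughly 523 ms, close to the experiment 1 medians reported in Table 1.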

Table 2.

Bayesian estimates for separate models fitted to experiments 1, 2, and combined.

Estimate   Estimated error   2.5%   97.5%
Experiment 1 Intercept 6.26 0.04 6.18 6.34 
 Accent condition 0.04 0.02 0.07 
Experiment 2 Intercept 6.32 0.06 6.20 6.44 
 Accent condition −0.03 0.03 −0.09 0.02 
Combined Intercept 6.29 0.03 6.23 6.36 
 Accent condition 0.01 −0.02 0.02 
 Splicing −0.02 0.01 −0.05 0.01 
 Accent condition × Splicing 0.02 0.00 0.01 0.03 

Comparing results across experiments 1 and 2, we find evidence that the accent advantage effect—faster identification of target phonemes in words for which a contrastive accent is expected—is obtained only when the syllable immediately before the target-bearing word is consistent with the presence of an upcoming contrastive accent. Our results are consistent with the hypothesis that listeners begin forming an expectation about the presence of an upcoming contrastive accent well before the pre-target syllable but require the material in that syllable in order to confirm the prediction of an accent and allocate their attention to its expected locus. It is important to note that our stimuli were based on those used in the study of Cutler (1976) but were not acoustically identical to them. For the present, however, we conclude that the accent advantage reported in that study and others since [e.g., Cutler (1987), Cutler and Darwin (1981), and Cutler and McQueen (2014)] primarily reflects listeners' leveraging of relatively local cues to the upcoming accent—those occurring about one syllable earlier.

1

In 16 cases, this was “the” or “a”; in two cases, it was a full word (“more” and “roast”); in one case, it was the final syllable of “another”; and in one case, it was the two-word sequence “hire a.” Acoustic analysis of the pre-target syllable, using Praat, indicated that the syllable recorded with the accented word was longer than the one preceding the unaccented word (117 vs 107 ms, p < 0.02 by two-tailed t-test) and had a lower mean pitch (167 vs 177 Hz, p = 0.01). Further, the accented pre-target syllable had a larger standard deviation of intensity than the unaccented one (3.05 vs 2.58 dB, p < 0.05) and a lower minimum intensity (70.2 vs 71.7 dB, p < 0.01), but did not differ in maximum intensity.

1. Akker, E., and Cutler, A. (2003). “Prosodic cues to semantic structure in native and non-native listening,” Biling.: Lang. Cogn. 6(2), 81–96.
2. Boersma, P., and Weenink, D. (2019). “Praat: Doing phonetics by computer.” (Last viewed January 7, 2020).
3. Bürkner, P.-C. (2017). “brms: An R package for Bayesian multilevel models using Stan,” J. Stat. Softw. 80(1), 1–28.
4. Cutler, A. (1976). “Phoneme monitoring reaction time as a function of preceding intonation contour,” Percept. Psychophys. 20(1), 55–60.
5. Cutler, A. (1987). “Components of prosodic effects in speech recognition,” in Proceedings of the International Congress of Phonetic Sciences, pp. 84–87.
6. Cutler, A., and Darwin, C. J. (1981). “Phoneme-monitoring reaction time and preceding prosody: Effects of stop closure duration and of fundamental frequency,” Percept. Psychophys. 29(3), 217–224.
7. Cutler, A., and McQueen, J. M. (2014). “How prosody is both mandatory and optional,” in Above and Beyond the Segments: Experimental linguistics and phonetics, pp. 71–82.
8. Foster, E. D., and Deardorff, A. (2017). “Open science framework (OSF),” J. Med. Libr. Assoc. 105(2), 203.
9. Rysling, A., Bishop, J., Clifton, C., Jr., and Yacovone, A. (2020). “Preceding syllables are necessary for the accent advantage effect: R script, data files, Psychopy script,” Open Science Framework.