The relationship between prosodic structure and segmental realisation is a central question within phonetics. For vowels, this has been typically examined in terms of duration, leaving largely unanswered how prosodic boundaries influence spectral realisation. This study examines the influence of prosodic boundary strength—as well as duration and pauses—on vowel dynamics in spontaneous Japanese. While boundary strength has a marginal effect on dynamics, increased duration and pauses result in greater vowel peripherality and spectral change. These findings highlight the complex relationship between prosodic and segmental structure, and illustrate the importance of multifactorial analysis in corpus research.

A central question in phonetic research concerns the relationship between prosodic structure and the acoustic realisation of speech segments, and how the organisation of words into hierarchical units modulates the production and perception of speech. Within this enterprise, much attention has been paid to the articulatory strengthening that occurs at the beginnings of prosodic units (“domain-initial strengthening”) (Cho, 2005, 2009; Fougeron and Keating, 1997), as well as to the temporal lengthening of segments preceding a prosodic boundary (“final lengthening”) (Cho, 2015; Turk and Shattuck-Hufnagel, 2007; Wightman et al., 1992). With respect to the spectral characteristics of vowels, the vowel's position in F1-F2 space and the dynamic change in formant values over the vowel's timecourse have been shown to be expanded in domain-initial position (Georgeton and Fougeron, 2014) and under various forms of prosodic prominence (Fridland et al., 2014; Jacewicz et al., 2009; Mo et al., 2009; Wouters and Macon, 2002), though it remains less clear how the presence of a prosodic boundary following a vowel modulates its spectral characteristics. Given the importance of final lengthening and the presence of pauses to the production and perception of prosodic boundaries (e.g., Ferreira, 1993; Krivokapić, 2007; Petrone et al., 2017; Steffman and Jun, 2021; Streeter, 1978), the question remains open as to how the presence of the boundary itself influences spectral realisation independent of other factors associated with the edges of prosodic domains.

This study addresses this question by focusing on the influence of prosodic boundary strength and its associated phenomena (increased duration and post-boundary pauses) on the spectral properties of vowels, both as independent and interacting factors, using a corpus of spontaneous Japanese. Japanese has five vowels (/a/, /i/, /u/, /e/, /o/, with long vowel counterparts), and maintains two levels of prosodic structure: the Accentual Phrase (AP) and the Intonational Phrase (IP) (Kubozono, 1993; Pierrehumbert and Beckman, 1988; Venditti, 2005). APs are characterised by an F0 rise on the second mora,¹ with downstep over the remainder of the phrase. IPs are delimited via F0 reset, as well as “pitch boundary movements” (PBMs), which perform a range of pragmatic and intentional functions (Venditti et al., 1998). Vowels have been shown to exhibit dynamic formant change over their timecourse, where even monophthongs change in their spectral characteristics in production (Harrington and Cassidy, 1994; Hillenbrand et al., 1995; Hirata and Tsukada, 2004; Yazawa and Kondo, 2019). It has been demonstrated that these dynamic properties capture additional detail in the phonetic realisation of vowels compared with single point measurements (Farrington et al., 2018; Renwick and Stanley, 2020), which may in turn reflect acoustic variation under different prosodic contexts. The dynamics of vowels have previously been shown to be affected by duration, where longer vowel duration results in greater formant change (Fox and Jacewicz, 2009; Fridland et al., 2014; Yazawa and Kondo, 2019). With respect to how prosodic phenomena affect the dynamics of vowels, most studies have focused on the presence of prominence (Fridland et al., 2014; Jacewicz et al., 2009), leaving unaddressed the relationship between boundaries and formant dynamics (Brandt et al., 2018).

The data used for this study come from the Corpus of Spontaneous Japanese-Core (CSJ) (Koiso et al., 2014; Maekawa et al., 2000), containing ∼45 h of speech (recorded 1999–2001) from 137 speakers (58 female) born 1930–1979. Relevant to the research questions of the study, the CSJ contains extensive intonational and prosodic annotation based on the X-JToBI annotation scheme (Kikuchi and Maekawa, 2003; Maekawa et al., 2002). Prosodic boundaries are annotated in the CSJ by way of “Break Index” (BI) labels, which reflect the prosodic association and disjuncture between words, aligning with the tonal and segmental cues to the boundary (Kikuchi and Maekawa, 2003; Venditti, 2005). This scheme uses four BI levels, mapping to separate levels of prosodic disjuncture. Values of BI = 0 represent minimal disjuncture, such as within words; BI = 1 reflects word boundaries within an AP; BI = 2 reflects AP-level boundaries; and BI = 3 reflects the right edges of IPs.² The presence of pauses is annotated in the CSJ by means of chunking speech into inter-pausal units (IPUs), delimited by silences of 200 ms or greater.
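
The pause criterion described above can be sketched in a few lines of Python. This is an illustration only, not the CSJ's own tooling: the `Segment` class, the `precedes_pause` function, and the millisecond timestamps are hypothetical, and the Break Index glosses simply paraphrase the text.

```python
from dataclasses import dataclass

# Break Index levels as glossed in the text (wording is a paraphrase).
BREAK_INDEX = {
    0: "word-internal",
    1: "word boundary within an AP",
    2: "AP boundary",
    3: "IP boundary",
}

PAUSE_THRESHOLD_MS = 200  # silences of this length or longer delimit IPUs


@dataclass
class Segment:
    """A labelled stretch of speech; times are integer milliseconds (assumed)."""
    label: str
    start_ms: int
    end_ms: int


def precedes_pause(segments, threshold_ms=PAUSE_THRESHOLD_MS):
    """Return one flag per segment: True if the silence before the next
    segment is at least `threshold_ms` long."""
    flags = []
    for current, following in zip(segments, segments[1:]):
        gap = following.start_ms - current.end_ms
        flags.append(gap >= threshold_ms)
    # Assumption: nothing is known about what follows the final segment.
    flags.append(False)
    return flags


segments = [Segment("a", 0, 80), Segment("k", 80, 150), Segment("o", 400, 470)]
print(precedes_pause(segments))  # [False, True, False]
```

Integer milliseconds are used here so that a gap of exactly 200 ms is classified reliably, which floating-point seconds would not guarantee.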

All vowel data were extracted from the relational database version of the CSJ (Koiso et al., 2014), including the BI label, duration, and the presence of following pause, as well as speaker and word information. Vowels marked as devoiced (59 632 tokens) were excluded due to their unknown durational and spectral properties: it should be noted that the exclusion of devoiced vowels disproportionately affects the high vowels /i, u/, which are uniquely targeted by phonological devoicing processes within words and at prosodic boundaries (Fujimoto, 2015). So as not to conflate phonetic vowel duration with phonological vowel length, this study focuses exclusively on phonemically short vowels in Japanese: as such, all multi-vowel sequences (i.e., phonemically long vowels like /e:/, sequences of different vowel qualities such as /ai/, etc.) were excluded from the analysis. Focusing only on short vowels means that “duration”, for the purpose of this analysis, refers specifically to phonetic vowel duration, consistent with previous studies examining the relationship between duration and formant dynamics (e.g., Fridland et al., 2014). Vowel formants (F1 and F2) were extracted using the parselmouth Python package (Jadoul et al., 2018), using separate maximum formant values for male and female speakers (5000 and 5500 Hz, respectively), and values were extracted at 21 equidistant points across each vowel's timecourse. The first and last 20% of points were excluded to avoid coarticulatory effects of surrounding segments (Renwick and Stanley, 2020; Williams and Escudero, 2014). Each formant point was Lobanov (Z) normalised within-speaker using all included tokens for the given speaker. In total, 300 257 vowel tokens (11 231 word types) were used in the final analysis (Table 1).
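
The sampling, trimming, and normalisation steps can be illustrated with a short Python sketch. It is not the study's extraction code: the function names are hypothetical, formant measurement itself is omitted, and two details are assumptions. The first is that trimming 20% from 21 points removes four points at each edge. The second is that the Lobanov transform uses the sample standard deviation.

```python
from statistics import mean, stdev

N_POINTS = 21   # equidistant measurement points per vowel
TRIM = 0.20     # proportion of points dropped at each edge


def sample_times(start, end, n_points=N_POINTS):
    """Equidistant measurement times from vowel onset to offset, inclusive."""
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def trim_edges(values, proportion=TRIM):
    """Drop the first and last `proportion` of points (assumed rounding:
    21 points * 0.2 = 4.2, so 4 points are removed from each edge)."""
    k = round(len(values) * proportion)
    return values[k:len(values) - k]


def lobanov(values):
    """Z-score one speaker's values for one formant: (x - mean) / sd."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


times = sample_times(0.0, 100.0)   # a hypothetical 100 ms vowel
kept = trim_edges(times)           # 13 points remain under the assumption above
print(len(times), len(kept))       # 21 13
print(lobanov([1.0, 2.0, 3.0]))    # [-1.0, 0.0, 1.0]
```

In practice the normalisation would be applied separately to F1 and F2, pooling all retained points from all of a speaker's tokens, as the text describes.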

TABLE 1.

Counts of vowel tokens used in the analysis, grouped by upcoming Break Index value and by the presence of an upcoming pause.

                                         /a/      /i/      /u/      /e/      /o/
Break Index     0 (word-internal)     62 763   11 215     8380   25 503   28 514
                1 (AP-internal)       21 100   10 947     8015   13 036   23 847
                2 (AP-final)          11 857     5203     2275     4090   11 487
                3 (IP-final)          18 007     5454     3426   12 541   12 615
Pause presence  No pause             101 128   28 750   19 327   45 208   68 276
                Pause                 12 599     4069     2769     9962     8187

To examine the distinct but overlapping effects of prosodic boundary strength, vowel duration, and following pause presence on vowel formant trajectories, the F1 and F2 trajectories were modelled using generalised additive mixed models (GAMMs) (Wood, 2017), which have been utilised in numerous recent analyses of dynamic formant trajectories (e.g., Kirkham et al., 2019; Renwick and Stanley, 2020; Sóskuthy et al., 2018; Stanley et al., 2021; Strycharczuk and Scobbie, 2017). While there are numerous other approaches to the statistical analysis of formant trajectories (e.g., Farrington et al., 2018; Risdal and Kohn, 2014; Williams and Escudero, 2014), GAMMs are well suited for addressing the research questions of the study, as they make it possible to directly evaluate the distinct roles of prosodic boundaries, duration, and pauses on vowel position and trajectory shape, as well as the relationships between predictors on the outcome variable.

GAMMs for the normalised F1 and F2 trajectories were separately fit for each vowel using the mgcv package (Wood, 2011) in R (R Core Team, 2021), with parametric (linear) terms for BI label and the presence of the following pause, and non-parametric (smooth) terms for the sampling timepoint, duration, BI label (by timepoint), and following pause (by timepoint). To model the relationship between predictors, the models also included separate tensor product interactions between duration and sampling timepoint, which were also fit by BI labels and following pause presence. To control for the potential effect of confounding variables, the model was fit with parametric and non-parametric terms for the presence of PBMs and lexical pitch accents. To control for speaking rate (as a confounder to duration), the model also included a parametric term for local speech rate (syllables per second within an IP, subtracted from the speaker's mean speech rate), which was then scaled within-speaker (so the speech rate term is equivalent across speakers). The model also included a random smooth by speaker. As including a random smooth by word proved to be too computationally complex, the model was instead fit with a parametric term for log-transformed word frequency. To compensate for non-independence of the formant points in the trajectory, each model was first fit, and then refit with an AR1 parameter set to the ρ value from the original fitted model (Sóskuthy, 2021). The statistical significance of the predictors of interest (prosodic boundary strength, duration, pause, and their interactions) was assessed by means of model comparison using the itsadug package (van Rij et al., 2020), where models fit without the term(s) of interest are compared on their minimised smoothing parameter selection score (and evaluated via a χ² test of the difference between scores), as well as by visual examination of predicted model trajectories (Renwick and Stanley, 2020).
Code and models for this study are available (Tanner, 2023).
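
The AR1 refitting step relies on an estimate of ρ, the correlation between neighbouring residuals within a trajectory. The Python sketch below is illustrative only and is not the study's code (for which see Tanner, 2023). It assumes that residuals from the first model fit have already been grouped by vowel token, and the function name `lag1_autocorrelation` is hypothetical. Unlike a plain autocorrelation over the whole residual vector, it pairs only points that belong to the same trajectory.

```python
def lag1_autocorrelation(trajectories):
    """Estimate rho as the lag-1 autocorrelation of residuals.

    `trajectories` is a list of residual sequences, one per vowel token
    (assumed layout). Only consecutive points within the same trajectory
    are paired, so the end of one token is never correlated with the
    start of the next.
    """
    flat = [r for traj in trajectories for r in traj]
    centre = sum(flat) / len(flat)
    numerator = sum(
        (a - centre) * (b - centre)
        for traj in trajectories
        for a, b in zip(traj, traj[1:])
    )
    denominator = sum((r - centre) ** 2 for r in flat)
    return numerator / denominator


# Residuals that alternate in sign are negatively autocorrelated;
# residuals that drift slowly are positively autocorrelated.
print(lag1_autocorrelation([[1.0, -1.0, 1.0, -1.0]]))  # -0.75
print(lag1_autocorrelation([[1.0, 1.0, -1.0, -1.0]]))  # 0.25
```

In mgcv, one route for using such an estimate is the `rho` argument of `bam()`, with `AR.start` marking the first point of each trajectory.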

The results of comparing the fully-specified F1 and F2 GAMMs with those without each term of interest (prosodic boundary strength, duration, and pause) can be seen in Table 2. Looking first at the influence of prosodic boundary strength, including Break Index information resulted in a significant improvement in model fit for all vowels and formant types, indicating that the upcoming prosodic boundary has some modulating effect on the position and shape of the formant trajectory (Table 2, rows 1–2). Figure 1(A) illustrates this observation, showing the marginal effect of Break Index on each vowel's F1 and F2 trajectories. In contrast with prior studies on prosodic prominence (Jacewicz et al., 2009), where vowels with greater prominence are more peripheral and have longer trajectories, there does not appear to be a clear order to the boundary effect on each vowel's acoustic realisation. Instead, trajectories appear similar in length across BI levels, with the position in vowel space not following any obvious “stronger > weaker” pattern with respect to boundary strength. Like prosodic boundary strength, the duration of the vowel also significantly improves model fit (Table 2, rows 3–4). As Fig. 1(B) shows, the duration of the vowel, independent of other prosodic factors, has a strong and ordered effect on the vowel, where vowels with longer durations are both more peripheral in formant space and exhibit substantially longer trajectories, which supports previous observations regarding vowel duration and spectral characteristics (Fox and Jacewicz, 2009; Fridland et al., 2014; Mayr and Davies, 2011). The presence of a pause also has a significant independent effect on all vowels, with the exception of F2 for /u/ (Table 2, rows 5 and 6). Figure 1(C) shows that vowels preceding a pause appear to have a longer trajectory than those not preceding a pause, as well as appearing more peripheral (particularly in the starting points of the trajectories).
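
The notions of “trajectory length” and “peripherality” used above can be made concrete with a small Python sketch. The study assesses these patterns by visual inspection of predicted trajectories; the two summary measures below are hypothetical illustrations and are not measures used in the study. Both assume Lobanov-normalised F1 and F2 values, so that the centre of a speaker's vowel space lies near (0, 0).

```python
from math import hypot


def trajectory_length(f1, f2):
    """Sum of Euclidean distances between consecutive (F1, F2) points."""
    points = list(zip(f1, f2))
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def peripherality(f1, f2):
    """Mean Euclidean distance of the trajectory's points from (0, 0),
    taken here as the centre of the normalised vowel space (assumption)."""
    return sum(hypot(a, b) for a, b in zip(f1, f2)) / len(f1)


# A hypothetical two-point trajectory moving from the centre outwards.
print(trajectory_length([0.0, 3.0], [0.0, 4.0]))  # 5.0
print(peripherality([3.0, 3.0], [4.0, 4.0]))      # 5.0
```

Under these definitions, a longer vowel that is both “more peripheral” and “more dynamic” would score higher on both measures than a shorter, more centralised one.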

Table 2.

Model comparisons between the full model and different subset models for each vowel. First column denotes the set of terms removed in the subset model; second column denotes the degrees of freedom (total difference in the number of terms between the subset and full model); third column denotes the formant modelled. The χ² value for the model comparison is reported for each vowel, with significant results (p < 0.05) in bold.

Model vs. full        Degrees of freedom  Formant       /a/      /i/      /u/      /e/      /o/
No BI                 18                  F1          685.2    117.1     67.6    676.5    578.5
                                          F2          951.1    144.1    412.2   2196.2    747.9
No duration           28                  F1         4462.3     77.7     23.8   2210.4   1437.1
                                          F2         1570.1    256.9    440.1    640.5   2316.9
No pause              11                  F1         3896.6     16.6     43.6    820.3   1570.9
                                          F2          513.9     27.5      1.1    137.3    129.1
No BI × duration                          F1          277.1       38      0.9     81.2    117.2
                                          F2           80.9      5.9     14.4     72.3    137.3
No pause × duration                       F1          284.4      3.5      9.5     24.8    267.6
                                          F2           29.6      4.3      0.7     25.7      7.2
Fig. 1.

Predicted normalised formant trajectories by Break Index (A), duration (B), and presence of following pause (C), estimated as the median value from 10 000 draws from each model's posterior distribution. Trajectories reflect the “marginal effect” of the term of interest, where all other model terms are held at their average values. For duration, “average” corresponds to the averaged normalised duration, while “short” and “long” correspond to approximately 1 standard deviation below or above the average normalised duration, respectively.

Having examined the independent effects of prosodic boundary strength, duration, and following pause presence, the second goal of this study is to consider how these effects interact to condition each vowel's positional and dynamic realisation, given the strong overlap of these effects in speech. First considering how duration modulates the influence of prosodic boundary strength, Table 2 (rows 7 and 8) shows that including this interaction as a non-parametric effect (i.e., its effect on trajectory shape) results in a significantly improved model fit for all vowels except /u/. Figure 2(A) shows the model-predicted vowel trajectories at different duration values, and demonstrates that while /a/ (F1) and /e/ (F1 and F2) appear to exhibit greater trajectory differences between BI levels at longer vowel durations, such distinctions (in F2) for /o/ and /u/ are more apparent at shorter vowel durations. The relationship between duration and the presence of a following pause is more constrained, being significant for both F1 and F2 only for /a/ and /e/, and significant for F1 only for /o/ and /u/ (Table 2, rows 9 and 10). Compared with the interaction with Break Index, Fig. 2(B) shows a more complicated relationship between pause presence and duration, where the difference between pause and non-pause trajectories is greater at longer durations for /a/ F1, while shorter durations for /o/ F2, /a/ F2, and /e/ F2 differ substantially by the presence of a following pause.

Fig. 2.

Predicted normalised F1 (solid) and F2 (dashed) trajectories for Break Index (A) and following pause (B) at different vowel durations. For duration, “average” corresponds to the averaged normalised duration, while “short” and “long” correspond to approximately 1 standard deviation below or above the average normalised duration, respectively. Lines reflect median estimated value, and shaded areas correspond to 95% confidence intervals based on 10 000 draws from each model's posterior distribution.

The goal of this study has been to examine the relationship between prosodic structure and the phonetic realisation of speech segments, focusing specifically on how the spectral properties of vowels (both overall position in formant space and the degree of spectral change) are modulated by the strength of an upcoming prosodic boundary. Given that the edges of prosodic domains are also closely related to other contextual factors, including increased duration due to pre-boundary lengthening (Cho, 2015; Turk and Shattuck-Hufnagel, 2007; Wightman et al., 1992) and the presence of pauses following the boundary (Ferreira, 1993), this study attempts to examine the influence of prosodic boundary strength on vowel realisation whilst accounting for the presence of these co-occurring effects, as well as considering how these effects interact to modulate the phonetic realisation of vowels.

Through modelling the effects of prosodic boundary strength, vowel duration, and pause presence in a corpus of spontaneous Japanese speech, it is found that prosodic boundary strength does exhibit an effect on the formant position and dynamic properties of Japanese phonemically short vowels, though this effect does not appear to follow a clear pattern of incremental strength (i.e., where stronger boundaries would result in more peripheral or dynamic vowels). This contrasts with previous studies examining the influence of prosodic prominence on vowel dynamics (Jacewicz et al., 2009; Wouters and Macon, 2002), and suggests that different dimensions of prosodic structure may have distinct effects on phonetic realisation. Instead, the duration of the vowel is shown to have a strong directional effect on the vowel's spectral properties, consistent with previous observations (Fridland et al., 2014; Jacewicz et al., 2009); it would be tempting, then, to consider the possibility that observations of increased vowel peripherality and spectral change in prosodically strong positions may instead reflect the confounding effect of duration. In other words, vowels in prosodically strong positions are typically longer in duration, which would in turn drive the observed effects on a vowel's formant position and trajectory. However, as Table 2 and Fig. 2(A) seem to suggest, some distinctions in prosodic boundary strength are more apparent at shorter durations (e.g., /o/ and /u/ F2), indicating that phonetic vowel duration alone cannot account for the subtle role of prosodic boundary strength observed in this analysis.
A possible explanation for this subtle boundary strength effect may be that Japanese makes little use of vowel dynamics for signalling the presence of a boundary, given that Japanese maintains a robust system of intonational cues for marking the edges of prosodic domains (Igarashi, 2014; Venditti et al., 1998; Venditti et al., 2008) and given the wide range of cross-linguistic variation in the perception of acoustic cues to prosodic structure (Kim, 2020). As such, it is possible that cue-weighting for the perception of boundaries is oriented towards durational lengthening, F0 reset, pauses, and the presence of PBMs (Krivokapić, 2007; Swerts, 1997; Wightman et al., 1992). Further research concerning the perception of prosody in Japanese would be needed to validate this hypothesis. Finally, as this study focuses exclusively on phonemically short vowels in Japanese, one possibility is that phonemically short vowels in Japanese may be too phonetically short for a speaker to articulatorily manifest any boundary-related effects. In this sense, further research may wish to examine the role of boundary strength on phonemically long vowels (e.g., /e:/, /a:/, etc.) or other vowel–vowel adjacent sequences (e.g., /ai/, /oi/).

With respect to the relationship between prosody and the phonetic realisation of segments, this study has demonstrated that prosodic structure influences not just the temporal properties of segments, but also the spectral properties of vowels, including their position in formant space and their degree of dynamic formant change. Specifically, however, it is shown that the factors that often co-occur with prosodic boundaries in natural speech (segmental lengthening and the presence of pauses) appear to have the most pronounced effects on a vowel's spectral properties, which may reflect the articulatory and perceptual overlap in these factors. In this sense, this study illustrates the importance of multifactorial approaches to the study of phonetic phenomena, particularly in natural-speech settings (Tomaschek et al., 2018), where the analysis of potentially collinear or confounding variables may reveal patterns otherwise unobserved when single factors are examined in isolation.

The author wishes to thank Yōsuke Igarashi, Morgan Sonderegger, and Jane Stuart-Smith for helpful comments. Computational resources were provided by Calcul Québec and the Digital Research Alliance of Canada. The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to licensing restrictions (see https://clrd.ninjal.ac.jp/csj/en/index.html).

1

The mora is generally considered to be the primary timing unit in Japanese (Warner and Arai, 2001), while the syllable plays a separate role in constraining phonological well-formedness (Kawahara, 2016).

2

Previous studies have made use of a “final” or “utterance” label based on the presence of final pause (e.g., Martin et al., 2016); since examining the effect of final pause is one goal of this study, an utterance-level label is not used and the presence of pauses is instead examined separately.

1. Brandt, E., Zimmerer, F., Andreeva, B., and Möbius, B. (2018). “Impact of prosodic structure and information density on dynamic formant trajectories in German,” in Proceedings of Speech Prosody 2018, Poznan, Poland (International Speech Communication Association, Baixas, France), pp. 119–213.
2. Cho, T. (2005). “Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /a,i/ in English,” J. Acoust. Soc. Am. 117, 3867–3878.
3. Cho, T. (2009). “Effects of initial position versus prominence in English,” J. Phon. 37, 466–485.
4. Cho, T. (2015). “Language effects on timing at the segmental and suprasegmental levels,” in The Handbook of Speech Production, edited by M. A. Redford (Wiley-Blackwell, Oxford, UK), pp. 505–529.
5. Farrington, C., Kendall, T., and Fridland, V. (2018). “Vowel dynamics in the southern vowel shift,” Am. Speech 93, 186–222.
6. Ferreira, F. (1993). “Creation of prosody during sentence production,” Psych. Rev. 100, 233–253.
7. Fougeron, C., and Keating, P. A. (1997). “Articulatory strengthening at edges of prosodic domains,” J. Acoust. Soc. Am. 101, 3728–3740.
8. Fox, R. A., and Jacewicz, E. (2009). “Cross-dialectal variation in formant dynamics of American English,” J. Acoust. Soc. Am. 126, 2603–2618.
9. Fridland, V., Kendall, T., and Farrington, C. (2014). “Durational and spectral differences in American English vowels: Dialect variation within and across groups,” J. Acoust. Soc. Am. 136, 341–349.
10. Fujimoto, M. (2015). “Vowel devoicing,” in Handbook of Japanese Phonetics and Phonology, edited by H. Kubozono (De Gruyter Mouton, Berlin, Germany), pp. 167–214.
11. Georgeton, L., and Fougeron, C. (2014). “Domain-initial strengthening on French vowels and phonological contrasts: Evidence from lip articulation and spectral variation,” J. Phon. 44, 83–95.
12. Harrington, J., and Cassidy, S. (1994). “Dynamic and target theories of vowel classification: Evidence from monophthongs and diphthongs in Australian English,” Lang. Speech 37, 357–373.
13. Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of English vowels,” J. Acoust. Soc. Am. 97, 3099–3111.
14. Hirata, Y., and Tsukada, K. (2004). “The effects of speaking rates and vowel length on formant movements in Japanese,” in Proceedings of the Texas Linguistics Society Conference, edited by A. Agwuele, W. Warren, and S.-H. Park (Cascadilla Proceedings Project, Somerville, MA), pp. 73–85.
15. Igarashi, Y. (2014). “Typology of intonational phrasing in Japanese dialects,” in Prosodic Typology II: The Phonology of Intonation and Phrasing, edited by S.-A. Jun (Oxford University Press, Oxford, UK), pp. 464–492.
16. Jacewicz, E., Salmons, J., and Fox, R. A. (2009). “Prosodic conditioning, vowel dynamics and sound change,” in Variation and Gradience in Phonetics and Phonology, edited by F. Kügler, C. Féry, and R. van de Vijver (De Gruyter Mouton, Berlin, Germany), pp. 99–124.
17. Jadoul, Y., Thompson, B., and de Boer, B. (2018). “Introducing Parselmouth: A Python interface to Praat,” J. Phon. 71, 1–15.
18. Kawahara, S. (2016). “Japanese has syllables: A reply to Labrune,” Phonology 33, 169–194.
19. Kikuchi, H., and Maekawa, K. (2003). “Performance of segmental and prosodic labeling of spontaneous speech,” in Proceedings of the International Speech Communication Association/Institute of Electrical and Electronics Engineers Workshop on Spontaneous Speech Processing and Recognition (Tokyo Institute of Technology, Tokyo, Japan), paper TAP6.
20. Kim, J. (2020). “Individual differences in the production and perception of prosodic boundaries in American English,” Ph.D. thesis, University of Michigan, Ann Arbor, MI.
21. Kirkham, S., Nance, C., Littlewood, B., Lightfoot, K., and Groarke, E. (2019). “Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English,” J. Acoust. Soc. Am. 145, 784–794.
22. Koiso, H., Den, Y., Nishikawa, K., and Maekawa, K. (2014). “Design and development of an RDB version of the Corpus of Spontaneous Japanese,” in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland [European Language Resources Association (ELRA), France], pp. 1471–1476.
23. Krivokapić, J. (2007). “Prosodic planning: Effects of phrasal length and complexity on pause duration,” J. Phon. 35, 162–179.
24. Kubozono, H. (1993). The Organization of Japanese Prosody (Kuroshio, Tokyo).
25. Maekawa, K., Kikuchi, H., Igarashi, Y., and Venditti, J. (2002). “X-JToBI: An extended J_ToBI for spontaneous speech,” in 7th International Conference on Spoken Language Processing, ICSLP 2002, Denver, CO, pp. 1545–1548.
26. Maekawa, K., Koiso, H., Furui, S., and Isahara, H. (2000). “Spontaneous speech corpus of Japanese,” in Proceedings of the Second International Conference on Language Resources and Evaluation (LREC'00), Athens, Greece [European Language Resources Association (ELRA), France], Vol. 2, pp. 946–952.
27. Martin, A., Igarashi, Y., Jincho, N., and Mazuka, R. (2016). “Utterances in infant-directed speech are shorter, not slower,” Cognition 156, 52–59.
28. Mayr, R., and Davies, H. (2011). “A cross-dialectal acoustic study of the monophthongs and diphthongs of Welsh,” J. Int. Phon. Assoc. 41, 1–25.
29. Mo, Y., Cole, J., and Hasegawa-Johnson, M. (2009). “Prosodic effects on vowel production: Evidence from formant structure,” in INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, UK, September 6–10.
30. Petrone, C., Truckenbrodt, H., Wellmann, C., Holzgrefe-Lang, J., Wartenburger, I., and Höhle, B. (2017). “Prosodic boundary cues in German: Evidence from the production and perception of bracketed lists,” J. Phon. 61, 71–92.
31. Pierrehumbert, J., and Beckman, M. (1988). Japanese Tone Structure (MIT Press, Cambridge, MA).
32. R Core Team (2021). “R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.
33. Renwick, M. E. L., and Stanley, J. A. (2020). “Modeling dynamic trajectories of front vowels in the American South,” J. Acoust. Soc. Am. 147, 579–595.
34. Risdal, M. L., and Kohn, M. E. (2014). “Ethnolectal and generational differences in vowel trajectories: Evidence from African American English and the Southern vowel system,” in NWAV 42 (University of Pennsylvania, Philadelphia), pp. 139–148.
35. Sóskuthy, M. (2021). “Evaluating generalised additive mixed modelling strategies for dynamic speech analysis,” J. Phon. 84, 101017.
36. Sóskuthy, M., Foulkes, P., and Hughes, V. (2018). “Changing words and sounds: The roles of different cognitive units in sound change,” Top. Cogn. Sci. 10, 787–802.
37. Stanley, J. A., Renwick, M. E. L., Kuiper, K. I., and Olsen, R. M. (2021). “Back vowel dynamics and distinctions in Southern American English,” J. Eng. Ling. 49, 389–418.
38. Steffman, J., and Jun, S.-A. (2021). “Tonal cues to prosodic structure in rate-dependent speech perception,” J. Acoust. Soc. Am. 150, 3825–3837.
39. Streeter, L. (1978). “Acoustic determinants of phrase boundary perception,” J. Acoust. Soc. Am. 64, 1582–1592.
40. Strycharczuk, P., and Scobbie, J. M. (2017). “Fronting of southern British English high-back vowels in articulation and acoustics,” J. Acoust. Soc. Am. 142, 322–331.
41. Swerts, M. (1997). “Prosodic features at discourse boundaries of different strength,” J. Acoust. Soc. Am. 101, 514–521.
42. Tanner, J. (2023). “Prosodic and durational influences on the formant dynamics of Japanese vowels,” available at https://doi.org/10.17605/OSF.IO/3U2KS.
43. Tomaschek, F., Hendrix, P., and Baayen, R. H. (2018). “Strategies for addressing collinearity in multivariate linguistic data,” J. Phon. 71, 249–267.
44. Turk, A., and Shattuck-Hufnagel, S. (2007). “Multiple targets of phrase-final lengthening in American English words,” J. Phon. 35, 445–472.
45. van Rij, J., Wieling, M., Baayen, R. H., and van Rijn, H. (2020). “itsadug: Interpreting time series and autocorrelated data using GAMMs,” R package version 2.4.
46. Venditti, J. (2005). “The J_ToBI model of Japanese intonation,” in Prosodic Typology, edited by S.-A. Jun (Oxford University Press, Oxford, UK), pp. 172–200.
47. Venditti, J., Maeda, K., and van Santen, J. P. (1998). “Modeling Japanese boundary pitch movements for speech synthesis,” in Proceedings of the 3rd European Speech Communication Association/Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques Workshop on Speech Synthesis (SSW 3), pp. 317–322.
48. Venditti, J., Maekawa, K., and Beckman, M. E. (2008). “Prominence marking in the Japanese intonation system,” in Handbook of Japanese Linguistics, edited by S. Miyagawa and M. Saito (Oxford University Press, Oxford, UK), pp. 456–512.
49. Warner, N., and Arai, T. (2001). “The role of the mora in the timing of spontaneous Japanese speech,” J. Acoust. Soc. Am. 109(3), 1144–1156.
50. Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., and Price, P. J. (1992). “Segmental durations in the vicinity of prosodic phrase boundaries,” J. Acoust. Soc. Am. 91, 1707–1717.
51. Williams, D., and Escudero, P. (2014). “A cross-dialectal acoustic comparison of vowels in Northern and Southern British English,” J. Acoust. Soc. Am. 136, 2751–2761.
52. Wood, S. (2017). Generalized Additive Models: An Introduction with R (CRC Press, Boca Raton, FL).
53. Wood, S. N. (2011). “Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models,” J. R. Stat. Soc. Ser. B 73(1), 3–36.
54. Wouters, J., and Macon, M. W. (2002). “Effects of prosodic factors on spectral dynamics. I. Analysis,” J. Acoust. Soc. Am. 111, 417–427.
55. Yazawa, K., and Kondo, M. (2019). “Acoustic characteristics of Japanese short and long vowels: Formant displacement effect revisited,” in Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia.