This study examines a cross-linguistically rare contrast, that between plain and secondarily palatalized postalveolar fricatives, through (i) an acoustic analysis of the production of 31 Romanian speakers, and (ii) a perception experiment with a different group of 31 native speakers. Evidence of acoustic separation between plain and palatalized forms was found for 27 of the subjects, suggesting that the contrast is produced by the majority. This is consistent with self-reports by native speakers collected in 1961. These findings were supported by the results of the perceptual experiment, which showed that native speakers exhibit moderate sensitivity to this contrast. A separate examination of each gender's production suggests that a process of neutralization may be in progress, more strongly realized by males than by females. In addition to documenting this phenomenon in Romanian, the study seeks an explanation for its longevity and proposes that grammatical restructuring offers the best account of the observed facts.

It has been noted that in phonemic inventories, postalveolars usually pattern with either plain or palatalized consonants but not both (Kochetov, 2002). This has been attributed to the low salience of the secondary palatalization contrast at this place of articulation (Kochetov, 2002), which may be related to gestural timing, specifically to the overlap or blending of the palatalization gesture with that of the primary place of articulation (Zsiga, 2000). More generally, as noted by Mester and Itô (1989), the realization of the palatalized form of any consonant at the coronal place of articulation tends to involve, other than the secondary gesture, additional articulatory changes compared to the plain form, e.g., a change in primary place of articulation and increased burst release duration in the case of stops, to the point where the contrast is effectively between a (plain) stop and a (palatalized) affricate. According to Mester and Itô (1989), the characterization “palatalized” is strictly speaking only accurate for noncoronals, i.e., labials and velars, as palatalization of coronals changes their primary place of articulation to palatal or alveopalatal. It has been suggested that this may be due to the difficulty of achieving simultaneous targets at the dental and palatal regions, leading to intermediate articulations between the two (Kochetov, 2002), or to phonologically-driven strategies for phonetic enhancement in order to preserve a perceptually vulnerable contrast (Spinu et al., 2012). Either way, the perceptual consequence of these changes is increased salience of the plain-palatalized contrast.

The typologically rare contrast between plain and palatalized postalveolar fricatives ([ʃ]-[ʃj]) is present in Romanian. This contrast has low to moderate perceptual salience, as demonstrated by the results of experimental work with native speakers (Spinu, 2007; Spinu et al., 2012). Although listeners showed some sensitivity to the contrast, an acoustic study of the stimuli employed in the perception experiments showed that the distinction between plain and palatalized consonants is not statistically significant in postalveolars, unlike at other places of articulation (Spinu et al., 2012). These findings are to some extent consistent with previous typological descriptions attesting the rarity of the plain-palatalized contrast at this place (Kochetov, 2002), but they also raise questions in light of the observed mismatch between perception and production. Specifically, can any evidence be found of the realization of the secondary palatalization contrast at this place of articulation by employing more refined acoustic methods? Similarly, would a different methodology uncover increased perceptual sensitivity to this contrast? Given the results of a 1961 study suggesting that the plain-palatalized contrast at the postalveolar place may have been present at the time in the production of Romanian speakers (Şuteu, 1961), the answers to these questions have the potential to provide both a diachronic and a synchronic perspective on secondary palatalization in Romanian fricatives at this place of articulation (and also in comparison with other places). The present paper thus offers further insight into the typology of secondary palatalization. Furthermore, it contributes new data from an understudied language to sociolinguistic theories of language change and adds to the body of work on phonological contrast neutralization.

Secondary palatalization is present in languages from diverse families, such as Polish, Russian, Hungarian, Irish, Mandarin, Mongolian, Navajo, Isthmus Mixe, etc. (Bateman, 2007; Bhat, 1978). A survey of a random sample of 117 languages found secondary palatalization in 27% of them (Bateman, 2007), suggesting that typologically it is not a rare phenomenon. In terms of phonological status, there are two main patterns, with secondary palatalization being either (1) distinctive, as in Russian, where consonants with secondary palatal articulations are part of the phonemic inventory, contrasting with plain ones, e.g., [glup] stupid vs [glupj] depth, or (2) non-distinctive, as in Japanese (Vance, 1987), where secondary palatalization is assumed to be a surface realization of underlying consonant-high front vowel or consonant-glide sequences, e.g., /kar-itai/ [karjitai] shear (Kochetov and Alderete, 2011).

In terms of phonotactics, neutralization of the plain-palatalized contrast is encountered in final (coda) position, as well as in pre-consonantal position, more often with labials than with coronals (Kochetov, 2002). As for articulation, secondarily palatalized consonants are characterized by fronting and raising of the tongue body towards the hard palate, timed with respect to the primary articulation. The timing of the primary and secondary gestures was found to vary with speaker and syllabic position (Kochetov, 1998, 2002).

Acoustically, palatalized consonants are generally longer than the plain ones, with stops having a strident-like release, which causes lower F1 and higher F2 on neighboring vowels, as compared to plain consonants. The perception of the plain-palatalized contrast was found to be influenced by the primary place of articulation, with the contrast being disfavored (less salient) at the labial place compared to [+anterior] coronal in experiments with Russian stops (Kavitskaya, 2006; Kochetov, 2002). In a more recent experiment with Romanian fricatives, however, the reverse pattern was found, with the plain-palatalized contrast in labiodentals and glottals/velars being more salient than in both [+anterior] and [-anterior] coronals (Spinu et al., 2012).

While true palatalized palatals appear to be unattested (Operstein, 2010) and are deemed impossible from both a phonological and articulatory perspective (Hall, 1997, but see Campbell, 1974, for a differing view), the situation with postalveolars is not categorical. Kochetov (2002) notes that in a given language, postalveolar segments usually pattern with either plain or palatalized consonants, but not both. However, some loanwords in Polish show palatalization of (retroflex) postalveolar fricatives before the high front vowel /i/ to palatalized laminal postalveolar fricatives, which contrast acoustically with alveolo-palatal fricatives (Zygis and Hamann, 2003). Moreover, it was reported that Livonian contrasts /ʃ/ and /ʃj/, while Mordvin contrasts /c/ and /cj/ despite the posited non-existence of true palatals (Campbell, 1974, cf. Van der Weijer, 2011).

Morphological palatalization affects all consonants in Isthmus Mixe, including the postalveolar fricative (Dieterman, 2008). An acoustic analysis found distinctions between the plain and palatalized postalveolar forms in duration, spectral peak, and formant transitions, with higher F2 and F3 for the palatalized consonants (Dieterman, 2008). Dieterman notes that, despite extensive research, she has not been able to find any languages besides the Oaxacan Mixe family in which the entire consonant inventory may be modified by secondary palatalization manifesting a morpheme. As we will see, while Romanian palatalization is morphologically-induced for the most part, it differs from Isthmus Mixe in that it applies to most but not all consonants in the inventory.

Russian is one of the few languages with a four-way contrast involving palatalized sibilant fricatives, specifically: palatalized dental/alveolar /sj/, palatalized postalveolar (prepalatal) /ʃj/, non-palatalized dental/alveolar /s/, and retroflex (apical postalveolar) /ʃ/ (Timberlake, 2004). A recent acoustic study (Kochetov, 2017) found that the palatalized vs non-palatalized contrast was distinguished by F1 and (especially) F2 at the onset, the midpoint, and in some cases at the offset of the following vowel. Fricative duration only marginally distinguished /ʃj/, commonly described as a geminate, from the other consonants. The author's conclusion is that Russian voiceless sibilant fricatives are robustly distinguished by spectral differences in this language.

Not found elsewhere in the Romance family, contrastive secondary palatalization in Romanian occurs only in word-final position, as it is commonly associated with (but not restricted to) the presence of two affixes thought to be an underlying [-i]: the plural of certain nouns and adjectives (1a) and the second person singular in the present indicative of verbs (1b). (1c) shows an example of the contrast in postalveolars. With regard to the phonological status of secondary palatalization in this language, the widespread view is that an underlying word-final /i/ triggers palatalization on the preceding consonant and is then deleted (Chitoran, 2002), resulting in a surface contrast between plain and palatalized consonants. Thus, secondary palatalization in Romanian is not considered phonemic as it is in Russian.

(1)

  a. [domn] gentleman – /domn+i/ → [domnj] gentlemen

  b. [sar] I jump – /sar+i/ → [sarj] you jump

  c. [arkaʃ] archer – /arkaʃ+i/ → [arkaʃj] archers

Tables I and II provide the phonemic inventory of Romanian consonants and the phonetic inventory of palatalized consonants, respectively.

TABLE I.

Phonemic inventory of Romanian consonants.

|  | Bilabial | Labiodental | Dental | Postalveolar | Velar | Glottal |
|---|---|---|---|---|---|---|
| Nasal |  |  |  |  |  |  |
| Plosive | p b |  | t d |  | k g |  |
| Affricate |  |  | ʦ | ʧ ʤ |  |  |
| Fricative |  | f v | s z | ʃ ʒ |  | h (a) |
| Trill |  |  |  |  |  |  |
| Approximant |  |  |  |  |  |  |

(a) The allophonic realizations of /h/ include both voiced and voiceless glottal and velar fricatives; in word-final position, it is realized mostly as [x].

TABLE II.

Romanian palatalized consonants. Note: (1) [d] and [s] are never palatalized in Romanian, and alternate with [zj] and [ʃj], respectively, in the context of the plural/second person singular morpheme; (2) [tj], [kj], and [gj] always correspond to a root-final /i/ and are not morphologically conditioned; (3) [ʧj] and [ʤj] are the palatalized counterparts for [k] and [g]; (4) [çj] is the palatalized form of /h/.

|  | Bilabial | Labiodental | Dental | Postalveolar | Palatal | Velar |
|---|---|---|---|---|---|---|
| Nasal | [pomj] trees |  | [anj] years |  |  |  |
| Plosive | [lupj] wolves, [lobj] lobes |  | [puʃtj] kid |  |  | [unkj] uncle, [ungj] angle |
| Affricate |  |  | [hoʦj] thieves | [raʧj] crawfish.pl., [raʤj] you bray |  |  |
| Fricative |  | [zulufj] curls, [gravj] grave.M.pl | [kazj] you fall | [paʃj] steps, [paʒj] pageboys | [ʧeçj] Czechs |  |
| Trill |  |  | [purj] pure.M.pl |  |  |  |
| Approximant |  |  | [bolj] illnesses |  |  |  |

In a 1961 study (Şuteu, 1961), questionnaires regarding the pronunciation of Romanian words were sent out by mail to 920 informants of various professional backgrounds (e.g., clerks, physicians, engineers, lawyers, priests, and artists) but crucially no linguistically-trained individuals (e.g., teachers, professors, scientists, writers, or journalists). All informants were living in Bucharest and never received additional reminders or requests to complete and return the questionnaires so as to avoid the elicitation of superficial answers that might have been provided as a result of even the mildest pressure (p. 295). Most informants were college graduates. The author describes this group as homogeneous in terms of socioeconomic and educational status and speaking the standard form of the language. 314 questionnaires were completed and returned to the author. Each questionnaire included two parts comprising a total of 88 demographic and language use questions, most of which could be answered by a simple “yes” or “no.” One question in particular (Question II.7) is of relevance to the current study because it addresses the pronunciation of a word ending in a postalveolar fricative: “Do you pronounce the singular and plural of the word ‘moş’ (old man) in the same way?” This word, which is pronounced [moʃ], ends in a postalveolar fricative in its singular form and takes on the suffix “i” in its plural form. The spelling of the plural form is “moşi.” Of 309 answers received, 94.4% stated that they did not pronounce the two forms in the same way and, according to the author, many specified that they produced the plural form with a short or weak i-sound at the end. Because this study is based on self-described pronunciation without acoustic analysis, the results are difficult to interpret. 
While these reports are inconsistent with Schane's (1971) observation that a depalatalization process applies to palatal consonants in Romanian (i.e., ʃ, ʒ, ʧ) such that they do not surface with secondary palatalization, there is a possibility that the participants in the 1961 study were influenced by orthography and morphological factors, and their perception of how they pronounce these words is not accurate.

Acoustically, Spinu et al. (2012) found no significant differences between Romanian plain and palatalized postalveolars at the group level, though the descriptive statistics showed some differences in mean duration and spectral measures (i.e., cepstral coefficients 0 through 5). The lack of significant differences in some of these measures was also observed with other plain-palatalized pairs, most notably [z]-[zj]. A classification of palatalization performed using a linear discriminant analysis yielded 19.5% correct classification for the plain postalveolar and 96.7% for the palatalized one, leading to the conclusion that the two forms are for the most part indistinguishable from each other, both displaying the acoustic properties characteristic of secondary palatalization. These results were replicated in a more recent study (Spinu and Lilley, 2016) comparing the effectiveness of cepstral coefficients and spectral moments in the classification of fricatives. It was found that the measures that best capture the distinctions between plain and palatalized forms are as follows: for the cepstral method, coefficients 4, 3, and 2 (in that order), all extracted from the third and last temporal region of the fricative, and for the spectral method, spectral moments 3, 2, and 1, also extracted from the last fricative region. In a multinomial logistic regression analysis with consonant palatalization as the dependent variable and the top three measures from each set as continuous explanatory variables, the correct classification rate was 85.17% for cepstral coefficients and 80.16% for spectral moments. These analyses, however, collapsed fricatives from four places of articulation together. For the postalveolar segments, correct classification was close to the chance level for the plain form, while the palatalized form was classified correctly over 80% of the time, similarly to Spinu et al. (2012).

With respect to perception, Spinu et al. (2012) examined plain and palatalized fricatives from four places of articulation: labiodental, alveolar, postalveolar, and glottal/velar (referred to as “dorsal” in that publication). The stimuli consisted of 640 fricatives produced by 16 speakers, accompanied by twice as many fillers. They were presented in a context-neutral carrier sentence, and the subjects performed a forced-choice task in which, following the audio presentation of a sentence, they had to press a key corresponding to one of two words displayed on the screen, one in the singular form (ending in a plain consonant, e.g., “pantof”) and one in the plural form (phonetically ending in a palatalized consonant, and spelled with a final Ci sequence, according to the orthographic conventions of Romanian, e.g., “pantofi”). The results for accuracy, reaction time, and sensitivity (measured as the d′ statistic) revealed the same patterns: (1) dorsals tended to be the most favorable hosts for the palatalization contrast, with higher accuracy and sensitivity values, and (2) the postalveolar place of articulation was the least favorable, with significantly lower accuracy and longer reaction times than all other places. Spinu and Lilley (2016) determined that the Romanian dorsal fricative is realized at different places of articulation word-finally, depending on whether it is plain (in which case it is realized as mostly velar, with 13% glottal realizations) or palatalized (in which case it is realized exclusively as a palatal fricative). The difference in primary place of articulation between the plain and palatalized dorsal may have arisen for the purpose of contrast enhancement (Stevens and Keyser, 2010; Stevens et al., 1986). Regardless of the factors behind its occurrence, it is likely connected to the high perceptual salience of this contrast.

The findings reported in Sec. II C suggest that the status of the secondary palatalization contrast in Romanian postalveolars remains unclear to date. On the one hand, over 90% of Romanian speakers in Şuteu's (1961) study believed that they were articulating the contrast in 1961. While their perception could be influenced by a number of factors and may not necessarily reflect reality, their intuition is nevertheless at odds with the results of a subsequent acoustic study using cepstral coefficients (Spinu et al., 2012), in which no significant differences were found between plain and palatalized segments at the group level. On the other hand, the results of the 2012 study do reveal mean differences between plain and palatalized postalveolars in terms of duration (129.8 ms for the plain segment and 142.4 ms for the palatalized one) and some of the cepstral coefficients examined. Furthermore, in a perceptual experiment reported in the 2012 study, listeners' sensitivity to the secondary palatalization contrast in postalveolars (measured as the d′ statistic) was 0.79. A d′ close to zero is interpreted as a lack of conscious access (Vermeiren and Cleeremans, 2012), while a value of 1 corresponds to 69% correct both for cases when the signal is present and when it is absent. This indicates that Romanian listeners display some sensitivity to this contrast. Taken together, these findings suggest that, while elusive, acoustic differences between plain and palatalized Romanian postalveolars do exist and that closer investigation is necessary in order to uncover them. The main goal of the current paper is thus to shed more light on the acoustic and perceptual properties of this contrast.
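The relation between d′ and percent correct invoked above can be made concrete with a minimal sketch. It assumes the standard equal-variance Gaussian signal detection model and an unbiased observer, neither of which is stated explicitly in the text; the function names are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def dprime(hit_rate, false_alarm_rate):
    """d' as the difference of z-transformed hit and false-alarm rates."""
    z = _STD_NORMAL.inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def unbiased_proportion_correct(d):
    """Proportion correct for an unbiased observer (assumption):
    the criterion sits midway between the two distributions, so the
    hit rate and the correct-rejection rate both equal Phi(d'/2)."""
    return _STD_NORMAL.cdf(d / 2)


# A d' of 1 gives about 0.69, matching the 69% figure cited in the text.
print(round(unbiased_proportion_correct(1.0), 2))
```

Under the same assumptions, the value of 0.79 reported for postalveolars would correspond to roughly 65% correct, which is above chance (50%) but well short of robust discrimination.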

In the following sections, the properties of plain and palatalized postalveolar fricatives are examined from an acoustic perspective, as well as in a new perceptual experiment. The questions addressed are as follows:

  1. Can any acoustic differentiation be found between plain and palatalized postalveolars?

  2. Do Romanian speakers display higher sensitivity to this contrast in circumstances that would cause them to focus more on the information conveyed by the presence/absence of secondary palatalization?

The current study thus adds to previous work in a number of ways. First, the focus is now explicitly on the secondary palatalization contrast in postalveolars and to this purpose a series of new statistical analyses are conducted on previously collected data. Second, a closer look will be taken for the first time at gender patterns as well as the individual patterns exhibited by speakers. Finally, the results of a new perceptual experiment, with different methodology compared to the 2012 paper, are reported.

The data analyzed here are a subset of a larger corpus collected in Spinu (2010). The analysis of the entire set of Romanian fricatives and a more detailed account of the experimental procedure are reported in Spinu et al. (2012) and Spinu and Lilley (2016).

1. Hypothesis and additional research questions

This study's main hypothesis is concerned with whether the contrast is present in the speech of Romanians. Based on the self-reports from the earlier study (Şuteu, 1961) and the perceptual findings of Spinu et al. (2012), the prediction is that there are acoustic differences between plain and palatalized /ʃ/.

Assessing the strength of these differences and the extent to which individual speakers produce them will permit a tentative first picture of the neutralization status of this contrast. Regarding potential gender differences, sociolinguistic studies have long observed that women use more forms of standard language than men (Gregoire, 2006). The opposite pattern was also found, however, in other studies describing women as leaders of language change (Shin, 2013) and the existence of what Labov calls the gender paradox: “women conform more closely than men to sociolinguistic norms that are overtly prescribed, but conform less than men when they are not” (Labov, 2001, p. 293; Nevalainen and Raumolin-Brunberg, 2003, p. 112). There is thus no clear expectation possible in this regard, but the question of which gender—if any—produces stronger acoustic cues to the secondary palatalization contrast in postalveolars will be explored in what follows.

2. Stimuli

To better understand the properties of postalveolar fricatives in Romanian, their behavior is compared to that of other plain-palatalized pairs. The stimuli collected for the original corpus (Spinu, 2010) consisted of pairs of words ending in a plain and a palatalized version of five fricatives from four places of articulation. The originally collected consonants were:

Labiodental: /f/ and /v/

Dental: /z/

Postalveolar: /ʃ/

Glottal/velar: /h/ ∼ /x/ – Spinu and Lilley (2016) determined that the realization of this segment in word-final position is as either a velar fricative or glottal fricative in its plain form, and exclusively as a palatal fricative when palatalized. Previous descriptions varied between a glottal fricative (Chitoran, 2002), a glide (Ruhlen, 1973), and a velar fricative, particularly before liquids or in word-final position (Mallinson, 1986; Sarlin, 2014). In this paper, /h/ is used for consistency with previous work and to reflect what is believed to be the phonemic representation of these sounds.

The stimuli consisted of pairs of real Romanian words that differed only in whether their final consonant was plain or palatalized. There were four pairs of words for each consonant. All words were disyllabic, with final stress, and were presented in a context-neutral carrier sentence, as shown in (2) below. The full set of stimuli is provided in Table III. Because of the restrictions on word shape and status, the vowel preceding the target consonant was not strictly controlled for, resulting in an unbalanced set of [e, a, o, u]. High front vowels were excluded because their presence might have resulted in coarticulatory effects (they are common triggers of primary and secondary palatalization in other languages). The uneven distribution of vowels poses certain limitations to the current study, discussed in more detail in Sec. III A 4. To avoid these limitations, the focus is on a subset of the consonants (/f/, /ʃ/, and /h/), specifically words in which the fricatives are preceded by either [a] or [o]. The words that are part of this subset are highlighted in boldface in Table III.

TABLE III.

Stimuli used in the production experiment. The forms in boldface were used in the subset concerned with the effects of the previous vowel on the palatalization contrast.

| Final /C/ | Vowel | Singular | Plural | Translation |
|---|---|---|---|---|
| /f/ | o | **[pantof]** | **[pantofj]** | shoe/s |
|  | a | **[vətaf]** | **[vətafj]** | bailiff/s |
|  | o | [kartof] | [kartofj] | potato/es |
|  | u | [zuluf] | [zulufj] | curl/s |
| /v/ | a | [bolnav] | [bolnavj] | sick/pl |
|  | a | [grozav] | [grozavj] | great/pl |
|  | a | [firav] | [firavj] | feeble/pl |
|  | a | [zugrav] | [zugravj] | painter/pl |
| /z/ | e | [obez] | [obezj] | obese/pl |
|  | e | [kinez] | [kinezj] | Chinese/pl |
|  | u | [ursuz] | [ursuzj] | morose/pl |
|  | u | [mofluz] | [mofluzj] | grumpy/pl |
| /ʃ/ | o | **[kokoʃ]** | **[kokoʃj]** | rooster/s |
|  | a | **[kodaʃ]** | **[kodaʃj]** | slacker/s |
|  | e | [ʧireʃ] | [ʧireʃj] | cherry tree/s |
|  | u | [giduʃ] | [giduʃj] | playful/pl |
| /h/ | o | **[parox]** | **[paroçj]** | vicar/s |
|  | a | **[valax]** | **[valaçj]** | Wallachian/s |
|  | a | [kazax] | [kazaçj] | Cossack/s |
|  | a | [monax] | [monaçj] | monk/s |

(2) Am să aleg cuvântul [kodaʃ/kodaʃj] când voi fi gata.

“I will choose the word [kodaʃ/kodaʃj] (‘slacker.M.sg./slacker.M.pl.’) when I am ready.”

The selection of a following velar stop may at first appear counterproductive to the purpose of the study as this context tends to be impoverished for obtaining place information for consonants. However, a following vowel would not have been feasible due to the phenomenon of coda resyllabification typically encountered in Romanian and other Romance languages, whereby a word-final consonant becomes an onset to a following vowel (in the case of word-final secondary palatalization, the palatal element might be resyllabified as a glide onset). The velar stop following the targets had previously been determined in an unpublished exploratory study not to induce anticipatory co-articulation effects on the target consonant (in contrast to a following labial).

The subjects produced twice as many fillers in addition to the target sentences. The fillers were also paired, showing inflections other than singular or plural, so as to distract the subjects from the target pairs. The items were presented three times to each subject, in a randomized order for each block. Thus, a full set of recordings contained 120 target items per subject: 5 consonants × 4 words per consonant × 2 forms per item (plain and palatalized) × 3 repetitions.

3. Subjects and procedure

Thirty-one native speakers of standard Romanian participated in the experiment. None of the participants had a history of speech or hearing disorders. Their linguistic and demographic background was recorded via questionnaires administered prior to testing. No participants were excluded from the study based on the information provided. They were recruited via email announcements sent to undergraduate groups at universities in Bucharest, Romania, with the help of faculty members. The subjects were 10 males and 21 females, ranging in age from 19 to 30, with an average age of 21.7 yr. They were all tested individually in a quiet room in Bucharest and received compensation for their time. The stimuli were presented and recorded using the InvTool program (Yarrington et al., 2008). Each sentence was displayed on a computer screen (using Romanian orthography) and the subjects were instructed to read the sentence as naturally as possible. Before beginning the actual experiment, the subjects completed a practice session with 20 items.

4. Data analysis

All of the subjects except one produced the full set of three repetitions; the remaining subject only produced two repetitions. Six items were rejected due to disfluencies, leaving 3674 items for acoustic analysis. Each segment was acoustically analyzed to obtain the duration and average spectral properties expressed as the first six coefficients of the Bark cepstrum, which describes the amplitude and shape of the speech spectrum in terms of a set of compact orthogonal components (Bunnell et al., 2004).
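The token counts reported so far can be cross-checked with simple arithmetic. The sketch below only restates figures given in the text (5 consonants, 4 words, 2 forms, 3 repetitions, 31 subjects, one subject with two repetitions, six rejected items); the variable and function names are hypothetical.

```python
CONSONANTS = 5         # /f/, /v/, /z/, /ʃ/, /h/
WORDS_PER_CONSONANT = 4
FORMS = 2              # plain and palatalized
REPETITIONS = 3


def items_for(repetitions):
    """Target items produced by one subject for a given number of repetitions."""
    return CONSONANTS * WORDS_PER_CONSONANT * FORMS * repetitions


full_set = items_for(REPETITIONS)   # 120 items for a complete recording
short_set = items_for(2)            # 80 items for the subject with two repetitions

subjects = 31
rejected = 6
total = (subjects - 1) * full_set + short_set - rejected
print(full_set, total)              # 120 3674
```

The result agrees with the 120 items per subject and the 3674 items retained for acoustic analysis.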

Less common in the phonetics literature, perceptually weighted cepstral coefficients are routinely used in speech recognition (Deller et al., 1993; Rabiner and Schafer, 2011), providing useful descriptions particularly for vowels and obstruents. They have also been used in clinical phonetics studies, with better results than spectral moments in the classification of stop release bursts (Bunnell et al., 2004). In the acoustic phonetics literature, Ferragne and Pellegrino (2010) recommended cepstral coefficients for computing distances between vowels. Their method yielded excellent results in estimating the acoustic distance between 13 accents of the British Isles.

While vowels and stop release bursts are acoustically different from fricatives, more recent studies demonstrate that the advantages of cepstral coefficients also apply to these segments. Mel-frequency cepstral coefficients (MFCC) were used to classify voicing in fricatives in British English and European Portuguese (Jesus and Jackson, 2008). More recently, Kong et al. (2014) obtained 85% correct classification for three places of articulation in fricatives using a set of 13 MFCCs. In Spinu and Lilley's (2016) study of Romanian fricatives, cepstral coefficients provided high correct classification rates and outperformed spectral moments in the classification of place of articulation, voicing, palatalization, and gender.

In the current study, the sequence of phonetic symbols comprising the transcription of each sentence was first aligned with the corresponding signal in the recordings. The initial automatic segmentation of the wavefiles was performed by maximizing the likelihood of a forced alignment of trained Hidden Markov models (HMMs) to the signal using the Viterbi algorithm (Viterbi, 1967). Each file was then visually inspected and alignment errors were corrected manually for the segments under investigation. Both the waveform and the wideband spectrogram of each token were used in verifying the segmentation. Fricative onset was defined as the point at which high-frequency energy first appeared on the spectrogram, and/or the point at which the number of zero crossings rapidly increased (Jongman et al., 2000). The end of each segment (plain or palatalized) was defined as the intensity minimum immediately preceding the silence during the closure of the following stop. Sample waveforms and spectrograms of one male and one female speaker's productions are shown in Fig. 1.
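The zero-crossing criterion for fricative onset (Jongman et al., 2000) can be illustrated with a short sketch. This is not the segmentation procedure used in the study, which relied on forced alignment followed by manual correction; it only shows how the number of zero crossings per frame rises when a signal moves from slowly varying (vowel-like) to rapidly varying (frication-like) material. The frame length, step size, and toy signal are hypothetical.

```python
def zero_crossings(samples):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))


def zero_crossing_counts(samples, frame_len, step):
    """Zero-crossing count for each frame of frame_len samples, advancing by step."""
    return [zero_crossings(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, step)]


# Toy signal (assumption): 100 slowly alternating samples, then 100 rapidly alternating ones.
slow = ([1.0] * 5 + [-1.0] * 5) * 10
fast = [1.0, -1.0] * 50
counts = zero_crossing_counts(slow + fast, frame_len=50, step=50)
print(counts)  # the count jumps sharply at the frame where the fast portion begins
```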

FIG. 1.

Waveforms and spectrograms for plain and palatalized consonants in two vowel contexts for one male speaker (left) and one female speaker (right). For each spectrogram, the first two productions contain the vowel [a] and the last two the vowel [o].

The durations of the target consonants and preceding vowels were determined, and then a series of feature vectors comprising six Bark cepstral coefficients [direct current (dc) and the first five cosine terms; see Appendix A] were extracted from the segments of interest using overlapping Hamming analysis windows, 20 ms wide and spaced 10 ms apart. Next, HMMs were used to divide the segments into three temporal regions of internally minimized variance. The goal was to maximize the differences between adjacent states and minimize the differences among feature vectors within the same state. HMMs thus partition the time-varying structure of segments into a series of piecewise approximately stationary regions, mimicking the dynamic nature of the segments. These separate temporal regions offer the possibility to examine the strength of the acoustic cues to palatalization progressively throughout each segment. The main reason for using HMM-defined regions is the focus on secondary palatalization. The degree of sequentiality of the secondary palatalization gesture was found to vary by speaker (Kochetov, 2002), with some speakers producing the primary and secondary gestures almost simultaneously. Especially for segments with acoustic properties that are similar to those of secondary palatalization, such as postalveolars, averaging over entire segments may obscure the presence of secondary palatalization. Examining their acoustic properties region by region, when the regions are determined based on internal variance, makes it more likely that, if at all sequential, the acoustic consequences of the palatalization gesture might be captured.
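The windowing step described above (20 ms Hamming windows spaced 10 ms apart) can be sketched as follows. The sampling rate of 16 kHz is an assumption made for illustration, as the text does not state it, and the function names are hypothetical. The computation of the Bark cepstral coefficients themselves (Appendix A) is not reproduced here.

```python
import math


def hamming(n):
    """Hamming window of n points: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, sample_rate, width_ms=20.0, step_ms=10.0):
    """Split samples into overlapping Hamming-weighted analysis frames."""
    width = int(round(sample_rate * width_ms / 1000))
    step = int(round(sample_rate * step_ms / 1000))
    window = hamming(width)
    return [[s * w for s, w in zip(samples[start:start + width], window)]
            for start in range(0, len(samples) - width + 1, step)]


# A 130 ms segment at an assumed 16 kHz rate: 2080 samples, 320-sample windows, 160-sample step.
segment = [1.0] * 2080
frames = frame_signal(segment, sample_rate=16000)
print(len(frames), len(frames[0]))  # 12 320
```

With these settings, a fricative of about 130 ms (close to the mean duration reported for the plain postalveolar) yields 12 analysis frames, each of which would contribute one six-coefficient feature vector.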

To train the HMMs, the segments were first divided into regions of approximately equal duration, and the six means and six variances of those regions provided the initial parameters of the model states. Then the boundaries between the regions were recalculated so that the total likelihood of the data was maximized given the current model parameters. The means and variances of the new regions were then obtained. This process was repeated until no feature vectors were reassigned (Viterbi, 1967; Baum et al., 1970).
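The iterative procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a strict left-to-right three-state model with diagonal-covariance Gaussian states and ignores transition probabilities, so the best alignment can be found by searching over the two region boundaries directly. The function names, the variance floor, and the toy two-dimensional data are all hypothetical.

```python
import math


def region_stats(frames, var_floor=1e-3):
    """Per-dimension mean and (floored) variance of the frames in one region."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def log_gauss(frame, means, variances):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def segment_three_regions(frames, max_iter=50):
    """Return boundaries (b1, b2): regions are [0:b1], [b1:b2], [b2:]."""
    total = len(frames)
    if total < 3:
        raise ValueError("need at least three frames")
    b1, b2 = total // 3, 2 * total // 3  # start from roughly equal regions
    for _ in range(max_iter):
        params = [region_stats(r) for r in (frames[:b1], frames[b1:b2], frames[b2:])]
        # prefix[s][t]: summed log-likelihood of frames 0..t-1 under state s
        prefix = []
        for means, variances in params:
            acc = [0.0]
            for f in frames:
                acc.append(acc[-1] + log_gauss(f, means, variances))
            prefix.append(acc)
        best = None
        for c1 in range(1, total - 1):
            for c2 in range(c1 + 1, total):
                score = (prefix[0][c1]
                         + prefix[1][c2] - prefix[1][c1]
                         + prefix[2][total] - prefix[2][c2])
                if best is None or score > best[0]:
                    best = (score, c1, c2)
        if (best[1], best[2]) == (b1, b2):
            break  # no feature vectors were reassigned
        b1, b2 = best[1], best[2]
    return b1, b2


# Toy data: three plateaus of unequal length (4, 10, and 6 frames).
toy = []
for t in range(20):
    base = 0.0 if t < 4 else (5.0 if t < 14 else 10.0)
    jitter = 0.1 if t % 2 == 0 else -0.1
    toy.append([base + jitter, 2 * base - jitter])

boundaries = segment_three_regions(toy)
print(boundaries)  # (4, 14)
```

Starting from the equal-duration split (6, 13), the boundaries move to the true change points, and the loop stops once a pass leaves them unchanged.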

After dividing each target segment into regions, the means of the features over all of the vectors in each region were calculated and used as input to the statistical analyses. This resulted in 36 different measures: 6 cepstral coefficients × 2 segments (fricative and preceding vowel) × 3 regions inside each segment. In the remainder of the paper, these measures will be labeled by composite names containing the specific coefficient used (e.g., C1), the segment from which it was extracted (C or V), and the region from which it was extracted (1, 2, or 3), as in C1.C.3, which refers to the first cepstral coefficient extracted from the third consonantal region. Two corpora of fricatives were constructed as follows: (1) all fricatives (3674 total)—this used the entire fricative corpus, but the information from preceding vowels was excluded. This was due to the fact that the uneven distribution of the vowels preceding the fricatives introduced confounds that could not be controlled for. Even with the vocalic information excluded, there is a possibility that coarticulatory effects carried into the frication portion and the presence of the vowel confound thus interfered with the classification. To address this, (2) a smaller subset was selected, with fricatives from three places of articulation: labiodental, postalveolar, and glottal/velar. These were all voiceless, and most importantly, the vowels preceding them were balanced between [a] and [o]. There were 1103 fricatives in this subset. In what follows, the two corpora are referred to as (1) “the ALL corpus” and (2) “the fʃh corpus.” It is the latter corpus that most figures and related discussions are based on.
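The labeling scheme and the count of 36 measures can be checked with a short sketch. The variable names are hypothetical, and the labels C0 through C5 assume that the dc term is numbered C0, as in the Results.

```python
from itertools import product

coefficients = [f"C{k}" for k in range(6)]  # C0 (dc term) through C5
segments = ["C", "V"]                       # fricative and preceding vowel
regions = [1, 2, 3]                         # HMM-defined temporal regions

# Composite names of the form coefficient.segment.region, e.g. C1.C.3
measures = [f"{c}.{s}.{r}" for c, s, r in product(coefficients, segments, regions)]
print(len(measures))         # 36
print("C1.C.3" in measures)  # True

# With the vocalic information excluded, only the consonantal measures remain.
consonantal = [m for m in measures if m.split(".")[1] == "C"]
print(len(consonantal))      # 18
```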

Statistical analyses were carried out in the R environment (R Core Team, 2012) using the lme4 package (Bates et al., 2014) in order to find out how well duration and the cepstral coefficients extracted from different regions of the fricatives and preceding vowels predicted palatalization status. A binomial logistic regression with mixed effects was fitted to the ALL corpus to classify palatalization with 18 acoustic measures (6 coefficients × 3 fricative regions) as continuous explanatory variables and subject, word, and gender as random effects to control for the influence of different mean ratings associated with these variables. Duration and vocalic measurements were not included because the vowels preceding the target consonants formed an unbalanced set, as discussed in Sec. III A 2. To further avoid this vowel confound, a logistic mixed effects model was also fitted to the fʃh corpus to classify palatalization with subject, word, gender, and vowel as random effects. The fʃh corpus was balanced in terms of vowel distribution, with half of the target fricatives preceded by [o] and the other half by [a], so the six coefficients from the third vocalic region (adjacent to the fricative) were included together with duration and the 18 consonantal measurements for a total of 25 predictors. Correlation analyses were performed between duration differences (in plain vs palatalized) and cepstral coefficient differences. Next, subsets of measures were used in additional logistic mixed effects models with the fʃh dataset as follows: only measures from the first fricative region, only measures from the second fricative region, and only measures from the third fricative region, for a total of six predictors each time. These analyses were conducted in order to examine the discriminability between plain and palatalized forms progressively throughout the segment. Finally, one-way analyses of variance (ANOVAs) were conducted for each subject separately to see if each of the predictor variables (duration and cepstral coefficients from different regions) can reliably discriminate palatalization in postalveolars.

5. Results

In the binomial logistic mixed effects model for the entire (ALL) dataset, a significant effect on palatalization was found for 10 out of the 18 combinations of cepstral coefficients and temporal regions: C1.C.1 (p < 0.01), C2.C.3 (p < 0.001), C3.C.1 (p < 0.05), C3.C.2 (p < 0.05), C3.C.3 (p < 0.001), C4.C.1 (p < 0.01), C4.C.3 (p < 0.001), C5.C.1 (p < 0.01), C5.C.2 (p < 0.001), C5.C.3 (p < 0.001). The top five predictors (with the highest coefficients) were, in descending order, C4.C.3, C3.C.3, C3.C.2, C2.C.3, and C4.C.1. For the fʃh dataset, a significant effect on palatalization was found for 7 out of the 25 measures: C0.C.1 (p < 0.05), C0.V.3 (p < 0.05), C2.C.3 (p < 0.05), C3.C.3 (p < 0.001), C4.C.3 (p < 0.001), C4.V.3 (p = 0.01), and C5.C.1 (p < 0.05).

Classification of palatalization was 90.96% accurate for the ALL dataset, and 90.3% accurate for the fʃh model. For ALL, the correct classification rates by consonant ranged from 66.3% correct for plain [ʃ] to 100% correct for both plain and palatalized /h/. For fʃh, the lowest correct classification rate was 73.9% for plain [ʃ], and the highest was 100% for both plain and palatalized /h/. Overall, the classification of palatalization in postalveolars was higher than chance (68.7% for the ALL set and 74.2% for the subset), indicating the existence of acoustic differences between the plain and palatalized forms. See Appendix B for the full classification table.

The results for duration and the different cepstral coefficients were reported in Spinu et al. (2012), to which the reader is referred for more details. For duration, while palatalized consonants were always longer than the plain ones, the difference was only significant (p < 0.05) in the case of /h/ and /v/. In the current study, a series of correlation analyses were used to investigate the relationship between duration and the six cepstral coefficients. It was found that differences in duration between plain and palatalized forms correlate significantly with differences in C1 (R² = 12.25%) and C3 (R² = 16.81%). The findings suggest that subjects who produce smaller durational differences between plain and palatalized forms also tend to produce hypoarticulated speech more generally, in which acoustic cues related to the secondary palatalization contrast (and possibly other aspects of the speech signal) are also diminished.

Figure 2 displays three-dimensional plots of the top three predictors (that is, C4 and C3 from the last consonantal region and C4 from the last vocalic region) in the fʃh subset, for each consonant separately. The top three predictors were defined as the variables with the largest coefficients that also met a minimum required level of significance (p ≤ 0.01). The separation of the tokens into relatively distinct areas is apparent, though more so for the labiodentals and glottals/velars than for the postalveolars.

FIG. 2.

(Color online) Three-dimensional scatterplots of the three most informative acoustic measures for classification of palatalization. Each point represents a single fricative token in the dataset. The first part of each axis label stands for the coefficient (e.g., C4), the middle part stands for the segment from which it was extracted (C or V), and the last part stands for the region from which it was extracted (1, 2, or 3). E.g., C3.C.3 = the third cepstral coefficient extracted from the third region of the fricative. CON = consonant. Pal = palatalized.

Figure 3 shows the durations of the different regions identified via the procedure described in Sec. III A 4 for plain and palatalized forms of labial (labiodental), postalveolar, and glottal/velar segments. While the regions of plain consonants appear very similar, more variability is noted for palatalized forms, with the third region being longer for the labiodental consonant (compared to the plain form) and much longer (>120 ms) for the glottal/velar. Thus, the presence of palatalization in labiodentals and glottals/velars causes a shift in the third region of the segment, which does not appear to be the case for postalveolars.

FIG. 3.

Region duration (ms) for plain and palatalized forms at three places of articulation. Each segment was divided into three regions following the procedure described in Sec. III A 4.

Figure 4 displays palatalization classification results for logistic mixed effects models trained either on all parameters (including vocalic information), or just parameters extracted from a single fricative region, for each gender separately. The glottal/velar classification rates are the highest throughout, suggesting that the difference in primary place of articulation (plain = velar, palatalized = palatal) robustly encodes the secondary palatalization contrast throughout the segment. No differences are noted between the two genders. For the labiodentals, the final region yields the highest correct classification rates, which may be due to a more sequential realization of the secondary palatalization gesture at this place of articulation, and there are hardly any differences between the two genders. Finally, postalveolars also show an increase in the last region but only for female productions. Unlike for labiodentals and glottals/velars, a difference can be seen in the postalveolar data between the classifications based only on the last region and those based on all fricative regions plus vocalic information, especially for the male data. This suggests that in the case of postalveolars vocalic transitions contain important information pertaining to the secondary palatalization contrast.

FIG. 4.

Palatalization classification results for models trained either on all 25 parameters (black columns) or just parameters extracted from a single fricative region (grey columns). The column labels show the gender (M or F) and the region (R1, R2, or R3). The label ALL is used for the entire parameter set. CON = consonant.

Last, one-way ANOVAs were conducted for each subject separately to find out whether each of the 25 variables discriminates between plain and palatalized postalveolars. Because fewer items were included in these analyses (i.e., 12 tokens for each plain and palatalized segment, and eight for the speaker who only completed two of the three blocks), the sample size is substantially reduced. Under these circumstances, near-significant values (between 0.05 and 0.1) were also considered. It was found that only one of the 31 subjects did not produce either significant or near-significant differences between plain and palatalized postalveolars in any of the measures examined. Three other subjects produced near-significant differences, but no significant ones. The remaining 27 subjects all produced significant differences in some of the measures. For these subjects, the average number of measures with significant differences is 3.2 (out of 25 measures); one subject showed significant differences in nine measures. There was a great deal of variability in which measures differed significantly between the two forms. The only measure for which neither significant nor near-significant differences were found in any of the subjects was C0 extracted from the third vocalic region.

6. Summary

The results of the analyses reported in Sec. III A 5 indicate that there is evidence of acoustic separation between plain and palatalized postalveolar fricatives in Romanian, thus confirming the hypothesis from Sec. III A 1, contrary to the findings of Spinu et al. (2012). The differences between the two studies are likely due to the refinement in the analysis methods employed: division into regions in order to capture the dynamic nature of the contrast, addition of vocalic measurements as predictors of palatalization, and the use of logistic mixed-effects models for classification. A secondary finding was that there are certain differences in classification rates when the productions of males and females are analyzed separately, but only for the postalveolar segments. These findings indicate that males may be more prone to hypoarticulating the secondary palatalization contrast at this place of articulation.

The fact that acoustic differences were found between [ʃ] and [ʃʲ] casts some doubt on the relatively low sensitivity results obtained in previous perceptual experiments. Not only is this contrast produced by native speakers of Romanian, it is also acquired, which suggests that it must be perceptually salient to some extent. There is a possibility that the set-up of previous experiments did not encourage listeners to focus on it and thus failed to reveal an existing perceptual pattern. In the perception experiment reported in Spinu et al. (2012), the target words were presented in a carrier sentence that was semantically and grammatically neutral with respect to their status as singular (no palatalization on the word-final consonant) or plural (with palatalization on the word-final consonant). It is possible that some of the perceptual patterns were obscured or diminished by the fact that the target items were presented in structures in which either a singular or plural interpretation would have yielded a grammatical structure. Hawkins (2010) argues that, “while systematic phonetic variation in the spoken signal can strongly influence perception, it cannot do so unless the listener is able to relate the variation in sound patterns to different functions/meanings.” This is because linguistic information in the signal maps to more than just “low-level” abstract units (e.g., phonemes, features, or words), many of which are non-phonological (e.g., bound morphemes, function words, content words, auxiliary verbs, pronouns, etc.). No formal unit of linguistic structure is prior to any other during speech processing: “meaning (including function) is prior, and all potential units that allow meaning to be quickly understood are valuable” (Hawkins, 2010).

Given that normally there is rich agreement in Romanian (subject-predicate, nouns-determiners/modifiers, objects-pronominal clitics), the question arises whether the presence of morphological information in normal speech might lead listeners to pay more attention to the information conveyed by palatalization. The experiment described in this section investigates the perception of the plain-palatalized contrast in the presence of morphological cues either consistent or conflicting with the information (i.e., grammatical number) encoded by the presence or absence of palatalization. It may be the case that, while subjects are able to hear the contrast between plain and palatalized postalveolars, they do not particularly focus on it unless a certain degree of functional load is involved. If this experimental set-up makes the presence or absence of palatalization more detectable, particularly where mismatches are involved, subjects might demonstrate higher sensitivity to the secondary palatalization contrast in postalveolar fricatives compared to previous studies.

1. Hypothesis

The hypothesis formulated here is partially based on the phenomenon of “mismatch effect” or “mismatch cost” observed in neurolinguistic studies. Incongruent phonological or morphological cues were found to elicit larger brain responses than correct conditions (Leminen et al., 2016; Arbour, 2012; Archibald and Joanisse, 2011; Scharinger et al., 2010). For example, phonologically inconsistent words produced greater activation in medial frontal gyrus/anterior cingulate cortex (Binder et al., 2005; Tan et al., 2001). Behaviorally, incongruent stimuli were found to affect subjects' responses in lexical decision tasks (Janke and Kolokonte, 2015), result in processing disruptions such as longer reading times (Haskell and McDonald, 2003; Pearlmutter et al., 1999; Marslen-Wilson and Warren, 1994), and slower grammatical category decisions (Arciuli and Monaghan, 2009). Most importantly, behavioral responses were more accurate and faster for phonological mismatches with different sounds than for phonological matches in Lee et al. (2012). Since the results from Sec. III A 5 indicate that the contrast is acoustically present and produced by the majority of the speakers, the expectation is that conflicting morphophonological cues will result in enhanced perceptual sensitivity to the secondary palatalization contrast in postalveolar fricatives. Thus, a sensitivity value higher than previously obtained by Spinu et al. (2012), i.e., d′ = 0.79, is expected.

2. Stimuli

Each of the target words appeared in four types of constructions. The plain and palatalized consonants were placed in carriers that either matched or did not match them with regard to the information about grammatical number conveyed by the presence or absence of secondary palatalization, as follows:

plain matched: the target word is in the singular; the additional number cue is singular (e.g., [un kokoʃ] “one rooster”).

plain mismatched: the target word is in the singular; the additional number cue is plural, thus conflicting with the information provided by the absence of palatalization (e.g., [patru kokoʃ] *“four rooster”).

palatalized matched: the target word is in the plural; the additional number cue is plural (e.g., [patru kokoʃʲ] “four roosters”).

palatalized mismatched: the target word is in the plural; the additional number cue is singular, in conflict with the information provided by the presence of palatalization in the target word (e.g., [un kokoʃʲ] *“one roosters”).

The construction of the stimuli for this experiment involved two steps. The first step consisted of recording the target words in grammatical sentences. The same words as in the production experiment described in Sec. III A 2 were used (Table III), for a total of 40 target items (5 segments × 2 palatalization conditions × 4 words). Twice as many fillers were included, for a total of 120 sentences. The set of stimuli was repeated three times, with short breaks in between the repetitions. The order of the sentences was randomized for each presentation.

Fifteen different speakers (3 males, 12 females, mean age = 20.6) produced the sentences for this experiment. These speakers were selected from a larger pool of potential speakers, n = 31. They were not the same as those who participated in the production experiment reported in Sec. III A, but they were drawn from the same demographic pool via the same methods. They were all native speakers of the standard dialect of Romanian. None of the speakers had any known speech or hearing disorders.

Each speaker went through a short practice session prior to recording the experimental items. The practice stimuli contained 20 sentences that were very similar to the ones used in the experiment. Each practice sentence was preceded by a prompt recorded by a male native speaker to demonstrate the appropriate speaking rate to the speakers. The speakers were asked to replicate the speaking rate they had heard to the extent possible. Audio prompts were not used in the recording of the actual test items. Both the practice session and the experimental session were administered in a quiet room in Bucharest, using a Sony laptop and a head-mounted microphone for recording. The software for presenting and recording the sentences was InvTool (Yarrington et al., 2008). No automatic manipulation of the sound files took place.

The second step consisted of splicing out the targets from the sentences in which they were recorded and using them to create the sentences to be employed in the perception study. The alternative would have been to record all items directly. Since this experiment used both grammatical and ungrammatical stimuli, however, ungrammatical sentences might cause the speakers to (consciously or unconsciously) hyperarticulate or exhibit other effects such as unusual intonation due to unfamiliarity with the stimuli. Because of this possibility, all sentences recorded were grammatical, and the ungrammatical items were constructed through cross-splicing of the target items from the original (morphologically appropriate) context into a morphologically mismatched context. Furthermore, to avoid the presence of splicing as a crucial difference between grammatical and ungrammatical sentences, the grammatical sentences were also constructed by splicing out the target items from the sentences in which they had been originally recorded and inserting them into appropriate contexts. Each target sentence contained exactly one morphological cue with regard to the number status (singular or plural) of the target words.

Only one repetition of each sentence per speaker was used in the perception experiment. Consequently, the total number of target sentences was 1200 (5 segments × 4 words × 2 palatalization conditions × 2 contexts × 15 speakers). No fillers were used. Two lists of 600 stimuli were prepared, each containing half of the recordings. For each of the speakers, half of the targets were added to one list and the other half to the other list, such that the matched and mismatched conditions for one word did not come from the same speaker within a list.
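The stimulus counts and the two-list split can be illustrated with a short sketch. The alternation rule used here to distribute contexts across lists is an assumption chosen only to satisfy the stated constraint; the source does not specify how the halves were assigned. The word labels `w1` to `w4` and speaker labels are placeholders.

```python
from itertools import product

segments = ["f", "v", "z", "ʃ", "h"]            # five target fricatives
words = ["w1", "w2", "w3", "w4"]                # placeholder word labels
palatalization = ["plain", "palatalized"]
contexts = ["matched", "mismatched"]
speakers = [f"S{i:02d}" for i in range(1, 16)]  # placeholder speaker labels

items = list(product(segments, words, palatalization))  # 40 target items

list_1, list_2 = [], []
for s_idx, speaker in enumerate(speakers):
    for i_idx, item in enumerate(items):
        # Hypothetical rule: alternate which list receives the matched version,
        # so one list never holds both contexts of an item from one speaker.
        if (s_idx + i_idx) % 2 == 0:
            first, second = contexts
        else:
            second, first = contexts
        list_1.append((speaker, *item, first))
        list_2.append((speaker, *item, second))

print(len(items), len(list_1) + len(list_2))  # 40 1200
print(len(list_1), len(list_2))               # 600 600
```

Under this rule each list also contains equal numbers of matched and mismatched sentences, which may or may not mirror the balance of the actual lists.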

3. Subjects and procedure

Thirty-one subjects (11 males, 20 females, mean age = 24.25) participated in this experiment, and were paid for their participation. None of the subjects reported having any current or past speech or hearing disorders. These subjects had not participated in the production experiment described in Sec. III A.

In order to limit the duration of the experiment to approximately one hour, each subject heard only one of the two lists of stimuli. The subjects were sequentially assigned to one of the lists, e.g., Subject 1 to List 1, Subject 2 to List 2, Subject 3 to List 1, etc. The order of presentation of the stimuli was randomized for each subject. A short break was taken after the first 300 sentences.

The experiment was designed and administered using the E-Prime Software, and the subjects listened to the sentences over headphones. The task consisted of pressing a key on a keyboard to indicate whether a given sentence was perceived as being acceptable or unacceptable (i.e., likely or unlikely to have been uttered by a speaker of Romanian). To eliminate the possibility of interference with motor dominance, the response keys were counterbalanced.

A practice session was administered before the experimental block so as to familiarize the subjects with the task. The practice session contained 20 sentences, half matched and half mismatched.

4. Data analysis

Sensitivity was quantified with the d′ (d prime) statistic from Signal Detection Theory (cf. Wickens, 2002). Response bias was taken into account by using both the number of hits (how many times a signal was correctly identified; in our case, how many of the mismatched targets were perceived as mismatched) and the number of false alarms (how many times a signal was incorrectly identified; in our case, how many of the matched targets were identified as being mismatched). The d′ scores were computed for each subject for all five consonants, and a repeated-measures within-subjects ANOVA was conducted with sensitivity (d′ scores) as the dependent variable and consonant and palatalization as independent factors.
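As a minimal sketch of the sensitivity measure, d′ is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The function name, the response counts, and the log-linear correction (adding 0.5 to each count so that rates of exactly 0 or 1 remain finite) are assumptions for illustration; the source does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one listener and one consonant:
# "hits" are mismatched sentences judged unacceptable,
# "false alarms" are matched sentences judged unacceptable.
at_chance = d_prime(hits=30, misses=30, false_alarms=30, correct_rejections=30)
sensitive = d_prime(hits=50, misses=10, false_alarms=12, correct_rejections=48)

print(round(at_chance, 2))  # 0.0
print(sensitive > at_chance)  # True
```

Because the false-alarm rate is subtracted, a listener who simply rejects most sentences gains no sensitivity credit, which is how the measure separates sensitivity from response bias.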

5. Results

Figure 5 shows the sensitivity (d′) scores for each consonant, broken down by palatalization status. The repeated-measures ANOVA showed that consonant identity had a significant effect on sensitivity, F(4, 27) = 82.59, p < 0.001. Neither a main effect of palatalization nor an interaction of consonant and palatalization was found. Pairwise comparisons with the Bonferroni adjustment for multiple comparisons show that listeners' sensitivity to /f/ differed significantly from that to /z/ and /ʃ/, sensitivity to /v/ differed significantly from that to /ʃ/ and /h/, sensitivity to /z/ differed significantly from that to /ʃ/ and /h/, sensitivity to /ʃ/ differed significantly from that to all other consonants, and sensitivity to /h/ differed significantly from that to all consonants but /f/.

FIG. 5.

Mean sensitivity scores (d prime) to plain and palatalized consonants.

Generally speaking, the larger the d′ value, the higher the sensitivity to a certain contrast. A d′ value of zero means that trials with the target cannot be reliably distinguished from trials without the target, whereas a d′ of 4.65 indicates a nearly perfect ability to distinguish between trials that include the target and trials that do not include the target. The latter appears to be the case of the glottal/velar fricative, which had the highest sensitivity scores for both the plain and the palatalized forms (d′ = 4.51, averaged over the two). The postalveolar /ʃ/ was the consonant to which the subjects were least sensitive, with a d′ score of 0.93, averaged over the two categories (plain and palatalized). This score corresponds to moderate sensitivity.
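To give these values some scale, the sketch below assumes that the 4.65 ceiling follows the common convention of a 0.99 hit rate paired with a 0.01 false-alarm rate, and converts the d′ values quoted in the text into the proportion correct expected of an unbiased listener, Φ(d′/2). Both the convention and the unbiased-listener conversion are assumptions used for illustration, not analyses reported in the study.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Conventional effective ceiling: hit rate 0.99, false-alarm rate 0.01.
ceiling = std_normal.inv_cdf(0.99) - std_normal.inv_cdf(0.01)
print(round(ceiling, 2))  # 4.65


def unbiased_proportion_correct(d_prime):
    """Proportion correct for a listener with no response bias: Phi(d'/2)."""
    return std_normal.cdf(d_prime / 2)


# d' values quoted in the text.
reported = {
    "glottal/velar (this study)": 4.51,
    "postalveolar (this study)": 0.93,
    "postalveolar (Spinu et al., 2012)": 0.79,
}
for label, value in reported.items():
    print(label, round(unbiased_proportion_correct(value), 3))
```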

6. Summary

The results of the perception experiment were consistent with those of the production experiment. The analysis of sensitivity scores revealed the same general patterns: glottals/velars tended to be the most favorable hosts for the palatalization contrast, with the highest sensitivity values. The postalveolar place of articulation was, by contrast, the least favorable. The d′ value for this place was higher than previously obtained (Spinu et al., 2012), showing that the inclusion of matched and mismatched cues to the presence of palatalization causes listeners to pay more attention to it. Thus, the hypothesis was supported by the findings.

The current study shows that for postalveolar fricatives in Romanian: (a) the plain vs palatalized form can be distinguished reliably based on cepstral measurements (though not to the same extent as with other places of articulation), (b) the secondary palatalization contrast is acoustically realized at this place by 27 out of 31 speakers, (c) women appear to produce the contrast more robustly compared to men, and (d) the contrast is perceptually salient, with listeners displaying moderate sensitivity to it, compared to high sensitivity at the other places examined. The specific ways in which men produce lower acoustic differentiation between plain and palatalized forms remain unclear, but they might involve the reduction or absence of the secondary palatal gesture. Future articulatory studies could clarify whether this is indeed the case.

The main reference point with which these findings can be compared is the 1961 study (Şuteu, 1961) regarding the pronunciation of Romanian words. According to Şuteu, 292 of 309 speakers (all from Bucharest, Romania) reported making a distinction between the singular and the plural form of a word ending in a postalveolar fricative. While helping to establish a diachronic perspective, this study was not controlled experimentally, and it involved self-described pronunciation without acoustic analysis. The possibility arises that there is no exact overlap between the speakers' intuitions and their pronunciation. This being the case, the comparison between today's state of affairs and that in 1961—and any discussion of potential neutralization—is tentative at best, as no benchmark of the extent of palatalization and the robustness of the contrast in Romanian spoken in the past can be established. The only thing we know for certain is that today males produce somewhat less reliable cues to postalveolar palatalization than females, and that in itself may be interpreted as a sign of either incipient or steady-state (partial) neutralization. Nevertheless, most speakers produce this contrast. Since the mean age of the subjects used in the production experiment reported here was 21.7 yr, there could not have been any overlap in the generations that we are comparing.

According to licensing by cue (Kochetov, 1999, 2002; Steriade, 1999), the distribution of a phonological contrast is sensitive to the amount of acoustic information available in a given environment, such that (a) if environment A provides more acoustic information to a contrast between two segments /x/ and /y/, the identification of the contrast by listeners is likely to be high and, as a result, the contrast will be preserved, whereas (b) if environment B provides less acoustic information to the contrast, the identification rate of /x/ vs /y/ will tend to be lower, and the contrast is more likely to be neutralized. Adding to licensing by cue, the phonetic knowledge hypothesis (Hayes and Steriade, 2004) posits that sound patterns in languages reflect the activity of grammatical constraints induced from speakers' implicit knowledge of certain facts of phonetic difficulty. Perceptually fragile contrasts tend to undergo one of two changes: enhancement or neutralization. This is known as phonetic knowledge-based grounding. It should be mentioned, however, that this view is not uncontroversial. While certain approaches in phonology emphasize the role of physical constraints as the bases of phonological organization and contrast (Browman and Goldstein, 1995) and incorporate facts of acoustic contrast and ease of articulation as constraints in the phonological grammar (Steriade, 1999), other views emphasize the separation between quantitative (acoustic and perceptual) aspects of speech and the phonological representations which encode relational information between the paradigmatic contrasts of a given natural language (Beckman, 1999; Pierrehumbert, 1990).

Historical evidence suggests that perceptually weak contrasts can take the neutralization path. Coda position, for example, has been claimed to be a weaker position for the realization of acoustic cues for consonants in general (Fujimura et al., 1978; Ohala, 1990; Wright, 2004). Steriade (1999) lists seven phonetic properties known as cues to the voicing contrast in obstruent stops: closure voicing, closure duration, duration of the preceding sonorant, F1 value in the preceding vowel, burst duration/amplitude, voice onset time, and F0 and F1 values at the onset of voicing in a following sonorant. In coda position, when preceded by a sonorant, three of these are potentially missing: burst duration/amplitude, voice onset time, and F0 and F1 values at the onset of voicing in a following sonorant. When preceded by an obstruent, only the first two of these cues remain available. It is thus not surprising that coda laryngeal contrasts are typologically more marked than onset laryngeal contrasts. The voiced-voiceless distinction has been neutralized in Russian obstruents (and many other languages) in this position. In the specific case of secondary palatalization, the plain-palatalized contrast with labials in coda position tends to be neutralized cross-linguistically (Kochetov, 2002).

While it is somewhat difficult to quantify what counts as “more” or “less” acoustic information, the type of information that results in moderate perceptual sensitivity to a contrast, as was found with secondary palatalization in postalveolars in Romanian, would likely fall towards the low end of the scale. This would predict neutralization, so it is quite plausible that postalveolar palatalization is subject to a change in progress in Romanian. Indeed, the fact that males appear to realize the contrast to a lesser extent than females suggests that neutralization may already be under way.

As far as enhancement is concerned, cases of the plain-palatalized contrast becoming enhanced are well attested. In Russian, plain consonants have been described as velarized (Halle, 1959; Trubetzkoy, 1969) or, more recently, as either uvularized or velarized (Litvin, 2014), which is interpreted by Padgett (2001), following Trubetzkoy (1969), as a means of enhancing the secondary palatalization contrast. In Romanian, the fricative [s] contrasts with [ʃj] and, to a lesser extent, [z] contrasts with [ʒj]. Spinu and Lilley (2016) also show that the plain-palatalized contrast in fricatives at the glottal/velar place of articulation is implemented as a velar for plain forms and a palatal for palatalized ones. It is not clear why /ʃ/ did not follow a similar route. Possible enhancement strategies would include (a) strengthening to an affricate, as attested for Catalan sibilants, which become affricates word-initially and after a consonant, e.g., [ʃ → ʧ, ʒ → ʤ] (Lavoie, 2001); (b) replacement of the secondary articulation with a full-fledged vowel, as in Italian, in which there is no secondary palatalization and the plural affix (identical to the Romanian one underlyingly) is realized as a vowel, e.g., “wolves” is [lupi], compared to Romanian [lupj] (Lampitelli, 2014); or (c) fortition to a full-fledged stop, as in Burushaski [x → q, ɣ → g, h → k] (Lavoie, 2001).

To summarize this subsection, while there is no evidence of enhancement, there is a possibility that the plain-palatalized contrast in postalveolar fricatives is undergoing incipient neutralization in Romanian, though the available data suggest this may either be a very slow process or very recently initiated.

While the evolution of this contrast over time cannot be established with certainty, the experimental findings reported here indicate that the contrast is currently present in the production of Romanian speakers. This situation warrants closer investigation of the factors that may override its relatively weak acoustic distinctiveness (compared to other places) and conspire to maintain it. As explained in Sec. II C, the contrast carries a high functional load, being associated with the presence of a morphological marker in the vast majority of cases. Enhancement might thus be predicted to be favored but, as discussed in Sec. IV C, no indication of enhancement of the secondary palatalization contrast at this place can be found.

The issue of morphological conditioning brings forth another possibility for the maintenance of a weak contrast, specifically that of grammatical restructuring (Kochetov, 2002). A grammar constructed by a learner can be restructured under pressure from higher-level phonological categories and morphological alternations (Kochetov, 2002). Indeed, Kochetov suggests that deviations from the general patterns observed across languages may be due to properties of the lexicon and grammar of the languages in question. For example, specific non-phonetic characteristics of languages may in some cases override the general markedness patterns established on the basis of phonotactic, articulatory, and perceptual properties. Thus, a particular contrast might be maintained in a less favorable environment if the pressure from additional factors is sufficiently strong, especially with regard to productivity and morphophonological transparency (Kochetov, 2002). Other than productivity, the strength of this pressure depends on the relative salience of these morphological categories (Pierrehumbert, 2001). Highly productive, morphologically transparent alternations are predicted to have stronger effects. The plain-palatalized contrast in less favorable phonotactic environments can be maintained if a lexical item is involved in transparent alternations, or if it signals certain morphological categories. The role of these factors in the maintenance of phonological contrasts should not be overestimated, however, as in most cases the factors that induce neutralization seem to override paradigm uniformity, e.g., syllable coda depalatalization in standard Bulgarian, where palatalized labials and coronals have been depalatalized even in the presence of substantial paradigmatic evidence from alternations (Carlton, 1990). In Polish, the palatalization contrast in labial consonants was similarly neutralized in the coda environment (Kochetov, 2002).

The persistence of Russian palatalized consonants in medial clusters (the most unfavorable environment) may be an example of grammatical restructuring. Also, in the Nova Nadezhda dialect of Bulgarian, all palatalized stops are allowed in word-medial clusters, but these result from the addition of highly productive inflectional or derivational affixes (Kochetov, 2002). Isthmus Mixe also has plain and palatalized postalveolars, which are morphologically conditioned, just as in Romanian (Dieterman, 2008). In the case of Romanian, the high functional load carried by the plain-palatalized distinction might override more general markedness effects. While this might explain why the plain-palatalized contrast with postalveolars has been retained in Romanian, it does not provide an answer as to why it has not been enhanced.

The lack of enhancement is relevant from the point of view of acquisition. If adults cannot reliably perceive the plain-palatalized contrast in postalveolars, presumably children cannot either, since their perceptual system has been shown to be similar to that of adults after the age of 1. This is due to a retuning of attention and an ensuing gradual decline in the ability to distinguish non-native phonological contrasts once language-specific knowledge is acquired (Aslin et al., 1998; Polka and Werker, 1994; Werker and Lalonde, 1988; Werker and Tees, 1984). Together with the morphological factors, orthography may also play an important part in the maintenance of this contrast, since any secondarily palatalized consonant is spelled as a “Ci” sequence. If the role played by orthography is crucial, the distinction would presumably be absent before the correct spelling is learned. A future longitudinal study could establish whether the contrast in postalveolars is acquired before the age of literacy (based on morphological patterns alone) or after learning that the plural (palatalized form) is rendered with an additional “i” in orthography. Anecdotal evidence suggests that when Romanian children start learning how to write, failing to add the plural “i” at the end of words ending in /ʃ/ is a relatively common mistake. This would be a case of late acquisition due to external pressure. If acquisition is not delayed, it may be the case that the visual differences in the articulation of the plain and palatalized forms (i.e., lip rounding in the plain form, and lip spreading in the palatalized one) play an important part in the intergenerational transmission of this contrast.

As previously discussed, some of the limitations of this study were imposed by the make-up of the original corpus and the original research goals associated with it. The most important of these has to do with the unbalanced distribution of the vowels preceding the fricatives, which resulted from the decision to include real, frequently encountered, disyllabic, stress-final words of Romanian. This problem was addressed by investigating a balanced subset separately, consisting of about a third of the original data. It was found that the overall patterns for production and perception alike were similar in both the full corpus and the subset.

A second methodological issue concerns the cepstral coefficient set employed in the acoustic analyses. As pointed out by an anonymous reviewer, the spectral properties of fricatives are strongly influenced by the vocalic context in which they are produced, especially in the case of labiodental and glottal fricatives. The primary information that cepstral coefficients will model in coda [f], [v], and [h] is the shape of the tract formed to produce the preceding vowel, which will persevere into the coda fricative; very little information in six-feature vectors will correspond to the labial or glottal constrictions that primarily characterize the non-palatalized variants of these fricative pairs. The subset fʃh partially avoided this issue because only labiodental [f] is subject to these effects (the reader is reminded that Romanian /h/ is realized mostly as a velar or palatal fricative in the context investigated here). Even though the reduced set employed here was successful with various classification tasks applied to corpora of Romanian (Spinu and Lilley, 2016), it was not an ideal choice for the current study due to its focus on postalveolars, in which all six coefficients employed tended to converge. Inasmuch as the originally formulated goal of uncovering acoustic differences between plain and palatalized postalveolars was attained, the usefulness of the restricted set employed here should not be underestimated. However, the use of a standard 14-coefficient parametric model, which may be able to tease apart consistent differences in tract shaping characterizing plain vs palatalized fricatives, is recommended for future work.

Finally, the lack of acoustic measurements to accompany the results of the 1961 study makes it impossible to compare the status of secondary palatalization in Romanian postalveolars at these two distinct points in time; therefore, all discussion of potential neutralization can only be tentative. A real comparison may become possible with the collection and annotation of corpora of Romanian spoken at different points in time. In recent years, efforts have been made to join together corpora of spoken Romanian (often referred to as a less-resourced language) produced in various communicative frameworks, as a first attempt to foster resources for corpus-based linguistic studies (Vasilescu et al., 2014). If corpora of Romanian broadcasts from the 1960s become available in the future, it will become feasible to address the issue of neutralization of the plain-palatalized contrast, among others. Furthermore, future articulatory studies (for example, using electromagnetic midsagittal articulometry) are needed to help settle some of the remaining questions involving the realization and neutralization of this contrast. Unrelated to postalveolars, another question raised by the current work is whether the palatalized form of /h/, which has been referred to as [çj] throughout this paper (for consistency with the paradigm), is actually accompanied by a palatalization gesture, thus making it a true palatalized palatal, speculated to be impossible (Hall, 1997), but marginally attested in Mordvin (Campbell, 1974). An articulatory study would be crucial in determining whether [çj] exists in Romanian.

This study documents a rare cross-linguistic contrast, that between plain and palatalized postalveolar fricatives in Romanian. While experimental evidence shows that this contrast conforms to typological predictions of being acoustically and perceptually weaker compared to other places of articulation, most speakers' productions of the plain and palatalized forms are acoustically distinct. This situation may have persisted for the past 57 years. Some evidence of neutralization was found with four of the speakers, and the productions of males yielded lower classification rates compared to those of females. The fact that to date this contrast has been neither neutralized (possibly due to morphological restructuring) nor enhanced (when other places within the same paradigm are enhanced) is a good example of the lack of 1-to-1 correspondence between the phonetic factors triggering neutralization and actual neutralization patterns attested in individual languages. In a sense, this contrast appears too important to lose (presumably because of its high functional load) but also stable enough so as not to become enhanced, which may cause it to remain in a state of inertia, almost as if it were below a threshold of phonetic/phonological “maneuverability.” Following Kochetov (2002), the best conclusion is that an explanation for neutralization, and phonotactic patterns in general, should not be sought only in phonology or only in phonetics, but in the interaction of phonetic factors with the phonological grammar, and possibly beyond.

This research was supported in part by doctoral dissertation research improvement Grant No. 0720231 from the National Science Foundation. Special thanks are owed to Jason Lilley and Florin Spinu for helping with some of the analyses. The author is also indebted to two anonymous reviewers for their generous contribution.

Cepstral coefficients are the sums of the products of the filtered log magnitude speech spectrum and each of the cosine waves illustrated in Fig. 6. Thus, C0 is the sum of all the spectral values multiplied by 1.0, i.e., the total spectral energy. Each higher-order coefficient will be large and positive when the speech spectrum closely resembles the corresponding cosine term (its feature vector). The coefficient will be large and negative when the speech spectrum is the complement of the feature vector, i.e., large where the feature vector is most negative, and small where the feature vector is most positive. A good way to conceptualize these coefficients is that they contrast energy in one or more parts of the spectrum with energy in other parts of the spectrum. So, for example, C1 contrasts low-frequency energy vs high-frequency energy; if the spectrum is low-frequency heavy (e.g., a vowel), C1 will be positive, while a spectrum with predominant high-frequency energy (e.g., a voiceless sibilant fricative) will have a negative C1, and the magnitude of the difference between low- and high-frequency energy determines the magnitude of C1. The C2 coefficient is positive when mid-frequency energy is weak compared to low- and high-frequency energy, and negative when energy in the mid-frequencies is strong relative to low- and high-frequency regions. The magnitude of the difference in energy between mid vs low + high frequencies determines the magnitude of the coefficient. It is also worth remembering that because the spectrum is warped to Bark units, terms like low, mid, and high are not linear frequency descriptors. For example, C2 is strongly positive for a vowel like [a] that has F1 and F2 close together in the region of 1 kHz, and C2 is typically weak or even negative for a vowel like [i] that has a very low F1 and high F2 and a valley between them.
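As a rough illustration of the sum-of-products description above, the following Python sketch computes C0 to C5 for three toy spectra. It assumes, since the text does not specify it, that the log spectrum has already been reduced to equal-width Bark bands and that the cosine terms take the usual DCT-II form without any scaling constant. The function names, the 16-band resolution, and the toy spectra are hypothetical and are not measured data.

```python
import math


def cosine_term(k: int, n_bands: int) -> list:
    """k-th cosine term sampled at the centres of n_bands equal-width bands."""
    return [math.cos(math.pi * k * (i + 0.5) / n_bands) for i in range(n_bands)]


def cepstral_coefficients(log_spectrum: list, n_coeffs: int = 6) -> list:
    """Sum of products of a band-wise log spectrum with each cosine term."""
    n_bands = len(log_spectrum)
    return [
        sum(s * c for s, c in zip(log_spectrum, cosine_term(k, n_bands)))
        for k in range(n_coeffs)
    ]


# Toy log-magnitude spectra over 16 bands (arbitrary units, illustration only).
low_heavy = [16 - i for i in range(16)]        # energy falls with frequency
high_heavy = [i + 1 for i in range(16)]        # energy rises with frequency
mid_weak = [abs(i - 7.5) for i in range(16)]   # valley in the middle bands

c_low = cepstral_coefficients(low_heavy)
c_high = cepstral_coefficients(high_heavy)
c_mid = cepstral_coefficients(mid_weak)

print("C0, C1 (low-heavy): ", round(c_low[0], 3), round(c_low[1], 3))
print("C0, C1 (high-heavy):", round(c_high[0], 3), round(c_high[1], 3))
print("C2 (mid-weak):      ", round(c_mid[2], 3))
```

With these toy inputs, C0 is simply the sum of the band values, C1 comes out positive for the low-heavy spectrum and negative for the high-heavy one, and C2 is positive for the spectrum with a mid-band valley, in line with the qualitative description above.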

FIG. 6.

Bark-scaled cosine terms used in the computation of cepstral coefficients 0–5.

Correct classification rates (%) for the ALL and fʃh datasets broken down by consonant and palatalization status.

| Consonant | Palatalization | ALL dataset | fʃh dataset |
|---|---|---|---|
| f | plain | 98.6 | 98.9 |
| fj | palatalized | 95.9 | 94.5 |
| v | plain | 97.5 | |
| vj | palatalized | 95.3 | |
| z | plain | 95.1 | |
| zj | palatalized | 89.6 | |
| ʃ | plain | 66.3 | 73.9 |
| ʃj | palatalized | 71.0 | 74.4 |
| x | plain | 100 | 100 |
| çj | palatalized | 100 | 100 |
| OVERALL | both plain and palatalized | 90.96 | 90.3 |
1. Arbour, J. (2012). “The dynamic role of subphonemic cues in speech perception: Investigating coarticulatory processing across sound classes,” Master's thesis, McMaster University, Hamilton, ON, Canada.
2. Archibald, L. M., and Joanisse, M. F. (2011). “Electrophysiological responses to coarticulatory and word level miscues,” J. Exp. Psychol.: Human Percept. Perform. 37(4), 1275–1291.
3. Arciuli, J., and Monaghan, P. (2009). “Probabilistic cues to grammatical category in English orthography and their influence during reading,” Sci. Studies Read. 13(1), 73–93.
4. Aslin, R. N., Jusczyk, P. W., and Pisoni, D. B. (1998). “Speech and auditory processing during infancy: Constraints on and precursors to language,” in The Handbook of Child Psychology: Cognition, Perception, and Language, edited by D. Kuhn and R. Siegler (Wiley, New York), pp. 147–254.
5. Bateman, N. (2007). “A crosslinguistic investigation of palatalization,” Ph.D. dissertation, University of California, San Diego.
6. Bates, D., Maechler, M., Bolker, B., and Walker, S. (2014). “lme4: Linear mixed-effects models using Eigen and S4 [Computer software manual],” http://CRAN.R-project.org/package=lme4 (Last viewed December 15, 2017).
7. Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Stat. 41(1), 164–171.
8. Beckman, M. E. (1999). “Implications for phonological theory,” in Coarticulation: Theory, Data and Techniques, edited by W. J. Hardcastle and N. Hewlett (Cambridge University Press, Cambridge, UK), pp. 199–228.
9. Bhat, D. N. S. (1978). “A general study of palatalization,” in Universals of Human Language, Vol. 2: Phonology, edited by J. H. Greenberg (Stanford University Press, Stanford, CA), pp. 47–92.
10. Binder, J. R., Medler, D. A., Desai, R., Conant, L. L., and Liebenthal, E. (2005). “Some neurophysiological constraints on models of word naming,” Neuroimage 27(3), 677–693.
11. Browman, C. P., and Goldstein, L. (1995). “Dynamics and articulatory phonology,” in Mind as Motion: Explorations in the Dynamics of Cognition, edited by R. Port and T. van Gelder (MIT Press, Cambridge, MA), pp. 175–193.
12. Bunnell, H. T., Polikoff, J. B., and McNicholas, J. E. (2004). “Spectral moment vs. Bark cepstral analysis of children's word-initial voiceless stops,” in Proceedings of the 8th International Conference on Spoken Language Processing, October 4–8, Jeju, Korea, pp. 1313–1316.
13. Campbell, L. (1974). “Phonological features: Problems and proposals,” Language 50, 52–65.
14. Carlton, T. R. (1990). Introduction to the Phonological History of the Slavic Languages (Slavica, Columbus, OH), p. 461.
15. Chitoran, I. (2002). The Phonology of Romanian: A Constraint-based Approach (Mouton de Gruyter, Berlin), p. 292.
16. Deller, J. R., Jr., Proakis, J. G., and Hansen, J. H. L. (1993). Discrete-Time Processing of Speech Signals (Macmillan, Englewood Cliffs, NJ).
17. Dieterman, J. (2008). Secondary Palatalization in Isthmus Mixe: A Phonetic and Phonological Account (Summer Institute of Linguistics, Dallas, TX).
18. Ferragne, E., and Pellegrino, F. (2010). “Vowel systems and accent similarity in the British Isles: Exploiting multidimensional acoustic distances in phonetics,” J. Phon. 38(4), 526–539.
19. Fujimura, O., Macchi, M., and Streeter, L. A. (1978). “Perception of stop consonants with conflicting transitional cues: A cross-linguistic study,” Lang. Speech 21, 337–346.
20. Gregoire, S. (2006). “Gender and language change: The case of early modern women,” http://homes.chass.utoronto.ca/~cpercy/courses/6362-gregoire.htm (Last viewed December 17, 2017).
21. Hall, T. A. (1997). The Phonology of Coronals (John Benjamins, Amsterdam, the Netherlands), p. 176.
22. Halle, M. (1959). The Sound Pattern of Russian (Mouton, The Hague, the Netherlands), p. 206.
23. Haskell, T. R., and MacDonald, M. C. (2003). “Conflicting cues and competition in subject-verb agreement,” J. Memory Lang. 48, 760–778.
24. Hawkins, S. (2010). “Phonetic variation as communicative system: Perception of the particular and the abstract,” Lab. Phonol. 10, 479–510.
25. Hayes, B., and Steriade, D. (2004). “Introduction: The phonetic bases of phonological markedness,” in Phonetically-Based Phonology, edited by B. Hayes, R. Kirchner, and D. Steriade (Cambridge University Press, Cambridge, UK), pp. 1–33.
26. Janke, V., and Kolokonte, M. (2015). “False cognates: The effect of mismatch in morphological complexity on a backward lexical translation task,” Second Lang. Res. 31(2), 137–156.
27. Jesus, L. M., and Jackson, P. J. (2008). “Frication and voicing classification,” in Computational Processing of the Portuguese Language: 8th International Conference on Computational Processing of Portuguese (PROPOR), September 8–10, Aveiro, Portugal, pp. 11–20.
28. Jongman, A., Wayland, R., and Wong, S. (2000). “Acoustic characteristics of English fricatives,” J. Acoust. Soc. Am. 108(3), 1252–1263.
29. Kavitskaya, D. (2006). “Perceptual salience and palatalization in Russian,” in Laboratory Phonology, edited by L. Goldstein, D. H. Whalen, and C. T. Best (Mouton de Gruyter, Berlin), Vol. 8, pp. 589–610.
30. Kochetov, A. (1998). “Articulatory gestures and perceptual enhancement: Palatalized labials in Polish dialects,” in Proceedings of the First High Desert Student Conference in Linguistics, April 3–4, Albuquerque, NM, pp. 43–68.
31. Kochetov, A. (1999). “Phonotactic constraints on the distribution of palatalized consonants,” Toronto Working Papers Linguist. 17, 171–212.
32. Kochetov, A. (2002). Production, Perception, and Emergent Phonotactic Patterns: A Case of Contrastive Palatalization (Routledge, New York), p. 256.
33. Kochetov, A. (2017). “Acoustics of Russian voiceless sibilant fricatives,” J. Int. Phon. Assoc. 47(3), 321–348.
34. Kochetov, A., and Alderete, J. (2011). “Patterns and scales of expressive palatalization: Experimental evidence from Japanese,” Can. J. Linguist. 56(3), 345–376.
35. Kong, Y.-Y., Mullangi, A., and Kokkinakis, K. (2014). “Classification of fricative consonants for speech enhancement in hearing devices,” PLoS ONE 9(4), e95001.
36. Labov, W. (2001). Principles of Linguistic Change, Vol. 2: Social Factors (Blackwell, Oxford, UK), p. 592.
37. Lampitelli, N. (2014). “The Romance plural isogloss and linguistic change: A comparative study of Romance nouns,” Lingua 140, 158–179.
38. Lavoie, L. M. (2001). Consonant Strength: Phonological Patterns and Phonetic Manifestations (Garland, New York), p. 214.
39. Lee, J. Y., Harkrider, A. W., and Hedrick, M. S. (2012). “Electrophysiological and behavioral measures of phonological processing of auditory nonsense V-CV-VCV stimuli,” Neuropsychologia 50(5), 666–673.
40. Leminen, A., Jakonen, S., Leminen, M., Mäkelä, J. P., and Lehtonen, M. (2016). “Neural mechanisms underlying word- and phrase-level morphological parsing,” J. Neurolinguist. 38, 26–41.
41. Litvin, N. (2014). “An ultrasound investigation of secondary velarization in Russian,” Master's thesis, University of Victoria, BC, Canada.
42. Mallinson, G. (1986). “Rumanian,” in The Romance Languages, edited by M. Harris and N. Vincent (Croom Helm, London), pp. 391–419.
43. Marslen-Wilson, W., and Warren, P. (1994). “Levels of perceptual representation and process in lexical access: Words, phonemes, and features,” Psychol. Rev. 101, 653–675.
44. Mester, R. A., and Itô, J. (1989). “Feature predictability and underspecification: Palatal prosody in Japanese mimetics,” Language 65, 258–293.
45. Nevalainen, T., and Raumolin-Brunberg, H. (2003). Historical Sociolinguistics: Language Change in Tudor and Stuart England (Pearson Education Ltd., London), p. 260.
46. Ohala, J. J. (1990). “The phonetics and phonology of aspects of assimilation,” in Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, edited by J. Kingston and M. Beckman (Cambridge University Press, Cambridge, UK), pp. 258–275.
47. Operstein, N. (2010). Consonant Structure and Prevocalization (John Benjamins Publishing Company, Philadelphia, PA), p. 234.
48. Padgett, J. (2001). “Contrast dispersion and Russian palatalization,” in The Role of Speech Perception in Phonology, edited by E. Hume and K. Johnson (Academic Press, New York), pp. 187–218.
49. Pearlmutter, N. J., Garnsey, S. M., and Bock, K. (1999). “Agreement processes in sentence comprehension,” J. Memory Lang. 41, 427–456.
50. Pierrehumbert, J. (1990). “Phonological and phonetic representations,” J. Phon. 18, 375–394.
51. Pierrehumbert, J. (2001). “Exemplar dynamics: Word frequency, lenition, and contrast,” in Frequency Effects and the Emergence of Lexical Structure, edited by J. Bybee and P. Hopper (John Benjamins, Amsterdam, the Netherlands), pp. 137–157.
52. Polka, L., and Werker, J. F. (1994). “Developmental changes in the perception of non-native vowel contrasts,” J. Exp. Psychol.: Human Percept. Perform. 20, 421–435.
53. Rabiner, L. R., and Schafer, R. W. (2011). Theory and Applications of Digital Speech Processing (Pearson, Upper Saddle River, NJ).
54. R Core Team (2012). “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, Vienna, Austria.
55. Ruhlen, M. (1973). “Rumanian Phonology,” Ph.D. dissertation, Stanford University, Stanford, CA.
56. Sarlin, M. (2014). Romanian Grammar (Books on Demand, Helsinki, Finland), p. 378.
57. Schane, S. (1971). “The phoneme revisited,” Language 47, 503–521.
58. Scharinger, M., Lahiri, A., and Eulitz, C. (2010). “Mismatch negativity effects of alternating vowels in morphologically complex word forms,” J. Neurolinguist. 23, 383–399.
59. Shin, N. L. (2013). “Women as leaders of language change: A qualification from the bilingual perspective,” in Proceedings of the 6th International Workshop on Spanish Sociolinguistics, April 12–14, Tucson, AZ, pp. 135–147.
60. Spinu, L. (2007). “Perceptual properties of palatalization in Romanian,” in Romance Linguistics 2006: Selected Papers From the 36th Linguistic Symposium on Romance Languages (LSRL), March 31–April 2, New Brunswick, Canada, pp. 277–289.
61. Spinu, L. (2010). “Palatalization in Romanian: Experimental and theoretical approaches,” Ph.D. dissertation, University of Delaware, Newark, DE.
62. Spinu, L., and Lilley, J. (2016). “A comparison of cepstral coefficients and spectral moments in the classification of Romanian fricatives,” J. Phon. 57, 40–58.
63. Spinu, L., Vogel, I., and Bunnell, T. (2012). “Palatalization in Romanian—Acoustic properties and perception,” J. Phon. 40(1), 54–66.
64. Steriade, D. (1999). “Phonetics in phonology: The case of laryngeal neutralization,” in Papers in Phonology, Vol. 3, UCLA Working Papers in Linguistics 2, edited by M. Gordon (UCLA, Los Angeles, CA), pp. 25–146.
65. Stevens, K., and Keyser, S. (2010). “Quantal theory, enhancement and overlap,” J. Phon. 38(1), 10–19.
66. Stevens, K., Keyser, S. J., and Kawasaki, H. (1986). “Toward a phonetic and phonological investigation of redundant features,” in Symposium on Invariance and Variability of Speech Processes, edited by J. Perkell and D. H. Klatt (Lawrence Erlbaum, Hillsdale, NJ), pp. 426–463.
67. Şuteu, V. (1961). “Observaţii asupra pronunţării limbii române” (“Notes on the pronunciation of the Romanian language”), Studii şi cercetări lingvistice 12(3), 293–304.
68. Tan, L. H., Feng, C.-M., Fox, P. T., and Gao, J.-H. (2001). “An fMRI study with written Chinese,” Neuroreport: Int. J. Rapid Commun. Res. Neurosci. 12(1), 83–88.
69. Timberlake, A. (2004). A Reference Grammar of Russian (Cambridge University Press, Cambridge, UK), p. 503.
70. Trubetzkoy, N. (1969). Principles of Phonology (University of California Press, Berkeley, CA), p. 344.
71. Vance, T. J. (1987). An Introduction to Japanese Phonology (SUNY Press, New York), p. 226.
72. Van der Weijer, J. (2011). “Secondary and double articulation,” in The Blackwell Companion to Phonology, edited by M. van Oostendorp, C. J. Ewen, E. V. Hume, and K. Rice (Wiley, London), pp. 694–710.
73. Vasilescu, I., Vieru, B., and Lamel, L. (2014). “Exploring pronunciation variants for Romanian speech-to-text transcription,” in Proceedings of the International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), May 14–16, St. Petersburg, Russia, pp. 161–168.
74. Vermeiren, A., and Cleeremans, A. (2012). “The validity of d' measures,” PLoS ONE 7(2), e31595.
75. Viterbi, A. J. (1967). “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Inf. Theory 13(2), 260–269.
76. Werker, J. F., and Lalonde, C. E. (1988). “Cross-language speech perception: Initial capabilities and developmental change,” Dev. Psychobiol. 24, 672–683.
77. Werker, J. F., and Tees, R. C. (1984). “Cross-language speech perception: Evidence for perceptual reorganization during the first year of life,” Infant Behav. Dev. 7, 49–63.
78. Wickens, T. (2002). Elementary Signal Detection Theory (Oxford University Press, New York), p. 288.
79. Wright, R. (2004). “A review of perceptual cues and cue robustness,” in Phonetically-Based Phonology, edited by B. Hayes, R. Kirchner, and D. Steriade (Cambridge University Press, Cambridge, UK), pp. 34–57.
80. Yarrington, D., Gray, J., Pennington, C., Bunnell, H. T., Cornaglia, A., Lilley, J., Nagao, K., and Polikoff, J. B. (2008). “ModelTalker Voice Recorder—An interface system for recording a corpus of speech for synthesis,” in Proceedings of the ACL-08: HLT Demo Session, June 16, Columbus, OH, pp. 28–31.
81. Zsiga, E. (2000). “Phonetic alignment constraints: Consonant overlap and palatalization in English and Russian,” J. Phon. 28, 69–102.
82. Zygis, M., and Hamann, S. (2003). “Perceptual and acoustic cues of Polish coronal fricatives,” in Proceedings of the 15th International Congress of Phonetic Sciences, August 3–9, Barcelona, Spain, pp. 395–398.