The results of the quantification of the acoustic differences between German /ç/ and /ʃ/ in three speaker groups with varying contrast realizations are presented. Data for two speaker groups were collected in Berlin and Kiel, where the contrast is still realized. Data for a third group were collected in Berlin from speakers of Hood German—a youth-style multiethnolect spoken by adolescents in multilingual and multicultural neighborhoods of Berlin—where the contrast has weakened or is even lost. A forced choice perception test showed that listeners reliably differentiate these two fricatives in minimal pairs produced by the speakers from Berlin and Kiel, but fail to do so for the productions of the Hood German speakers. The acoustic analysis reveals that spectrally, the fricatives are very similar in all varieties. The spectral moments (Center of Gravity, standard deviation, kurtosis, skewness) fail to reveal the differences between the fricatives that are apparent from visual inspection of the spectra and the perceived auditory differences. Analyses of the discrete cosine transformation coefficients, however, better quantify these differences. This study suggests that minute differences between fricatives that vary between speaker groups may be captured more reliably with discrete cosine transformations compared to spectral moments.

In the languages of the world, the palatal fricative /ç/ sound is relatively rare. While German is one of the languages that has it, in Standard German, the distribution of /ç/ is rather restricted—it can only occur after high front vowels or in word- or syllable-onset position. German is also one of the three known languages in which the palatal fricative /ç/ and postalveolar fricative /ʃ/ are contrastive (Mielke, 2008). While this contrast has been maintained in the northern German dialect area diachronically, the contrast has already been lost in the Middle German dialect region (Herrgen, 1986; Dirim and Auer, 2004; Hall, 2014), ranging from Trier in the southwest up to Cologne in the west and over to Dresden in the east. Now this merger towards /ʃ/ is also affecting the speech of speakers in the northeast of this Middle German dialect belt up to Berlin, where the phonetic forms of phonological /ç/ and /ʃ/ can include [ɕ] and [ʃ].

Wedel et al. (2013) claim that the probability of a merger is inversely related to the number of minimal pairs found in a language. A language is more likely to lose a phoneme if the number of minimal pairs involving that phoneme is small (Wedel et al., 2013). An analysis of the Celex database (Baayen et al., 1995) for German revealed that there are only seven within-category (e.g. noun−noun, verb−verb, etc.) minimal pairs (Wedel, 2016). The application of the same statistical model on cross-linguistic data as explained in Wedel et al. (2013) revealed that the probability of /ç/ and /ʃ/ merging in German is greater than, for example, that of the /e/–/æ/ merger in German, or the pin–pen merger in English. This means that their statistical model gives the /ç/–/ʃ/ pair in German a similar probability of merging as it gives to other comparable contrasts that have already merged (Wedel, 2016).

We suggest that in addition to the spread of features of the Middle German dialect into the greater Berlin metropolitan area, there are two more factors contributing to the loss of the phonological contrast between /ç/ and /ʃ/ in Berlin and its vicinity over time: (1) the pervasive ambient youth-style multiethnolect (which we have termed Hood German, see Jannedy and Weirich, 2012, 2014a) and (2) the speech habits of some older east German speakers who also merge these two sounds towards the postalveolar fricative. Hood German is a youth-style multiethnolect (predominantly) spoken in urban areas of Germany, and has morphosyntactic (Auer, 2003; Wiese, 2009) and phonetic features, including the lack of contrast between /ç/ and /ʃ/, that distinguish it from Standard German (Jannedy and Weirich, 2012, 2014a,b). This lack of contrast appears to have its locus in the multiethnic and multilingual communities of larger urban areas, and plays a rather important role in the overall perception of or stance towards this multiethnolect. The potential spread of this merger to a wider speech community makes this fricative alternation a feature of a sociolect rather than a multiethnolect. Perception work in Berlin has shown that older listeners perceive /ʃ/ more often than /ç/ on a synthesized acoustic continuum when primed with the concept of Kreuzberg (the multiethnic neighborhood most highly associated with this alternation) compared to when primed with a neighborhood name not associated with the multiethnic youth variety. Younger listeners, however, did not show this effect (Jannedy and Weirich, 2014a). Thus, this alternation is becoming more widely accepted with younger speaker groups, which possibly points to a beginning sound change in Berlin. 
The third factor potentially contributing to the loss of fricative contrast in Berlin is that, independently of the younger speakers, some older monolingual monoethnic German speakers from east Berlin and the surrounding area of Brandenburg have weakened the contrast as well: /ç/ is realized more like [ɕ] than [ç], thereby merging in the direction of [ʃ] (Jannedy and Weirich, 2014a).

While this fricative variation in German is clearly perceptible, noticed, and socially marked (Jannedy et al., 2011; Jannedy and Weirich, 2014a), spectrally it has not yet been explored in detail. In this paper, we will systematically investigate the acoustic characteristics of these merged and unmerged fricatives as realized by (a) adolescent multiethnolectal speakers from Berlin-Kreuzberg, who produce largely merged forms, (b) Berlin university students, and (c) students from Kiel, which is in the same dialect region on the Baltic Sea but about 360 km to the northwest of Berlin, where the contrast is still fully realized. Figure 1 shows an oscillogram and a spectrogram of the palatal and the postalveolar fricatives produced by a female speaker of northern Standard German. The palatal fricative shows less intensity overall, reflected by the smaller amplitude in the oscillogram and the lighter gray color in the spectrogram. In addition, the postalveolar fricative shows more energy in the lower frequencies around 2000 Hz.

FIG. 1.

Sound pressure wave and spectrogram of the palatal and the postalveolar fricative of a Standard German speaker.


Much work has focused on the acoustic characteristics of the anterior alveolar /s/ and postalveolar /ʃ/ sibilants (Evers et al., 1998; Jongman et al., 2000; Gordon et al., 2002; Nowak, 2006; Cheon and Anderson, 2008; Li et al., 2011) and some work has attempted to describe the acoustic differences between fricatives in languages with complex fricative systems such as Polish (Jassem, 1995; Guzik and Harrington, 2007; Bukmaier and Harrington, 2016; Czaplicki et al., 2016). While the contrast between these fricatives is easily perceived by native Polish listeners, the difference is not easily quantified in the acoustics (Nowak, 2006). Czaplicki et al. (2016) in their study of an emergent postalveolar sibilant variant in Polish extracted multiple acoustic measures from their data (spectral peak, the four spectral moments: 1. Center of Gravity, 2. the standard deviation of the spectrum, 3. the skewness, and 4. the kurtosis, as well as spectral slopes, the formant frequencies of preceding and following vowels, and the fricative duration) and subjected the data to a linear discriminant analysis. Their results for the fricatives /s ʂ ɕ/ indicate that higher spectral peaks and center of gravity (COG) are good predictors for discriminating the new variant from the Standard Polish counterparts.

Bukmaier and Harrington (2016) simultaneously obtained articulatory and acoustic measurements of the contrasting Polish retroflex, dental, and alveopalatal sibilants /ʂ s ɕ/. Their results indicate that articulatorily, the three fricatives are quite distinct, while acoustically, the retroflex /ʂ/ and the alveopalatal /ɕ/ are difficult to separate by comparing averaged fricative spectra. Harrington (Guzik and Harrington, 2007; Harrington, 2010) has pioneered the method of quantifying fricative spectra by comparing the shape of the spectrum to discrete cosine functions: “[Discrete cosine transformation] coefficients, like spectral moments, reduce the quantity of information in a spectrum to a handful of values and, importantly, in such a way that different phonetic categories are often quite well separated (assuming these categories have differently shaped spectra)” (Harrington, 2010, p. 305). Guzik and Harrington (2007) found that discrete cosine transformations (DCT) effectively distinguish between the four fricative types in Polish. Weirich (2012) showed in her dissertation that DCT coefficients were useful in describing speaker-specific implementations of /s/ and /ʃ/ in the speech of monozygotic and dizygotic twins. We have previously done some pilot work on the acoustics of /ç/ and /ʃ/ in northern Standard German, eastern Middle German (Jannedy and Weirich, 2016), and in Berlin German (Jannedy et al., 2015) that points to DCT as a parameter that is able to differentiate the very similar spectra of these fricatives. More detail on the role of DCT in the characterization and comparison of fricative spectra is given in Sec. II C.

The objective of this paper is to describe the realization of the /ç/–/ʃ/ contrast in German in the speech of three speaker groups that vary in their productions of the acoustic contrast. To do this, first, a perception test was carried out to see if listeners perceive differences between the fricatives in each of the three speaker groups, and second, an acoustic analysis was conducted to explore the measures and parameters that best capture the minute differences in the acoustic realizations of [ç] and [ʃ].

In total, 130 speakers participated in this study, divided into three speaker groups (cf. Table I). There were two groups of Berlin speakers: 32 adolescents ranging in age from 12 to 16, and 86 university students ranging in age from 19 to 36. Participants in both of these speaker groups were born and raised in Berlin and their dominant language was German. The group of adolescents comprised both monolingual monoethnic German speakers and speakers with a multilingual multiethnic background, as is common for urban places such as Berlin. All of the adolescents were living in Kreuzberg (group: Berlin_KB), a neighborhood known for its multicultural flair and for the prevalence of Hood German among adolescent speakers. None of the monolingual university students (group: Berlin) was living in Kreuzberg. The third group consisted of 12 university students from Kiel (23–32 yr old), a smaller northern German city 360 km to the northwest of Berlin, but also within the Low German dialect region (group: Kiel).

TABLE I.

Overview of the number of speakers and items recorded separated by speaker group.

Speaker group   No. of speakers   Items /ʃ/   Items /ç/   Total
Berlin_KB       32                177         169         346
Berlin          86                510         513         1023
Kiel            12                72          72          144

All participants were asked to read three sets of minimal pairs embedded in a carrier phrase. Labov (1972) notes that speakers pay more attention to speech when they are asked to produce a contrast directly, as is the case when speakers have to read pairs of words or minimal pairs. We embedded the individual members of the three minimal pairs in a carrier phrase, which should also give the word local prosodic prominence due to the contrastive setup. Participants were asked to read these sentence pairs twice, but in some instances only one repetition could be used due to reading errors, resulting in 1513 tokens for analysis (instead of 130 participants × 2 repetitions × 6 words = 1560, cf. Table I). Note that for the analysis of the fricative contrasts the number varies slightly, because for some word pairs only one repetition could be used and a few tokens had to be excluded due to measurement errors in the DCT or spectral moment analysis (number of fricative contrasts for the spectral moment analysis: 737, and for the DCT analysis: 734).

The three target minimal pairs are listed in (1) and a sample sentence is given in (2),

  1. fischte /fɪʃtə/ 'fished, 3rd p. sg.'     Fichte /fɪçtə/ 'spruce'
     misch /mɪʃ/ 'mix!'                      mich /mɪç/ 'myself'
     wischt /vɪʃt/ 'wipe, 3rd p. sg.'         Wicht /vɪçt/ 'gnome'

  2. (a) Ich habe fischte gesagt. 'I said fished.'
     (b) Ich habe Fichte gesagt. 'I said spruce.'

The contrasting sentences were read one after the other. In all cases, the experiment was administered by the same experimenter.

For the perception test, all Fichte and fischte utterances were selected from the 12 students from the north (Kiel) and from 12 randomly selected speakers each from the Berlin adolescent and Berlin student groups. Target words were extracted automatically with a Praat script (Boersma and Weenink, 2016) and scaled to 70 dB. This generated 144 test items (12 speakers × 3 speaker groups × 2 words (Fichte/fischte) × 2 renditions per recording), each of which was played twice in the perception test. Twelve listeners (6 males, 6 females; mean age: 28 yr, range: 16–39) from Buxtehude, a small city to the south of Hamburg in northern Germany, participated in the perception experiment. In their dialect region, the contrast between /ç/ and /ʃ/ has been retained; they produce and hear the two different variants. Each of the 12 listeners heard 288 test items (3456 ratings in total). The items were administered via the Praat function mfctest. In a forced choice experiment, listeners heard one item at a time over Sennheiser HD 280 PRO over-ear headphones (frequency range: 8 Hz–25 kHz) and had to select a text box displayed on the computer screen showing in orthography either the word Fichte or the word fischte. Selection was made via mouse click and responses were automatically logged. To ensure that all listeners were physically capable of detecting differences between the two fricatives, all were subjected to an auditory pretest to determine their hearing thresholds. All of them had good hearing.
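The design counts reported for the perception test can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Counts for the perception test, taken from the figures stated in the text.
speakers_per_group = 12
speaker_groups = 3      # Kiel, Berlin, Berlin_KB
target_words = 2        # Fichte, fischte
renditions = 2          # each speaker read every sentence twice
repetitions = 2         # each item was played twice to a listener
listeners = 12

items = speakers_per_group * speaker_groups * target_words * renditions
trials_per_listener = items * repetitions
total_ratings = trials_per_listener * listeners
# Ratings per combination of speaker group and target word.
ratings_per_cell = total_ratings // (speaker_groups * target_words)

print(items, trials_per_listener, total_ratings, ratings_per_cell)
```

The last value, 576, is the number of ratings available for each speaker group and target word.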

Acoustic data were recorded with a sampling frequency of 48 kHz using a Tascam DR-05 portable digital recorder (20 Hz–22 kHz, +1/−3 dB) and a Beyerdynamic opus 54 head-mounted microphone (40–17 000 Hz). For all further analyses the spectral data were limited to a bandwidth from 500 to 12 000 Hz, thereby providing a suitable frequency range for the comparison of the spectral characteristics of [ç] and [ʃ]. Since the frequency range used has an effect on the calculation of both spectral moments and DCT, the same bandwidth (500–12 000 Hz) was used for both sets of spectral measures. In line with other studies, an upper limit of 12 000 Hz was considered to be safe to include all of the phonologically relevant spectral information. For instance, Gordon et al. (2002) used a frequency range up to 10 000 Hz in their comprehensive cross-linguistic acoustic study of voiceless fricatives (including [ʃ] and [ç]). Czaplicki et al. (2016) used an upper limit of 11 000 Hz in their analysis of spectral moments in Polish fricatives. Frequencies below 500 Hz were excluded to control for an influence of potential voicing. A similar frequency range (414–10 313 Hz) had previously been used for the computation of DCT coefficients in the Bukmaier and Harrington (2016) study on Polish fricatives.

All fricative tokens were hand labeled in Praat based on the energy distribution of the sound pressure wave, discontinuities in the spectrogram, and also on the auditory signal. Labelling was uncontroversial due to the combination of the preceding lax front vowel /ɪ/ and the voiceless target fricative /ç/ or /ʃ/ followed by a stop consonant in the coda or as the onset of the next syllable in the frame sentence. The example in Fig. 2 shows the orthographic transcription and on a second tier the phonemic transcription for the target word.

FIG. 2.

Sound pressure wave, spectrogram, and orthographic and phonemic textgrids of the target word Fichte with preceding schwa and following g-schwa sequence (from the carrier phrase).


1. Fricative spectra

For a first visual inspection of the spectral shape of the fricatives we computed overlapping power spectra centered on the temporal midpoint between the acoustic onset and offset of each fricative and averaged the results in dB for each token. Calculations were made in MATLAB (version R2011a) over a 30 ms analysis interval around the temporal midpoint of each fricative, using a 6 ms Hamming window, a frame shift of 1 ms, and a 512-point discrete Fourier transform.
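The framing described above can be sketched as follows. This is a minimal illustration in Python rather than the original MATLAB code, and it makes several assumptions that the text does not settle: frames are zero-padded to the DFT length, the dB values of the frames are averaged (rather than the linear power), and a naive DFT is used to keep the example free of dependencies. All function names are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of n samples."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_starts(mid, fs, span_s=0.030, frame_s=0.006, shift_s=0.001):
    """Start indices of all frames that fit in a span centred on sample `mid`."""
    span = round(span_s * fs)
    frame = round(frame_s * fs)
    shift = round(shift_s * fs)
    first = mid - span // 2
    last = first + span - frame
    return list(range(first, last + 1, shift)), frame

def power_spectrum_db(frame, n_fft):
    """Power spectrum in dB of one Hamming-windowed, zero-padded frame."""
    x = [s * w for s, w in zip(frame, hamming(len(frame)))]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(len(x)))
        # Small offset avoids log10(0) for silent frames.
        spectrum.append(10 * math.log10(abs(acc) ** 2 + 1e-12))
    return spectrum

def averaged_spectrum_db(signal, mid, fs, n_fft=512):
    """Mean of the per-frame dB spectra around the midpoint (assumed averaging)."""
    starts, frame_len = frame_starts(mid, fs)
    spectra = [power_spectrum_db(signal[s:s + frame_len], n_fft) for s in starts]
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

With the settings reported in the text (48 kHz, 30 ms interval, 6 ms frames, 1 ms shift), `frame_starts` yields 25 frames of 288 samples each, which are zero-padded to 512 points.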

2. Spectral moments: COG, SD, skewness, and kurtosis

Acoustic measurements were automatically logged at the temporal midpoint of the fricatives. Multiple acoustic measurements were chosen to parameterize the spectra of the two fricatives (Hughes and Halle, 1956; Forrest et al., 1988; Jongman et al., 2000; Newman, 2003). Four spectral moments (treating the spectrum as a probability density distribution) were calculated following Forrest et al. (1988): (1) the centroid frequency or COG, which is the amplitude-weighted mean frequency of the entire spectrum; (2) the standard deviation (SD), which measures how widely the energy in the spectrum is spread around the COG; (3) the skewness, which describes the asymmetry of the energy distribution over the whole frequency range and expresses whether the energy is skewed towards the higher or the lower frequencies; and (4) the kurtosis, which reveals the peakedness of the distribution. The spectral moments were calculated in Praat with a window length of 0.025 s and a smoothing of 100 Hz.
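To make the four moments concrete, the following sketch computes them from a list of frequencies and the corresponding spectral power values, treating the normalized spectrum as a probability distribution. It is not the Praat implementation used in the study: the weighting by power, the use of excess kurtosis (kurtosis minus 3), and the omission of smoothing are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def spectral_moments(freqs_hz, power):
    """Return (COG, SD, skewness, excess kurtosis) of a spectrum.

    freqs_hz: bin frequencies in Hz
    power:    non-negative spectral power at each bin (assumed weighting)
    """
    total = sum(power)
    p = [v / total for v in power]  # normalize to a probability distribution
    cog = sum(f * w for f, w in zip(freqs_hz, p))
    variance = sum((f - cog) ** 2 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(variance)
    skewness = sum((f - cog) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    # Excess kurtosis: 0 for a normal distribution (assumed convention).
    kurtosis = sum((f - cog) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3.0
    return cog, sd, skewness, kurtosis

# A symmetric toy spectrum has its COG at the centre and zero skewness.
print(spectral_moments([1000, 2000, 3000], [1.0, 2.0, 1.0]))
```

Because COG and SD are expressed in Hz while skewness and kurtosis are dimensionless, the four values live on very different numeric scales.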

3. DCT

DCT, a method proposed by Watson and Harrington (1999), was used to quantify the shape of the spectra and, in particular, the fricative contrast in more detail (using matlab). For a better understanding of the spectral parameterization using DCT, we will have a look at the fricatives /ç/ and /ʃ/ produced by speakers from Buxtehude (Jannedy and Weirich, 2016), located in the northern dialect region where the contrast is fully realized. This contrast is mirrored by the difference in spectral shape between the palatal [ç] (gray) and postalveolar [ʃ] (black) in Fig. 3. The postalveolar fricative is characterized by higher energy in the frequency range between 5000 and 7000 Hz compared to the palatal fricative. Also, the postalveolar fricative shows a faster and steeper rise in the frequency range up to 3000 Hz. We can also see differences in the upper region of the spectrum after the energy maxima have been reached: energy at higher frequencies in the spectrum decreases faster in the postalveolar fricative than in the palatal fricative, giving rise to a steeper spectral slope for [ʃ]. DCT decomposes the signal into a set of half-cycle cosine waves, whereby the resulting amplitudes of these cosine waves are the DCT coefficients. The dotted black curves in the two plots show these cosine waves: DCT1 corresponds to a half-cycle cosine wave (left), DCT2 to a full cycle cosine wave (right). Put simply, applied to fricative spectra, as we are doing here, the DCT coefficients reflect the resemblance of the spectral shape to the respective cosine wave. In our Buxtehude data we found a significant difference between the fricatives in DCT1 and DCT2 (Jannedy and Weirich, 2016). 
DCT1 reflects the slope of the spectrum and thus results in higher values for [ʃ] (black) than for [ç] (gray), since the difference between the energy distribution of the lower frequencies (left of the midpoint of the spectrum marked by the crossing lines) and the higher frequencies (right of the midpoint) is higher for [ʃ] than for [ç]. DCT2, on the other hand, reflects the curvature of the spectrum and is thus negative for both fricatives, but with higher values for [ʃ] than for [ç], since more energy is found for [ʃ] within the particular frequency range between ∼3000 and ∼8000 Hz (marked by the crossing dotted lines).
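The relationship between spectral shape and the first DCT coefficients can be illustrated with a short sketch. The scaling used here (a DCT-II with a factor of 2/N and an extra 1/√2 on the zeroth coefficient) is one common convention and is an assumption; the text does not state which normalization was applied. The function name is hypothetical.

```python
import math

def dct_coefficients(spectrum_db, n_coef=4):
    """First n_coef DCT-II coefficients of a spectrum (assumed scaling)."""
    n = len(spectrum_db)
    coefs = []
    for m in range(n_coef):
        k = 1 / math.sqrt(2) if m == 0 else 1.0
        s = sum(x * math.cos((2 * i + 1) * m * math.pi / (2 * n))
                for i, x in enumerate(spectrum_db))
        coefs.append(2 * k / n * s)
    return coefs

flat = [10.0] * 8                      # only DCT0 is non-zero
falling = [8, 7, 6, 5, 4, 3, 2, 1]     # more energy at low frequencies
peaked = [0, 1, 2, 3, 3, 2, 1, 0]      # energy concentrated in the centre

for name, spec in (("flat", flat), ("falling", falling), ("peaked", peaked)):
    print(name, [round(c, 3) for c in dct_coefficients(spec)])
```

Under this convention a spectrum with more energy below its midpoint than above it gives a positive DCT1, and a spectrum with its energy concentrated in the centre gives a negative DCT2, in line with the description above.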

FIG. 3.

Averaged fricative spectra of /ç/ (gray) and /ʃ/ (black) of speakers from Buxtehude with cosine waves corresponding to DCT1 (left) and DCT2 (right).


In the following analysis we will include the first four DCT coefficients, reflecting the mean amplitude of the spectrum (DCT0), the linear slope of the spectrum (DCT1), its curvature (DCT2), and the amplitude of the higher frequencies (DCT3). Note that while DCT0 is not a reliable measure when comparing different speakers or recordings—due to its correspondence to the amplitude of the signal—in our analysis comparisons are made only within the same speakers, recording sessions, and repetitions. Also, the tokens compared were embedded in the same phonological context (using minimal pairs), and we took great care to ensure that speakers did not vary in the distance to the microphone or loudness throughout the recording. We applied DCT to the spectra after each spectrum's frequency axis had been converted into Bark (Traunmüller, 1997). Following Harrington (2010), this has two advantages: first, Bark is more closely related to the way in which frequency is perceived, and second, fewer Bark-scaled DCT coefficients are needed to effectively distinguish between different phonetic variants than when DCT coefficients are derived from a Hz scale.
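The Bark conversion step can be sketched as follows. The formula is the widely used approximation attributed to Traunmüller, including its corrections at the low and high ends; whether exactly this variant was used in the study is an assumption. Resampling the spectrum by linear interpolation onto an equally spaced Bark grid is likewise an illustrative choice, and both function names are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert Hz to Bark (Traunmueller-style approximation, assumed variant)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if z < 2.0:
        z += 0.15 * (2.0 - z)       # low-frequency correction
    elif z > 20.1:
        z += 0.22 * (z - 20.1)      # high-frequency correction
    return z

def resample_on_bark(freqs_hz, spectrum_db, n_points):
    """Linearly interpolate a spectrum onto n_points equally spaced in Bark.

    Assumes freqs_hz is ascending and n_points >= 2.
    """
    barks = [hz_to_bark(f) for f in freqs_hz]
    lo, hi = barks[0], barks[-1]
    out = []
    j = 0
    for i in range(n_points):
        target = lo + (hi - lo) * i / (n_points - 1)
        # Advance to the segment [barks[j], barks[j + 1]] containing target.
        while j < len(barks) - 2 and barks[j + 1] < target:
            j += 1
        t = (target - barks[j]) / (barks[j + 1] - barks[j])
        out.append(spectrum_db[j] + t * (spectrum_db[j + 1] - spectrum_db[j]))
    return out

# The analysis band of 500-12 000 Hz expressed in Bark.
print(round(hz_to_bark(500), 2), round(hz_to_bark(12000), 2))
```

A spectrum resampled in this way can then be passed to a DCT so that the coefficients describe its shape on an auditory rather than a linear frequency axis.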

For a quantification of the acoustic contrast between the fricatives, Euclidean distances (ED) in a multidimensional space (4D) were calculated between each [ç] and [ʃ] from a minimal pair produced by a speaker separately for the three minimal pairs using (1) the spectral moments and (2) the DCT coefficients. Equation (1) shows the calculation for the acoustic contrast in the four-dimensional DCT space:

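\[
\mathrm{ED}(\text{ç},\text{ʃ}) = \sqrt{(\mathrm{DCT0}_{\text{ç}} - \mathrm{DCT0}_{\text{ʃ}})^{2} + (\mathrm{DCT1}_{\text{ç}} - \mathrm{DCT1}_{\text{ʃ}})^{2} + (\mathrm{DCT2}_{\text{ç}} - \mathrm{DCT2}_{\text{ʃ}})^{2} + (\mathrm{DCT3}_{\text{ç}} - \mathrm{DCT3}_{\text{ʃ}})^{2}} \tag{1}
\]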
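As a minimal sketch, the distance in Eq. (1) is the ordinary Euclidean distance between two four-element vectors, one per fricative. The coefficient values below are invented purely for illustration and are not taken from the data.

```python
import math

def fricative_contrast(params_c, params_sh):
    """Euclidean distance between the parameter vectors of [ç] and [ʃ]."""
    if len(params_c) != len(params_sh):
        raise ValueError("both vectors need the same number of parameters")
    return math.dist(params_c, params_sh)

# Hypothetical DCT0-DCT3 values for one minimal pair of one speaker.
dct_c = (40.0, 2.0, -3.0, 1.0)
dct_sh = (43.0, 6.0, -3.0, 1.0)
print(fricative_contrast(dct_c, dct_sh))  # 5.0
```

The same function applies unchanged to the four spectral moments (COG, SD, skewness, kurtosis) of a fricative pair.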

We will first present the results of the perception experiment (A) followed by those of the acoustic analysis (B). The acoustic analysis is separated into (1) a visual examination of the fricative spectra, (2) a comparison of the fricatives in terms of the four spectral moments (a) COG, (b) SD, (c) skewness, and (d) kurtosis, and (3) a presentation of the extracted DCT coefficients and their usage for the differentiation between the fricatives. Finally, the relationship between these parameters and their suitability to quantify very small acoustic differences is discussed. All statistical analyses were carried out in R (version 2.14.1, R Development Core Team, 2008) using the lme4 package (Bates et al., 2011).

To look for an effect of the three speaker groups (Kiel, Berlin, Berlin-Kreuzberg) on the correct fricative categorization, we calculated a generalized linear mixed model (family = binomial) with the correctness scores as dependent variable. As fixed effects, repetition was entered as a control variable, and speaker group (Kiel, Berlin, Berlin_KB) and target word (fischte, Fichte) as test variables. As random effect we entered an intercept for listener. Likelihood ratio tests revealed a significant interaction effect for speaker group x target word (χ2 (2) = 64.9, p < 0.001). As Fig. 4 clearly shows, the Fichte productions by the Berlin_KB speakers were significantly less often correctly categorized than the productions by the Berlin and Kiel speakers (p < 0.001). This also holds for the fischte productions (p < 0.001) but to a much lesser degree (estimates for Fichte: 5.4 and 3.8, for fischte: 1.5 and 1.1). Thus, listeners were able to differentiate between /ç/ and /ʃ/ in the minimal pairs produced by speakers from Berlin and Kiel, but had much more difficulty in doing so when the stimuli were produced by young adolescent speakers from Berlin-Kreuzberg. Here, for the Fichte stimuli, listeners performed below chance level with only 46% correct identifications.

FIG. 4.

Distribution of correct fricative categorizations (in percent), separated by target word (Fichte vs fischte) and speaker group (Berlin_KB, Berlin, Kiel). Number of tokens: 576 items × 3 groups × 2 target words = 3456.


In summary, while for the Berlin and Kiel speakers both fricatives were identified reliably, for the Kreuzberg multiethnolect speakers—as predicted—listeners had problems with the correct identification of the postalveolar fricative and failed to identify the palatal fricative. We take this as evidence that the phonetic characteristics of the variant resulting from the /ç/–/ʃ/ merger in the Berlin-Kreuzberg speaker group resemble those of a postalveolar fricative.

1. Fricative spectra

Mean fricative spectra of all [ç] and [ʃ] renditions are plotted in Fig. 5 separated by speaker group. Again, [ʃ] productions are plotted in black, [ç] productions in gray. A visual inspection of these average fricative spectra shows that both fricatives have rather similar overall spectral shapes for all three speaker groups. However, the spectra for the Berlin and the Kiel students show that the overall amplitude of [ʃ] is higher compared to [ç], while for the Berlin Kreuzberg (Berlin_KB) group this is only the case in a small frequency range from 6000 to 8000 Hz. In addition, the slope of the spectral shape is steeper in [ʃ] than in [ç] for Berlin and Kiel, in the sense that there is a faster rise in amplitude at the lower frequencies up to 3000–4000 Hz and a steeper fall from the peak amplitude to the end. This difference in slope (with a more evenly distributed and flatter spectrum for [ç] than for [ʃ]) reflects the shape of the Buxtehude data in Fig. 3.

FIG. 5.

Averaged fricative spectra of [ç] (gray) and [ʃ] (black) separated by the three speaker groups (Berlin, Berlin_KB, Kiel).


The more similar spectra for [ç] and [ʃ] of the Kreuzberg speakers (Berlin_KB) compared to those of the Berlin and Kiel speakers mirror the results of the perception test: here too, the fricatives of the Kreuzberg group could not be differentiated by the listeners, while the fricatives of the other two speaker groups could be distinguished reliably. Note, though, that the differences between the fricatives in spectral shape are minute, even for the latter two groups. Quantifying the clearly perceptible but acoustically rather small difference between these two speech sounds for the Kiel and Berlin groups is therefore difficult. A very sensitive measure is needed to capture the fine spectral variations and differences between the palatal and postalveolar fricatives in German.

2. Spectral moments: COG, SD, skewness, and kurtosis

Figure 6 shows the extracted spectral moments for [ç] (gray) and [ʃ] (black) separated by speaker group. Overall, the distributions of [ç] and [ʃ] productions overlap to a great extent, especially for Berlin-Kreuzberg, but also for the Kiel speakers. Only in the Berlin group are the fricative distributions slightly shifted: the palatal fricative shows higher SD and COG values than the postalveolar fricative, as is expected from the wider bandwidth of the [ç] spectrum (cf. Fig. 1). Differences in skewness or kurtosis seem rather negligible or even absent.

FIG. 6.

Scatterplot of extracted spectral moments for /ç/ (gray) and /ʃ/(black) separated by the speaker groups. Upper plot: SD (y axis) and skewness (x axis) values, lower plot: kurtosis (y axis) and COG (x axis) values.


For a quantification of the fricative contrast, spectral moments of [ç] and [ʃ] productions were paired for each speaker and word pair. Then, Euclidean distances (EDs) were calculated for each fricative pair in a four-dimensional space using COG, SD, skewness, and kurtosis [analogous to Eq. (1)]. Figure 7 shows the distributions of the EDs separated by word pair and speaker group. We see a very similar pattern for all three word pairs. While the Kreuzberg adolescents generally show the smallest remaining difference in production, the Kiel speakers also show only a small contrast. The Berlin students have the greatest ED, and thus the greatest fricative contrast expressed in the extracted spectral moments.

FIG. 7.

Distribution of Euclidean Distances in COGxSDxSKEWNESSxKURTOSIS space (y axis: ED spectral moments) between the fricatives /ç/ and /ʃ/ separated by speaker group and minimal pair (speakers: 130, fricative pairs: 737).


We performed a linear mixed effects analysis to investigate the difference in magnitude of the fricative contrast between the three speaker groups. The dependent variable was the ED between the fricatives in the four-dimensional COG×SD×SKEWNESS×KURTOSIS space. As fixed effects, repetition was entered as a control variable, and speaker group (Kiel, Berlin, Berlin-Kreuzberg) and minimal pair (fischte – Fichte, wischt – Wicht, misch – mich) as test variables. As random effect we entered an intercept for speaker as well as a by-speaker random slope for the effect of minimal pair. Likelihood ratio tests were run to test for a significant effect of the test variables by comparing the model with the factor in question to a model without that factor. A significant interaction effect was found for speaker group x minimal pair (χ2 (4) = 13.1, p < 0.05) pointing to speaker group specific differences in the implementation of the fricatives between the minimal pairs. Post hoc tests were carried out [using the lsmeans package in R (Lenth, 2016)] to reveal any significant differences between the different levels of speaker group and minimal pair. Table II shows only those comparisons which turned out significant. In the summary statistics of linear mixed models and post hoc tests, the given “Estimates” (column 3 of Table II) can be interpreted similarly to traditional effect sizes. The given value reflects the increase (or decrease if negative) in the acoustic distance influenced by the predictor level (speaker group or word pair). In detail, the Berlin speakers show a much smaller difference in the acoustic contrast for Fichte-fischte (−317.9) and also for Wicht-wischt (−426.3) than for mich-misch. This was already apparent in Fig. 7. 
However, as the figure also showed, the different speaker groups do not vary much in their acoustic contrast expressed by the spectral moments; only for the mich-misch pair was a significant difference found between Berlin and Berlin Kreuzberg (Berlin_KB): as expected the fricative contrast is much higher (614.1) for the Berlin students than for the adolescents from Berlin Kreuzberg (Berlin_KB).
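The p-values of the likelihood ratio tests follow from the chi-square distribution with the reported degrees of freedom. For even degrees of freedom the upper tail probability has a simple closed form, which the sketch below uses so that no statistics library is needed. It only checks the reported test statistics against their stated significance levels; it does not refit the mixed models, and the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) of a chi-square variable, even df only.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Speaker group x target word interaction in the perception test.
print(chi2_sf_even_df(64.9, 2))
# Speaker group x minimal pair interaction for the spectral moments.
print(chi2_sf_even_df(13.1, 4))
```

The first value falls far below 0.001 and the second lies between 0.01 and 0.05, consistent with the significance levels reported for the two interactions.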

TABLE II.

Results of the post-hoc Tukey tests comparing the acoustic fricative contrast in terms of spectral moments between the different levels of the factors minimal pair and speaker group.

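| Factor level | Comparison | Estimate | SE | df | t.ratio | p-value |
| --- | --- | --- | --- | --- | --- | --- |
| for Berlin | fi(s)chte vs mi(s)ch | −317.9 | 58.5 | 132.8 | −5.4 | <0.0001 |
| | wi(s)cht vs mi(s)ch | −426.3 | 63.3 | 122.2 | −6.7 | <0.0001 |
| for mi(s)ch | Berlin vs Berlin_KB | 614.1 | 126.7 | 130 | 4.85 | <0.0001 |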

These results, however, do not match the visual impressions of the spectral shapes of the fricatives shown in Fig. 5. There, both Berlin and Kiel have differing [ʃ] and [ç] spectra, while the spectra of the Kreuzberg fricatives overlap to a great extent. Nor do these results match the perception test, where listeners could easily differentiate Fichte from fischte productions of Berlin and Kiel speakers but identified the Berlin-Kreuzberg speakers' Fichte tokens only at chance level. Thus, apparently, we have a mismatch between perception and the observable spectral shape on the one hand and the quantification of the fricative contrast in terms of spectral moments on the other. Given the impression from the visual inspection of the spectra and the results of the perception test, where contrast realizations were distinguishable for Berlin and Kiel but not for Berlin_KB, we would have expected the acoustic contrast (ED captured by the spectral moments) to be higher in Berlin and Kiel than in Berlin_KB. This, however, is not the case. Thus, for these two German fricatives, the four spectral moments do not adequately capture the minute differences between [ç] and [ʃ] in the three speaker groups.

3. DCT

The graphs of Fig. 8 show the extracted DCT coefficients plotted separately by speaker group. Again, [ç] productions are plotted in gray, [ʃ] productions in black. While the fricative distributions completely overlap for Kreuzberg (Berlin_KB) in both DCT combinations (upper and lower graph), a clearer separation can be seen for Kiel and more so for Berlin (left panel of both graphs). The tendencies are the same in both speaker groups, most obviously in DCT0 and DCT2. For DCT2 (which corresponds to the curvature of a spectrum, cf. Fig. 3), [ʃ] (black) reveals more negative values than [ç] (see the x axis of the upper plot of Fig. 8), pointing to more energy in the center of the postalveolar fricatives' spectrum (frequencies around 5500 Hz) relative to its start and end than is the case for the palatal fricatives (gray). The variation in DCT0 mirrors the difference in the overall spectral amplitude (including the whole frequency range from 500 to 11 000 Hz) between the fricatives (see also the sound pressure waves in Fig. 1).
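To make the interpretation of DCT0 and DCT2 concrete, the following minimal sketch computes the first four cosine-transform coefficients of a spectrum. It is an illustration under stated assumptions rather than the extraction procedure used in the study: the function name `dct_coefficients`, the scaling (chosen so that coefficient 0 equals the mean level), and the synthetic dB values are all hypothetical.

```python
import math


def dct_coefficients(spectrum, n_coeffs=4):
    """Return the first n_coeffs DCT-II coefficients of a spectrum (dB values).

    Scaling is an illustrative assumption: coefficient 0 equals the mean
    level; higher coefficients are scaled by 2/N.
    """
    n = len(spectrum)
    coeffs = []
    for m in range(n_coeffs):
        total = sum(
            level * math.cos((2 * k + 1) * m * math.pi / (2 * n))
            for k, level in enumerate(spectrum)
        )
        scale = 1.0 / n if m == 0 else 2.0 / n
        coeffs.append(total * scale)
    return coeffs


# Synthetic spectra (hypothetical dB values, not data from the study).
n_bins = 64
flat = [40.0] * n_bins
# Mid-frequency hump: more energy in the center than at the edges.
peaked = [40.0 + 20.0 * math.sin(math.pi * (k + 0.5) / n_bins) for k in range(n_bins)]

flat_dct = dct_coefficients(flat)
peaked_dct = dct_coefficients(peaked)
print("flat:  ", [round(c, 2) for c in flat_dct])
print("peaked:", [round(c, 2) for c in peaked_dct])
```

In this toy example the flat spectrum has a DCT2 of zero, whereas the spectrum with a mid-frequency hump has a negative DCT2 and a higher DCT0, which is the direction of the [ʃ] versus [ç] difference described above.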

FIG. 8.

Scatterplot of extracted DCT coefficients for /ç/ (gray) and /ʃ/ (black) separated by the speaker groups. Upper plot: DCT1 (y axis) and DCT2 (x axis) values, lower plot: DCT0 (y axis) and DCT3 (x axis) values.

Again, Euclidean distances between the fricatives were calculated for each speaker and minimal pair separately. We included four DCT values in our analysis, resulting in EDs measured in a four-dimensional space [see Eq. (1)]. Figure 9 shows the distributions of EDs separated by minimal pair and speaker group. Parallel to the spectral moment analysis, for all three minimal pairs, the smallest distances between [ç] and [ʃ] are found in the fricatives produced by the adolescent Berlin Kreuzberg speakers (Berlin_KB). However, in contrast to the previous spectral moment analysis, speakers from Kiel show higher EDs than speakers from Berlin_KB, with the largest difference in the misch-mich pair.
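The distance measure can be sketched as follows. This is a minimal illustration of a Euclidean distance between two points in DCT0×DCT1×DCT2×DCT3 space, assuming the standard definition (square root of the summed squared differences); the function name `euclidean_distance` and the coefficient values are hypothetical and are not taken from the study.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length coefficient vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (DCT0, DCT1, DCT2, DCT3) values for one speaker and one minimal pair.
palatal = (50.0, 4.0, -3.0, 1.0)       # e.g. the fricative in "mich"
postalveolar = (53.0, 8.0, -3.0, 1.0)  # e.g. the fricative in "misch"

ed_dct = euclidean_distance(palatal, postalveolar)
print(ed_dct)
```

A larger value indicates a stronger acoustic contrast between the two fricatives; identical coefficient vectors would give a distance of zero.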

FIG. 9.

Distribution of Euclidean Distances in DCT0×DCT1×DCT2×DCT3 space (ED DCT) between the fricatives /ç/ and /ʃ/ separated by speaker group and minimal pair.

Again, linear mixed models were run with repetition as control variable and speaker group and minimal pair as test variables. As random effects, we again entered an intercept for speaker as well as a by-speaker random slope for the effect of minimal pair. Likelihood ratio tests showed a significant main effect of speaker group (χ2 (2) = 25.34, p < 0.0001) and of minimal pair (χ2 (2) = 33.4, p < 0.0001). Again, post hoc Tukey tests were carried out, and Table III shows the significant differences between the different levels of minimal pair and speaker group. Since no interaction was found, results are averaged over levels of minimal pair and speaker group, respectively. Regarding the minimal pairs, the Wicht-wischt pair was produced with a significantly smaller acoustic contrast than Fichte-fischte (22.65) and mich-misch (20.06). Regarding the speaker groups, the Kreuzberg speakers showed a significantly smaller acoustic contrast than the speakers from Kiel (38.81, p < 0.05) and from Berlin (48.79, p < 0.0001). Thus, unlike the spectral moment analysis, the DCT analysis mirrors the results of the perception test in terms of a clearer acoustic fricative contrast in the Berlin and Kiel students in comparison to the Berlin-Kreuzberg adolescents. Also, the DCT results reflect the observation of the slight differences in the spectral shapes of the fricatives in Fig. 5.

TABLE III.

Results of the post-hoc Tukey tests comparing the acoustic fricative contrast in terms of DCT coefficients between the different levels of the factors minimal pair and speaker group.

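| Factor | Comparison | Estimate | SE | df | t.ratio | p.value |
| --- | --- | --- | --- | --- | --- | --- |
| Word pair | fi(s)chte vs wi(s)cht | 22.65 | 4.73 | 469 | 5.66 | <0.0001 |
| | mi(s)ch vs wi(s)cht | 20.06 | 4.01 | 133 | 4.29 | <0.001 |
| Speaker group | Kiel vs Berlin_KB | 38.81 | 14.43 | 120 | 2.69 | <0.05 |
| | Berlin vs Berlin_KB | 48.79 | 8.99 | 120 | 2.69 | <0.0001 |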

Previous work (Jannedy and Weirich, 2014a) showed that the difference between /ç/ and /ʃ/ is well perceived by native listeners of German but also that a variable production of /ç/ is associated with specific speaker groups. For a multitude of reasons, some native German speakers go on to reproduce this variability with varying degrees of acoustic and perceptual salience. However, the weakening of the contrast is not progressing for linguistic reasons alone, such as the small number of minimal pairs in the language (some of which are created by adding derivational affixes such as <-ig> or <-isch>) or distributional restrictions. Social reasons such as the expression of group membership and local neighborhood identity (Jannedy et al., 2015) also play a role in the weakening of the contrast and the spreading of the change.

In this paper, we have attempted a quantification of the acoustic and spectral differences between [ç] and [ʃ] in minimal pairs collected from speakers from three different speech communities in the lower German dialect region. Two of these groups come from Berlin; thus, we investigated differences between two urban varieties while simultaneously investigating changes in dialectal implementations in two different northern German regions.

We conducted a forced choice perception test, which revealed that the fricatives were hardest to differentiate for the group of adolescent speakers from Berlin Kreuzberg (Berlin_KB), who seem to have merged the palatal fricative towards the postalveolar one, causing perceptual confusion between the two words Fichte and fischte. The productions by the students from Berlin and Kiel, on the other hand, were differentiated reliably by the listeners. Acoustically, however, the fricatives are very similar and hence difficult to differentiate. Quantifying this acoustic difference turned out to be quite complicated because the spectral shapes of the two fricatives [ç] and [ʃ] are very similar and the differences between the spectra rather minute. We have therefore explored acoustic measures that parameterize the shape of the spectra.

Our results indicate that the DCT coefficients seem to provide a more detailed analysis of the differences between the palatal and postalveolar fricatives /ç/ and /ʃ/ in German compared to the four spectral moments COG, SD, skewness, and kurtosis. This is not surprising as the DCT coefficients quantify the entire shape of the spectrum rather than just the central frequency or the weighting of the higher or lower frequencies. While skewness and kurtosis revealed no difference at all between the fricatives, COG and SD measures varied between the sounds depending on the speaker group. As expected, the adolescent Hood German speakers (Berlin_KB) showed the least contrast of the three groups. Even though the spectra of both the Berlin and Kiel students varied between the fricatives, only for the Berlin students did the COG×SD analysis succeed in finding significant differences. The DCT analysis, however, mirrored the acoustic differences between the fricatives apparent from the comparison of the spectral shapes and the results from the perception test. Albeit minute, spectral differences between the [ç] and [ʃ] productions were reflected in the DCT measures for Kiel and Berlin students, and the contrast realizations were significantly larger than the ones produced by the Berlin-Kreuzberg (Hood German) speakers.

Furthermore, we were able to find word-pair specific effects in our analysis of the fricative contrast. In the DCT analysis (which we think is more reliable than the spectral moment analysis because it mirrors the results of the perception test), the speaker groups varied in their magnitude of contrast realization between the word pairs. The smallest contrast was found for the Wicht-wischt pair; it differed significantly from the Fichte-fischte and the mich-misch pair. A possible explanation for this word-specific phonetic difference (Pierrehumbert, 2002) might be a lexical frequency effect, because especially the first member of the Wicht-wischt pair is rather restricted in use, occurring rarely and only in specific literary genres (fairy tales). Further studies are planned to shed light on possible interactions of the /ç/–/ʃ/ merger in progress in Berlin and word frequency effects. We are now finding the merger to be more probable in less frequent words such as Wicht, and less probable in high frequency words such as mich. Wedel et al. (2013), however, state that in their dataset they did not find any measure based on word frequency that was additionally predictive of merger probability.

A factor that we have neglected in this analysis but that is relevant for future work on the production and perception of this fricative contrast is the quality of the preceding vowel. Our work has shown that anticipatory rounding on a high front vowel before a postalveolar fricative (which has lip rounding) facilitates the identification of /ʃ/ compared to /ç/ (Jannedy and Weirich, 2014a). Listeners and speakers (especially Kiel vs Berlin) might use this additional cue, which is measurable in the F2 transition from the preceding vowel to the fricative, to varying degrees. Even in Japanese, a language that has no rounding on /ʃ/, onset F2 frequency is relevant in addition to the centroid frequency and SD (Li et al., 2009). Thus, it is advisable to investigate a range of phonetic cues in the fricative (or in the transition into or out of the fricative) that may be used to greater or lesser degrees to enhance or weaken contrasts acoustically or perceptually.

This work was supported by the German Federal Ministry for Education and Research (BMBF)—Grant No. 01UG0711. We thank our research assistants for all their diligent work and are very grateful for all the help we received from our experiment participants. Our work has also benefited from discussions with our colleagues at ZAS, the Friedrich-Schiller University of Jena, and with Jonathan Harrington (LMU Munich). We also thank three anonymous reviewers and our editor, Cynthia Clopper, for insightful comments and helpful guidance. All remaining errors are our own.

1. Auer, P. (2003). “Türkenslang. Ein jugendsprachlicher Ethnolekt des Deutschen und seine Transformationen” (“Turkslang. A youth language ethnolect of German and its transformations”), in Spracherwerb und Lebensalter, edited by A. Häki-Buhofer (Francke, Tübingen, Germany), pp. 255–264.
2. Baayen, R. H., Piepenbrock, R., and Gulikers, L. (1995). “The CELEX lexical database” (Release 2, CD-ROM), LDC catalogue No.: LDC96L14, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
3. Bates, D., Maechler, M., and Bolker, B. (2011). lme4: Linear mixed-effects models using S4 classes, R package version 0.999375-42.
4. Boersma, P., and Weenink, D. (2016). Praat: Doing phonetics by computer [Computer program]. Version 6.0.23, retrieved 12 December 2016 from http://www.praat.org/.
5. Bukmaier, V., and Harrington, J. (2016). “The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change,” J. Int. Phonetic Assoc. 46, 311–329.
6. Cheon, S. Y., and Anderson, V. B. (2008). “Acoustic and perceptual similarities between English and Korean sibilants: Implications for second language acquisition,” Korean Linguist. 14, 41–64.
7. Czaplicki, B., Zygis, M., Pape, D., and Jesus, L. M. T. (2016). “Acoustic evidence of new sibilants in the pronunciation of young Polish women,” Poznan Stud. Contemp. Linguist. 52(1), 1–42.
8. Dirim, I., and Auer, P. (2004). Türkisch sprechen nicht nur die Türken. Über die Unschärfebeziehung zwischen Sprache und Ethnie in Deutschland (Turkish is Not Only Spoken by Turks. On the Fuzzy Relationship between Language and Ethnos in Germany) (Walter de Gruyter, Berlin), pp. 1–255.
9. Evers, V., Reetz, H., and Lahiri, A. (1998). “Crosslinguistic acoustic categorization of sibilants independent of phonological status,” J. Phonetics 26, 345–370.
10. Forrest, K., Weismer, G., Milenkovic, P., and Dougall, R. N. (1988). “Statistical analysis of word-initial voiceless obstruents: Preliminary data,” J. Acoust. Soc. Am. 84(1), 115–123.
11. Gordon, M., Barthmaier, P., and Sands, K. (2002). “A cross-linguistic acoustic study of voiceless fricatives,” J. Int. Phonetic Assoc. 32, 141–174.
12. Guzik, K., and Harrington, J. (2007). “The quantification of place of articulation assimilation in electropalatographic data using the similarity index (SI),” Adv. Speech Lang. Pathol. 9(1), 109–119.
13. Hall, T. A. (2014). “Alveolopalatalization in Central German as markedness reduction,” Trans. Philos. Soc. 112, 143–166.
14. Harrington, J. (2010). Phonetic Analysis of Speech Corpora (Wiley-Blackwell, Chichester, UK), pp. 1–424.
15. Herrgen, J. (1986). Koronalisierung und Hyperkorrektion. Das palatale Allophon des /CH/-Phonems und seine Variation im Westmitteldeutschen (Coronalisation and hypercorrection. The palatal allophone of the /CH/-phoneme and its variation in West Middle German) (Franz Steiner, Stuttgart, Germany), pp. 1–278.
16. Hughes, G., and Halle, M. (1956). “Spectral properties of fricative consonants,” J. Acoust. Soc. Am. 28, 303–310.
17. Jannedy, S., and Weirich, M. (2012). “Some aspects of individual speaking style features in Hood German,” in Proceedings of Speech Prosody, Dublin (SP7), pp. 1–4.
18. Jannedy, S., and Weirich, M. (2014a). “Sound change in an urban setting: Category instability of the palatal fricative,” Lab. Phonol. 5(1), 91–122.
19. Jannedy, S., and Weirich, M. (2014b). “Linguistic influences on diphthong realization of /ɔɪ/ in Hood German,” in Proceedings of the International Seminar on Speech Production (ISSP), Cologne, Germany, pp. 214–217.
20. Jannedy, S., and Weirich, M. (2016). “The acoustics of fricative contrasts in two German dialects,” in Proceedings of Phonetics and Phonology in German Speaking Areas, edited by Chr. Draxler and F. Kleber (LMU, Munich, Germany), pp. 70–73.
21. Jannedy, S., Weirich, M., and Brunner, J. (2011). “The effect of inferences on the perceptual categorization of Berlin German fricatives,” in Proceedings of the XVII ICPhS, Hong Kong, pp. 962–965.
22. Jannedy, S., Weirich, M., and Helmeke, L. (2015). “Acoustic analyses of differences in [ç] and [ʃ] productions in Hood German,” in Proceedings of the 18th International Congress of Phonetic Sciences, edited by The Scottish Consortium for ICPhS 2015 (University of Glasgow, Glasgow, UK), pp. 1–5.
23. Jassem, W. (1995). “The acoustic parameters of Polish voiceless fricatives: Analysis of variance,” Phonetica 52, 251–258.
24. Jongman, A., Wayland, R., and Wong, S. (2000). “Acoustic characteristics of English fricatives,” J. Acoust. Soc. Am. 108, 1252–1263.
25. Labov, W. (1972). Sociolinguistic Patterns (University of Pennsylvania Press, Philadelphia, PA), pp. 1–344.
26. Lenth, R. V. (2016). “Least-squares means: The R package lsmeans,” J. Statist. Software 69(1), 1–33.
27. Li, F., Edwards, J., and Beckman, M. E. (2009). “Contrast and covert contrast: The phonetic development of voiceless sibilant fricatives in English and Japanese toddlers,” J. Phonetics 37, 111–124.
28. Li, F., Munson, B., Edwards, J., Yoneyama, K., and Hall, K. C. (2011). “Language specificity in the perception of voiceless sibilant fricatives in Japanese and English: Implications for cross-language differences in speech-sound development,” J. Acoust. Soc. Am. 129, 999–1011.
29. Mielke, J. (2008). The Emergence of Distinctive Features (Oxford University Press, Oxford), pp. 1–304.
30. Newman, R. S. (2003). “Using links between speech perception and speech production to evaluate different acoustic metrics: A preliminary report,” J. Acoust. Soc. Am. 113(5), 2850–2860.
31. Nowak, P. M. (2006). “The role of vowel transitions and frication noise in the perception of Polish sibilants,” J. Speech Hear. Res. 34, 139–152.
32. Pierrehumbert, J. (2002). “Word-specific phonetics,” in Laboratory Phonology 7 (Mouton de Gruyter, Berlin), pp. 101–139.
33. R Development Core Team (2008). “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, Vienna, Austria, URL http://www.R-project.org (Last viewed 3/8/2016).
34. Traunmüller, H. (1997). “Auditory scales of frequency representation,” http://www.ling.su.se/staff/hartmut/bark.htm (Last viewed 3/8/2016).
35. Watson, C. I., and Harrington, J. (1999). “Acoustic evidence for dynamic formant trajectories in Australian English vowels,” J. Acoust. Soc. Am. 106, 458–468.
36. Wedel, A. (2016). (private communication).
37. Wedel, A., Jackson, S., and Kaplan, A. (2013). “Functional Load and the Lexicon: Evidence that syntactic category and frequency relationships in minimal lemma pairs predict the loss of phoneme contrasts in language change,” Lang. Speech 56, 395–417.
38. Weirich, M. (2012). “The influence of NATURE and NURTURE on speaker-specific parameters in twins' speech: Articulation, acoustics and perception,” Ph.D. thesis, Humboldt-Universität zu Berlin, Berlin, pp. 1–284.
39. Wiese, H. (2009). “Grammatical innovation in multiethnic urban Europe: New linguistic practices among adolescents,” Lingua 119, 782–806.