Acoustic characteristics of sibilant fricatives and affricates in Mandarin-speaking children with cochlear implants 

The purpose of the study was to examine the acoustic features of sibilant fricatives and affricates produced by prelingually deafened Mandarin-speaking children with cochlear implants (CIs) in comparison to their age-matched normal-hearing (NH) peers. The speakers included 21 children with NH aged between 3.25 and 10 years old and 35 children with CIs aged between 3.77 and 15 years old who were assigned into chronological-age-matched and hearing-age-matched subgroups. All speakers were recorded producing Mandarin words containing nine sibilant fricatives and affricates (/s, ɕ, ʂ, ts, tsʰ, tɕ, tɕʰ, tʂ, tʂʰ/) located at the word-initial position. Acoustic analysis was conducted to examine consonant duration, normalized amplitude, rise time, and spectral peak. The results revealed that the CI children, regardless of whether chronological-age-matched or hearing-age-matched, approximated the NH peers in the features of duration, amplitude, and rise time. However, the spectral peaks of the alveolar and alveolopalatal sounds in the CI children were significantly lower than in the NH children. The lower spectral peaks of the alveolar and alveolopalatal sounds resulted in less distinctive place contrast with the retroflex sounds in the CI children than in the NH peers, which might partially account for the lower intelligibility of high-frequency consonants in children with CIs. © 2023 Acoustical Society of America. https://doi.org/10.1121/10.0019803


I. INTRODUCTION
Among different types of speech sounds, fricatives and affricates pose great difficulty during the acquisition process in typically developing children (Ingram et al., 1980). There has been abundant evidence showing that these two types of obstruent consonants emerge later and are not fully mastered until seven years of age or later, not only in English-speaking children but also in children from other language backgrounds (e.g., Cantonese, So and Dodd, 1995; English, Dodd et al., 2003; Prather et al., 1975; Smit et al., 1990; French, MacLeod et al., 2011; Korean, Kim et al., 2017; Mandarin, Hua and Dodd, 2000; Spanish, Goldstein et al., 2005; Turkish, Topbaş, 1997).
Similar to typically developing children with normal hearing (NH), children with cochlear implants (CIs) show delayed acquisition of fricatives and affricates (Blamey et al., 2001; Gaul Bouchard et al., 2007; Iyer et al., 2017; Serry and Blamey, 1999; Sundarrajan et al., 2020). Serry and Blamey (1999) conducted a four-year longitudinal study documenting the development of phonetic inventory in English-speaking children with CIs. The authors adopted targetless (at least two recognizable tokens of a phone produced in a speech sample) and target criteria (at least two tokens or 50% of the attempts of a phone produced correctly in intelligible words) to define the acquisition. The data showed that the fricative /s/ and affricate /dʒ/ attained the targetless criterion at 36 months post-implantation, and the fricative /z/ and affricate /tʃ/ reached the targetless criterion at 48 months post-implantation. However, the fricatives /ʒ/ and /θ/ did not meet either criterion after 48 months post-implantation. A follow-up study by Blamey et al. (2001) reported that at six years post-implantation, some phones including /ʒ, s, z, tʃ, θ/ still did not attain the target criterion. In a more recent study by Iyer et al. (2017), the authors documented consonant acquisition in 13 children with CIs who received implantation before three years of age over the first two years post-implantation. The results showed that the fricatives /s/, /f/, /ʃ/ and affricates /tʃ/ and /dʒ/ did not meet the requirement of mastery use (90% of group) and some fricatives /v, θ, ð, z, ʒ/ did not meet the requirement of customary use (50% of group) after 24 months of extensive hearing experience. Sundarrajan et al. (2020) examined consonant development in 129 CI children implanted between 6 and 38 months. The children were tested at 3.5 and 4.5 years old. Again, the data revealed that fricatives and affricates were acquired later and were produced with lower accuracies than stops, nasals, and approximants. These phonological studies suggest that fricatives and affricates are challenging to acquire not only for children with NH but also for children with CIs.
Among the many factors involved, such as input frequency in the ambient language, functional load, physiological constraint, etc. (Stokes and Surendran, 2005), the gradual development of oromotor control and the articulatory complexity of individual sounds provide a biological account to explain the similar acquisition trend observed in different languages and the delayed acquisition of fricatives and affricates (Davis and MacNeilage, 1995; MacNeilage et al., 2000). Kent (1992) analyzed the physiological characteristics and motoric adjustments for the four sets of English consonants described in Sander (1972). Set 1 includes /p, m, n, w, h/; set 2 includes /b, d, k, g, j, f/; set 3 includes /t, ɹ, l, ŋ/; and set 4 includes /s, z, ʃ, ʒ, v, θ, ð, tʃ, dʒ/. The sounds in higher-numbered sets reflect a higher level of articulatory complexity and demand greater motor control. For example, set 4 requires tongue configuration and fine force regulation for fricatives at different places. Among various consonants in different languages, most fricatives and affricates present a high level of articulatory complexity. Fricative production at the dental, alveolar, and palatal regions requires a precise airway configuration and fine force regulation for turbulence generation. Affricate production starts with a stop closure that releases into a fricative. This two-target articulatory sequence and the rapid transition from one target to the other present a special challenge during the acquisition process. As children with CIs share the same intact articulatory structure as children with NH, the articulatory constraints associated with the delayed acquisition of fricatives and affricates in children with NH would likely be experienced by children with CIs too.
English has a rich inventory of fricatives but only two affricates (/tʃ/ and /dʒ/) that are produced near the palatal region. Of the nine fricatives in English, the four sibilants (/s, z/ and /ʃ, ʒ/) form a two-way place contrast. By contrast, Mandarin has an alveolar-alveolopalatal-retroflex three-way place contrast for both fricatives (/s/, /ɕ/, /ʂ/) and affricates (/ts/, /tsʰ/, /tɕ/, /tɕʰ/, /tʂ/, /tʂʰ/). The articulation of Mandarin sibilants has been documented in previous studies (Lee, 1999; Ladefoged and Wu, 1984; Stokes and Zhen, 1998; Toda and Honda, 2003). According to Ladefoged and Wu (1984), Mandarin /s/ is produced with the tongue tip touching the teeth or alveolar ridge. Compared with other alveolar sounds such as /t/ or /tʰ/, /s/ shows a hollowing of the tongue. Mandarin /ʂ/ is labeled as retroflex but it is produced with the upper surface of the tongue tip forming a constriction at the center of the alveolar ridge, which differs from true retroflex sounds that have the tongue tip curling up and backward. Mandarin /ɕ/ has a different tongue shape from /s/ and /ʂ/ in that it has a relatively long and flat constriction channel higher in the mouth. Based on the palatograms and linguagrams taken from speakers of Beijing Mandarin, Lee (1999) described Mandarin /s/ as a plain apical or denti-alveolar or alveolar fricative, /ʂ/ as an apical or upper-apical postalveolar fricative, and /ɕ/ as a laminal or anterodorsal postalveolar fricative. As for the articulatory features of Mandarin affricates, researchers found that the unaspirated and aspirated affricates are similar to each other (Ladefoged and Wu, 1984). Affricates are similar to the corresponding fricatives (Ladefoged and Wu, 1984; Lee, 1999) but the articulatory contacts of affricates are slightly fronted on the upper articulator in comparison to corresponding fricatives (Lee, 1999).
Acoustically, the frication noise of fricatives and affricates at different places is characterized by a number of features including frication duration, spectral peak location, spectral moments, spectral slope, F2 onset frequency, normalized amplitude, and relative amplitude (Gordon et al., 2002; Jongman, 2000; Lee et al., 2014; Li and Munson, 2016; Shadle, 1985). While there is no consensus regarding whether there is one single acoustic measure to distinguish all fricative contrasts and which measure it is, it is well established that the acoustic information regarding the place of articulation is mainly represented in spectral peak location and spectral moments (Jongman et al., 2000; Gordon et al., 2002; Lee et al., 2014). Spectral peak specifies the highest amplitude point of the noise spectrum and spectral moments (i.e., spectral mean, spectral variance, skewness, and kurtosis) define the distribution of acoustic energy and the shape of the noise spectrum. Generally, frication noise produced with the constriction formed at a more fronted position has its spectral energy concentrated in a higher frequency region (Strevens, 1960). Compared with non-sibilant fricatives, sibilant fricatives have greater acoustic energy. Of the two places of English sibilants, /s/ has a spectral peak of around 7000 Hz and a spectral mean of over 6000 Hz, and /ʃ/ has a spectral peak and spectral mean of around 4000 Hz (Jongman et al., 2000). According to Lee et al. (2014), of the three places of Mandarin sibilant fricatives, the average spectral peak is around 7600, 6500, and 3800 Hz for the alveolar /s/, the alveolopalatal /ɕ/, and the retroflex /ʂ/, respectively.
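As an illustration of the two spectral descriptors defined above, the following minimal sketch treats a power spectrum as a distribution over frequency and computes its peak and first four moments. The function names and the toy spectrum are hypothetical, and reporting excess kurtosis (subtracting 3) is an assumed convention; the studies cited above may have used different analysis settings.

```python
import math


def spectral_peak(freqs_hz, power):
    """Frequency of the highest-amplitude point of the spectrum."""
    i = max(range(len(power)), key=power.__getitem__)
    return freqs_hz[i]


def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The spectrum is normalized to sum to 1 and treated as a probability
    distribution over frequency. Subtracting 3 from the kurtosis is an
    assumed convention, not something stated in the text.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    p = [v / total for v in power]
    mean = sum(f * w for f, w in zip(freqs_hz, p))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(var)
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3.0
    return mean, var, skew, kurt


# Toy spectrum (made-up values): energy concentrated near 6 kHz.
toy_freqs = [2000.0, 4000.0, 6000.0, 8000.0]
toy_power = [1.0, 2.0, 4.0, 1.0]
toy_peak = spectral_peak(toy_freqs, toy_power)
toy_mean = spectral_moments(toy_freqs, toy_power)[0]
```

In this toy case the peak falls on the 6000 Hz bin while the spectral mean is pulled lower by the energy at 2000 and 4000 Hz, which is why the two measures can diverge for the same sound.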
Distinct from fricatives, affricates are characterized by a shorter amplitude rise time from the onset of frication to the maximum frication amplitude because there is a stop component preceding the frication portion. While English has only one pair of voiced and voiceless palato-alveolar affricates, Mandarin has six that are composed of three unaspirated-aspirated pairs produced at alveolar, alveolopalatal, and postalveolar regions. Li and Gu (2016) reported that the spectral peak and spectral mean of the frication noise were 7000-8000 Hz for Mandarin alveolar affricates /ts, tsʰ/, 5000-6000 Hz for the alveolopalatal affricates /tɕ, tɕʰ/, and around 4000 Hz for the retroflex affricates /tʂ, tʂʰ/. The amplitude analysis showed that among the three places, the alveolopalatal affricates had the highest amplitude and the alveolar affricates had the lowest amplitude. The analysis of the normalized duration of the stop closure and frication noise revealed a significantly longer noise duration in the aspirated affricates than in the unaspirated ones. Among the three places, the retroflex affricates had a longer closure duration than the sounds of the other two places.
Articulatory and acoustic analyses of adult speakers have revealed the high-frequency nature of sibilant fricatives and affricates. Previous studies have reported that children with CIs experience difficulties in differentiating and recognizing high-frequency sounds (Grandon and Vilain, 2020; Liker et al., 2007; Todd et al., 2011; Reidy et al., 2017), which might be largely due to the coarse representation of high-frequency information in CIs as well as the common basal shift of CI stimulations (Zhou et al., 2010). Presumably, the lack of spectral resolution for place identity in the auditory input would also be reflected in the speech production of children with CIs. So far, a few studies have examined the acoustic representations of fricative productions in children with CIs speaking English (Bharadwaj et al., 2006; Reidy et al., 2017; Todd et al., 2011; Uchanski and Geers, 2003), Croatian (Liker et al., 2007; Mildner and Liker, 2008), Mandarin (Yang et al., 2017), French (Grandon and Vilain, 2020), and bilingual English-Spanish (Li et al., 2017). While these studies varied in research design and participant characteristics, a consensus finding was that children with CIs were less likely to produce place contrasts and tended to show atypical spectral frequency features for fricatives with reference to their NH peers.
Although some knowledge has been gained about the speech characteristics of fricatives in children with CIs, little research has been done on affricates. The three-way place contrast in Mandarin provides a valuable context to investigate how sparse auditory input affects the production and development of consonants in children with CIs. The acquisition of sibilant fricatives and affricates in typically developing Mandarin-speaking children has been documented in several studies (Li and Munson, 2016; Hua and Dodd, 2000; Si, 2006). Hua and Dodd (2000) studied the phonetic development of 129 Mandarin-speaking children aged between 1;6 and 4;6. Among the Mandarin sibilant fricatives and affricates, /ɕ/ was acquired relatively early, before three years of age. /s, tɕ, tɕʰ/ were acquired between 4;1 and 4;6, and /ʂ, tʂ, tʂʰ, ts, tsʰ/ were not acquired until 4;7 and older. In another study on phonetic development of Mandarin-speaking children, Si (2006) reported that /ɕ/ was the earliest acquired fricative, followed by /tɕ, tɕʰ/. However, the productions of /s, ʂ, ts, tsʰ, tʂ, tʂʰ/ had not been stabilized and these sounds were not fully acquired until five years of age. More recently, Li and Munson (2016) carried out an auditory-based transcription analysis and acoustic-based statistical analysis to examine the development of sibilant fricatives /s, ʂ, ɕ/ in Mandarin-speaking children aged between two and five years old. Two acoustic parameters, spectral mean and F2 onset, were used for the statistical modeling. Both phonetic and acoustic analyses revealed the earliest acquisition of /ɕ/, followed by /ʂ/, and the last mastery of /s/. In addition, the acoustic data revealed a gradual separation process of the three phonetic categories in the acoustic space.
With regard to Mandarin-speaking children with CIs, Peng et al. (2004) examined the production accuracy of word-initial consonants produced by 30 Mandarin-speaking children with CIs aged between 6 and 12.5 years old with an average length of device experience of 3.58 years. Compared to stops and nasals, fricatives and affricates showed lower accuracies. Among the 21 consonants, /s/, /tsʰ/, and /tʂʰ/ had the lowest accuracies. Yang et al. (2017) compared the amplitude and spectral features of four fricatives /f, s, ɕ, ʂ/ in 14 children with CIs aged between 2.9 and 8.3 years old and 60 age-matched children with NH. The children with CIs showed less distinctive acoustic features among different places and differed from NH peers in both amplitude and spectral properties. So far, no data have been reported on the acoustic analysis of affricate production in Mandarin-speaking children with CIs. Moreover, the children with CIs in the Yang et al. (2017) study had a relatively short duration of CI use. It remains unknown whether children with CIs would show improvement in fricative/affricate production with a longer duration of auditory experience. Given that these two types of consonants are late-acquired sounds that require complex articulatory control, one possible outcome is that children with CIs will show continuing improvement in fricative and affricate production with improved articulatory control and increased language experience. The other possible outcome is that their performance on fricative/affricate production will remain worse than normal and still deviate from NH targets, due to the inherent deficit of spectral distortion caused by the device. In a recent study by Yang et al. (2023), productions of 17 Mandarin obstruent consonants including six stops, five fricatives, and six affricates from 22 NH children and 35 children with CIs were identified by 100 naive adult listeners. The children with CIs received the implantation at an average age of 3.3 years old and had an average length of device use of 5.04 years. The results revealed that the children with CIs, regardless of whether they were chronological-age-matched or hearing-age-matched with the NH controls, showed lower intelligibility for all tested obstruent consonants. The major difficulty was in the sibilant fricatives and affricates. Of the three places (alveolar, alveolopalatal, and retroflex), the children with CIs had the lowest intelligibility in the alveolar sounds. In addition, the adult listeners showed a confusion pattern for the sibilants produced by the CI children that differed from that for the NH controls.
To further examine how the difficulties in obstruent production are reflected in the spectral and temporal features, the present study was carried out to characterize the acoustic representation of sibilant fricatives and affricates produced by the same cohort of CI participants and NH controls in Yang et al. (2023). The research aim is twofold: (1) to investigate whether and to what extent children with CIs deviate from NH peers in the acoustic properties of sibilant fricatives and affricates; (2) to examine whether the acoustic features of sibilants produced by children with CIs show manner, place, and aspiration contrasts.

A. Speakers
The participants included 21 children with NH (13 girls and 8 boys) aged between 3.25 and 10 years old (mean, M = 6.29 years; standard deviation, SD = 1.62 years) and 35 children with CIs (17 girls and 18 boys) aged between 3.77 and 15 years old (M = 8.22 years, SD = 2.58 years). All children were recruited in the Beijing area, China. The children with CIs were all prelingually deafened and received unilateral implantation before eight years of age (M = 3.30 years, SD = 1.55 years). The length of CI use varied between 0.08 and 9.49 years (M = 5.04 years, SD = 2.50 years). Due to the wide range of chronological age in the children with CIs, they were assigned into chronological-age-matched and hearing-age-matched subgroups (see Table I

B. Speech materials and recording
The materials used for speech sample elicitation were 27 disyllabic and trisyllabic words (see Appendix A) composed of Mandarin sibilant fricatives and affricates /s, ɕ, ʂ, ts, tsʰ, tɕ, tɕʰ, tʂ, tʂʰ/ followed by three Mandarin corner vowels /a, i, u/. Due to the phonotactic constraints in Mandarin, /ia/, /i/, and /y/ were used for the alveolopalatal fricative and affricates. The allophonic variant /-/ was used to substitute /i/ for the alveolar sounds, and the allophonic variant /-/ was used to substitute /i/ for the retroflex sounds. All target fricatives and affricates occupied the onset position of the first CV syllable. The vocabulary level of young children and the picturability of stimuli were taken into consideration when developing the word list. The lexical tone of the target syllables was not strictly controlled but an effort was made to choose syllables in tone 1 to reduce the potential impact of lexical tone on segment features.
Productions of the tested words were elicited through a visual-auditory word repetition task in a quiet room. Participants were seated in front of a laptop computer on which pictures showing the tested words were displayed. For each word, an audio prime produced by a native Mandarin speaker was played immediately after the picture was shown. The pictures and the accompanying audio prompts were randomized and played using a custom MATLAB program. The participants were asked to repeat each word once, which was recorded on a digital recorder (Zoom H4n, Zoom North America, Hauppauge, NY) with a 44.1 kHz sampling rate and a 16-bit quantization rate. Prior to the real testing, practice trials composed of five Mandarin words were conducted to familiarize the participants with the recording procedure. The words used in the practice trials were not included in the real testing. The recorded samples were transferred to a desktop hard disk with the same settings. Then, the recordings were segmented into individual words and saved in .wav format for further acoustic analyses. After excluding 144 missing tokens and 31 tokens affected by peak clipping, overlapping voices, poor recording quality, etc., a total of 1337 tokens were obtained for further analysis.

C. Acoustic analysis
Prior to the acoustic analysis, a phonetically trained experimenter conducted a perception screening. The first syllable of all segmented words was selected, randomized, and presented to the experimenter who was required to identify whether the initial consonant of each target syllable was produced as a fricative/affricate or not. The purpose of this procedure was to exclude the segments that were produced as stops or sonorants because those were not appropriate for frication noise spectrum analysis. Therefore, as long as the production was perceived as a fricative or affricate, whether or not produced as the intended sound, the production was kept for acoustic analysis. Of all recorded productions, 116 tokens (8.68%) were removed after the perception screening. A total of 1221 tokens were used and down-sampled to 20 kHz for subsequent acoustic analysis.
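The token counts reported in the recording and screening steps can be cross-checked with a few lines of arithmetic. The sketch below assumes that each of the 56 children was expected to produce each of the 27 words exactly once, which is consistent with the description but not stated as a total in the text; the function name is hypothetical.

```python
def token_accounting(n_nh=21, n_ci=35, n_words=27,
                     missing=144, unusable=31, screened_out=116):
    """Trace the token counts from elicitation to acoustic analysis."""
    # Assumption: one expected production per child per word.
    expected = (n_nh + n_ci) * n_words
    # Tokens left after dropping missing and unusable recordings.
    recorded = expected - missing - unusable
    # Tokens left after the perception screening.
    analyzed = recorded - screened_out
    # Share of recorded tokens removed by the screening, in percent.
    pct_removed = 100.0 * screened_out / recorded
    return {
        "expected": expected,
        "recorded": recorded,
        "analyzed": analyzed,
        "pct_removed": pct_removed,
    }


counts = token_accounting()
```

Under that assumption the figures are mutually consistent: the 1337 recorded tokens, the 1221 analyzed tokens, and the 8.68% removal rate all follow from the stated exclusions.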
The acoustic analysis focused on the duration and normalized amplitude of the consonants and on the amplitude rise time and spectral peak of the frication noise of the target sounds. In order to avoid the interference caused by low-frequency electrical current sounds and other potential noise, high-pass filtering with a 1000-Hz cut-off was applied. For each syllable containing the target sounds, the onsets and offsets of the fricative/affricate and the following vowel were manually located using Adobe Audition 3.0. The landmark locations of the stop and fricative components of affricates were marked separately. Based on the landmark locations, the duration of a fricative was defined as the interval from the start of frication to the start of vowel periodicity. The duration of an affricate was defined as the interval from the start of the stop burst to the start of vowel periodicity. Note that not all affricates were produced with an observable burst. For the affricates with no visible stop burst, the affricate duration was the same as the duration of the fricative component. Normalized amplitude was defined as the difference in dB between the root-mean-square (rms) amplitude of the target consonant and the rms amplitude of the subsequent vowel portion in the same syllable. Amplitude rise time and spectral peak were obtained from the frication noise of the target sounds. Rise time was defined as the time interval from the beginning of the frication to the point of peak intensity. The spectral peak was computed based on fast Fourier transforms (FFT) conducted on a 40-ms Hamming window in the middle of the frication noise. Note that some unaspirated affricates had a very short frication noise. As reported by Wu (1986), Mandarin unaspirated affricates such as /tɕ/ and /tʂ/ have a very short duration (< 50 ms) when followed by certain vowels such as /ia/ and /a/. In such cases, a 20-ms Hamming window was used when the duration of frication noise was shorter than 40 ms.
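The definitions of normalized amplitude and rise time given above can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function names are hypothetical, and locating the point of peak intensity with non-overlapping 5-ms RMS frames is an assumption, since the text does not say how peak intensity was measured.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalized_amplitude_db(consonant, vowel):
    """Consonant RMS level relative to the following vowel, in dB."""
    return 20.0 * math.log10(rms(consonant) / rms(vowel))


def rise_time_ms(frication, sample_rate, frame_ms=5.0):
    """Time from frication onset to the centre of the loudest frame.

    Assumption: peak intensity is found with non-overlapping RMS frames
    of frame_ms milliseconds; the source does not specify this step.
    """
    if not frication:
        raise ValueError("empty frication segment")
    frame = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    frame = min(frame, len(frication))
    best_start, best_level = 0, -1.0
    for start in range(0, len(frication) - frame + 1, frame):
        level = rms(frication[start:start + frame])
        if level > best_level:
            best_start, best_level = start, level
    return 1000.0 * (best_start + frame / 2.0) / sample_rate
```

With this definition a consonant that is weaker than its vowel yields a negative normalized amplitude, and an affricate whose noise peaks shortly after the burst yields a shorter rise time than a fricative whose noise builds gradually.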
The spectral peak was measured as the highest amplitude peak on the FFT spectrum.
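The spectral peak measurement can be sketched in the same spirit. The example below applies a Hamming window to the middle of the frication noise, uses the 40-ms window unless the noise is shorter than 40 ms (then 20 ms), and returns the frequency of the largest magnitude bin. It is an illustration under stated assumptions: a plain discrete Fourier transform stands in for the FFT, bins below 1000 Hz are simply ignored as a stand-in for the high-pass filter, and all names are hypothetical.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def window_length_ms(frication_ms):
    """40 ms by default; 20 ms when the frication is shorter than 40 ms."""
    return 40.0 if frication_ms >= 40.0 else 20.0


def spectral_peak_hz(frication, sample_rate, min_hz=1000.0):
    """Frequency of the highest-magnitude bin in the mid-frication window."""
    if len(frication) < 2:
        raise ValueError("need at least two samples")
    dur_ms = 1000.0 * len(frication) / sample_rate
    n = int(round(sample_rate * window_length_ms(dur_ms) / 1000.0))
    n = min(n, len(frication))
    # Centre the analysis window in the frication noise.
    start = (len(frication) - n) // 2
    seg = [x * w for x, w in zip(frication[start:start + n], hamming(n))]
    # Precompute the complex exponentials used by the direct DFT.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        if k * sample_rate / n < min_hz:
            continue  # assumed stand-in for the 1000-Hz high-pass filter
        mag = abs(sum(s * twiddle[(k * t) % n] for t, s in enumerate(seg)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n
```

At the 20-kHz analysis rate a 40-ms window spans 800 samples, so the bins are 25 Hz apart; the 20-ms fallback halves the frequency resolution to 50 Hz, which is the trade-off accepted for very short frication noise.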

D. Statistical analysis
The acoustic data were fitted with linear mixed-effects models (LMMs) in SPSS (Statistical Package for the Social Sciences, IBM Corp., version 28, 2021) to compare the group difference between the NH and CI children. The CA and HA subgroups were compared with the NH controls separately. No comparison was made between the CA and HA subgroups. For each LMM, the factors of group (NH vs CA or NH vs HA), place (alveolar, alveolopalatal, retroflex), and type (fricative, unaspirated affricate, aspirated affricate) were defined as fixed effects with a full factorial design including all interactions. Participants and items were defined as random effects with by-subject and by-item intercepts included. Because Mandarin affricates present an unaspirated vs aspirated contrast but Mandarin fricatives do not have this contrast, the factor of consonant type included both manner and aspiration contrasts.
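One way to write the model structure described above is shown below. The notation is ours and is only a schematic reading of the description, not the SPSS specification: $y_{si}$ is an acoustic measure for speaker $s$ and item $i$, and $\mathbf{x}_{si}$ collects the dummy-coded group, place, and type terms together with all of their two-way and three-way interactions.

```latex
y_{si} = \beta_0 + \mathbf{x}_{si}^{\top}\boldsymbol{\beta} + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

With two groups, three places, and three types, the full factorial design has 2 × 3 × 3 = 18 cells, so $\boldsymbol{\beta}$ holds 17 coefficients in addition to the intercept $\beta_0$; $u_s$ and $w_i$ are the by-subject and by-item random intercepts.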
To address the second research question of whether children with CIs differentiate manner, place, and aspiration features in their production of Mandarin sibilant fricatives and affricates, an LMM was conducted on each of the four acoustic properties for children with CIs only. Close observation of the participants' characteristics (summary in Table I) showed that almost all tested children with CIs received implantation before five years of age with at least two years of CI device use (only three children were older than 6 at implantation and three children had used their CI for less than two years). For this analysis, the children with CIs were tested as one group with no further division. The place (alveolar, alveolopalatal, retroflex), type (fricative, unaspirated affricate, aspirated affricate), and the interaction between these two factors were defined as the fixed effects. Participants and items were set as the random effects with by-subject and by-item intercepts included.

III. RESULTS
Figure 1 presents the durations of the tested fricatives and affricates in children with NH and CIs (see Appendix B for the numeric values of all tested acoustic measures). The children with CIs, regardless of CA-matched or HA-matched subgroups, approximated the NH controls in the temporal features for all tested sibilant fricatives and affricates. Of the nine sibilants, the fricatives were longer than the affricates and the aspirated affricates were longer than the unaspirated affricates in both NH and CI children. An LMM was implemented to compare the nine sibilants between the NH and each CI subgroup. As for the NH vs CA comparison, the results revealed no group difference or group-related interactions but a significant main effect of type [F(2, 962.9) = 534.40, p < 0.001] and a significant place-by-type interaction [F( 4 similarly to the NH vs CA comparison, no significant group effect or group-related interactions were found. However, there was a significant type effect [F(2, 1004.3) = 621.74, p < 0.001] and a significant place-by-type interaction [F(4, 1001.8) = 15.30, p < 0.001]. Pairwise comparison results revealed that there were significant differences between the fricatives and the affricates (both unaspirated and aspirated) as well as between the unaspirated and aspirated affricates for both NH vs CA and NH vs HA comparisons (all p < 0.001).
The average duration of each target sibilant in the CI children is presented in Fig. 2. The children with CIs, as a group, produced longer durations for the fricatives than for the affricates. Also, the durations of the aspirated affricates were considerably longer than those of the unaspirated affricates. Within each type, the duration varied with the place of articulation. However, the durational change with the place was different between the unaspirated and aspirated affricates as well as between the fricatives and the affricates. The LMM on the consonant duration of the CI children revealed a significant type effect [F(2, 713.2) = 356.55, p < 0.001] and place-by-type interaction effect [F(4, 706.0) = 10.90, p < 0.001]. The pairwise post hoc analysis revealed a significant difference between the fricatives and the unaspirated affricates (p < 0.001), and between the fricatives and the aspirated affricates (p < 0.001). In addition, the durations of the unaspirated and aspirated affricates showed a significant difference (p < 0.001).
Figure 3 shows the normalized amplitude of the tested fricatives and affricates in the children with NH and CIs. Of the nine sibilant consonants, the normalized amplitude varied as a function of manner, place, and aspiration features. The unaspirated affricates presented a lower amplitude than the fricatives and aspirated affricates. Of the three places, the alveolar sounds /s, ts, tsʰ/ showed a lower amplitude than the alveolopalatal /ɕ, tɕ, tɕʰ/ and retroflex /ʂ, tʂ, tʂʰ/. The two CI subgroups were similar to the NH controls for most sounds except for /tɕ/ and /tsʰ/. For both NH vs CA and NH vs HA comparisons, the LMM revealed no significant group effect but significant main effects of type [NH vs CA: F(2, 948.3) = 41.0, p < 0.001; NH vs HA: F(2, 994.7) = 52.66, p < 0.001] and place [NH vs CA: F(2, 12.61) = 9.82, p = 0.003; NH vs HA: F(2, 10.43) = 13.01, p = 0.001]. As for the interaction effects, a significant place-by-type interaction was found for both NH vs CA [F(4, 946.3) = 5.06, p < 0.001] and NH vs HA [F(4, 994.2) = 5.81, p < 0.001] comparisons. A significant group-by-type interaction [F(2, 996.9) = 3.96, p = 0.019] was also found in the NH vs HA comparison. Pairwise comparisons suggested that for both NH vs CA and NH vs HA comparisons, the amplitude of the alveolar sounds was significantly lower than that of the retroflex sounds (all p < 0.001). The pairwise comparison results also demonstrated a significant difference between the fricatives and the affricates (both unaspirated and aspirated) as well as between the unaspirated and aspirated affricates for the NH vs CA comparison (all p < 0.05). For the NH vs HA comparison, the pairwise comparison yielded a significant difference between the fricatives and the unaspirated affricates and between the unaspirated and aspirated affricates (both p < 0.001).
The normalized amplitude of the tested sibilant consonants in the children with CIs is displayed in Fig. 4. Of the three places, the alveolar sounds had a lower normalized amplitude than the sounds of the other two places. Of the two manners of articulation, the fricatives were produced with a higher amplitude than the unaspirated affricates. As for the aspiration contrast, the children with CIs produced the unaspirated affricates with a lower amplitude than the aspirated affricates. The LMM confirmed these observations. In particular, there was a significant place effect [F(2, 10.94) = 7.80, p = 0.008], type effect [F(2, 700.6) = 48.0, p < 0.001], and place-by-type interaction effect [F(4, 697.7) = 6.02, p < 0.001]. The pairwise comparisons revealed a significantly lower amplitude of the alveolar sounds /s, ts, tsʰ/ than the retroflex sounds /ʂ, tʂ, tʂʰ/ (p < 0.001). For the aspiration and manner comparisons, the post hoc analysis revealed a significant difference between the unaspirated and aspirated affricates and a significant difference between the fricatives and the unaspirated affricates (both p < 0.001). No significant difference was found between the fricatives and the aspirated affricates.
Figure 5 displays the amplitude rise time of the frication noise in the children with NH and CIs. Of the nine tested sibilants, the fricatives showed the longest rise time and the unaspirated affricates showed the shortest rise time. The LMM revealed no group difference or group-related interactions for either the NH vs CA or the NH vs HA comparison. However, both comparisons showed a significant type effect [NH vs CA: F(2, 953.6) = 300.39, p < 0.001; NH vs HA: F(2, 993.2) = 353.94, p < 0.001]. Pairwise comparisons revealed significant differences between the fricatives and the affricates (both unaspirated and aspirated) as well as between the unaspirated and aspirated affricates for both NH vs CA and NH vs HA comparisons (all p < 0.001).
The amplitude rise time of the frication noise of the tested sibilant consonants in the children with CIs is shown in Fig. 6. Consistent with the finding of longer rise time for fricatives than affricates in NH people, our data showed the same trend in the children with CIs. Our data also revealed that the rise time of the aspirated affricates was longer than that of the unaspirated affricates. The LMM revealed a significant type effect [F(2, 703.8) = 196.45, p < 0.001] and type-by-place interaction effect [F(4, 697.2) = 7.99, p < 0.001]. No significant place difference was found. The pairwise comparisons showed a significantly longer rise time in the fricatives than in both types of affricates and a significantly longer rise time in the aspirated affricates than in the unaspirated affricates (all p < 0.001).
Figure 7 presents the spectral peak of the nine tested fricatives and affricates in the children with NH and CIs. In the NH group, little difference was observed between the fricatives and the affricates, but the spectral peaks of the alveolar sounds /s, ts, tsʰ/ and the alveolopalatal sounds /ɕ, tɕ, tɕʰ/ were higher than those of the retroflex sounds /ʂ, tʂ, tʂʰ/. Compared with the NH controls, the children with CIs, whether CA-matched or HA-matched, demonstrated evidently lower spectral peaks for the alveolar and alveolopalatal fricatives and affricates. The LMM revealed a significant group difference for both NH vs CA [F(1, 43.5) = 17.68, p < 0.001] and NH vs HA [F(1, 45.1) = 24.22, p < 0.001] comparisons. Meanwhile, both comparisons showed a significant type effect [NH vs CA: F(2, 899.5) = 4.53, p = 0.011; NH vs HA: F(2, 950.5) = 6.80, p = 0.001] and place effect [NH vs CA: F(2, 12.2) = 17.14, p < 0.001; NH vs HA: F(2, 10.8) = 22.51, p < 0.001]. No significant interactions were found except for a significant group-by-place interaction [F(2, 954.5) = 4.31, p = 0.014] for the NH vs CA comparison. Post hoc pairwise comparisons revealed that the place difference was manifested between the retroflex and alveolar sounds for both NH vs CA and NH vs HA comparisons (all p < 0.001). In addition, there was a manner difference between the fricatives and the aspirated affricates on the spectral peak for both NH vs CA and NH vs HA comparisons (all p < 0.05).
The spectral peak of the frication noise in the tested sibilant consonants of the children with CIs is shown in Fig. 8. The place difference was clearly shown and reflected as a lower spectral peak in the retroflex sounds /ʂ, tʂ, tʂʰ/ than in the sounds of the other two places. No evident difference was observed between the alveolar sounds /s, ts, tsʰ/ and alveolopalatal sounds /ɕ, tɕ, tɕʰ/. The LMM revealed a significant place effect [F(2, 11.4) = 8.19, p = 0.006] and type effect [F(2, 680.1) = 4.45, p = 0.012] but no significant interaction effect. The pairwise comparisons revealed a significantly lower spectral peak in the retroflex sounds than in the alveolar sounds (p < 0.001) and revealed a significant difference between the fricatives and the aspirated affricates.
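To make the spectral peak measure concrete, the sketch below takes it to be the frequency of the highest-amplitude component in the magnitude spectrum of a windowed segment. This definition, the Hann window, the function name `spectral_peak_hz`, and the tone frequencies (8000 Hz and 3500 Hz) are assumptions chosen for illustration only; they are not values or procedures taken from the present study.

```python
import numpy as np


def spectral_peak_hz(segment, sr):
    """Return the frequency (Hz) of the largest magnitude-spectrum component."""
    x = np.asarray(segment, dtype=float)
    windowed = x * np.hanning(len(x))          # reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))  # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(freqs[np.argmax(magnitude)])


# Hypothetical 40-ms test segments built from pure tones (not real speech).
sr = 44100
n = sr * 40 // 1000
t = np.arange(n) / sr

# "Front-like" segment: dominant high-frequency component plus a weaker low one.
front_like = np.sin(2 * np.pi * 8000 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
# "Retroflex-like" segment: dominant energy in a lower-frequency region.
retroflex_like = np.sin(2 * np.pi * 3500 * t) + 0.3 * np.sin(2 * np.pi * 8000 * t)

print(spectral_peak_hz(front_like, sr), spectral_peak_hz(retroflex_like, sr))
```

For real frication noise, a single FFT is a noisy estimate; averaging spectra over several windows within the segment is a common way to stabilize the peak, although whether that was done here is not stated in this section.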

IV. DISCUSSION
The present study examined the acoustic characteristics, including duration, normalized amplitude, amplitude rise time, and spectral peak, of sibilant fricatives and affricates produced by Mandarin-speaking children with CIs. The purpose was to identify whether children with CIs, after a relatively long period of CI use, produce NH-like acoustic properties in sibilant consonants and whether they could differentiate place, manner, and aspiration features in their production of sibilant sounds. The first research question focused on the comparison between CI and NH children. Due to the relatively large age range of the CI participants, they were assigned into two subgroups that were either CA-matched or HA-matched to the NH controls. For three out of the four tested acoustic properties (i.e., duration, amplitude, and rise time), the children with CIs, including both CA-matched and HA-matched subgroups, showed no significant difference from the NH controls. The major group difference resided only in the spectral peak. As shown in Fig. 7, the NH children produced the alveolar and alveolopalatal sibilants /s, ts, tsʰ, ɕ, tɕ, tɕʰ/ with higher spectral peaks than the retroflex sounds /ʂ, tʂ, tʂʰ/. This result was expected because, according to the acoustic theory of speech production, a more fronted constriction point is associated with a higher spectral prominence. The CI children, in both the CA and HA subgroups, produced sibilants, especially the alveolar and alveolopalatal sibilants, with lower spectral peaks than the NH peers. At the same time, the children with CIs approximated the NH peers in the retroflex sounds, which are characterized by acoustic energy concentrated in a relatively low-frequency region. Therefore, compared to the NH children, the children with CIs showed less distinctive spectral peaks among the three places of articulation. It is known that the place difference of obstruents is mainly reflected in spectral characteristics like the spectral peak and spectral mean (Jongman et al., 2000; Lee et al., 2014). The less distinctive spectral peak of the three places echoed the finding of a highly confused three-way place contrast in Mandarin sibilants reported in the perception study (Yang et al., 2023). Furthermore, the perception data of Yang et al. (2023) revealed that the adult listeners experienced greater confusion in identifying the high-frequency sibilants such as /s, ts, tsʰ/ than they did for the retroflex /ʂ, tʂ, tʂʰ/. These findings together suggested that the major challenge of sibilant consonants in children with CIs resides in the high-frequency sounds, not in the low-frequency sounds. This reflected the inherent deficits of CI devices with high-frequency sounds (Loizou, 2006; Reidy et al., 2017).
In addition to the comparison between the CI and NH children, whether the CI participants produced place, manner, and aspiration contrasts is of particular interest in the present study. Our data revealed that the CI children showed significant differences in the tested acoustic properties among the three places, between the fricatives and affricates, and between the unaspirated and aspirated affricates. For the consonant duration, aspirated consonants usually have a longer duration than unaspirated consonants in the same place and manner, due to the extra airflow. In the present study, the children with CIs also produced significantly longer durations for the aspirated affricates than for the unaspirated affricates. Meanwhile, they produced longer durations for the fricatives than for the aspirated affricates, which might be because the stop component in the affricates shortened the duration of the whole unit. Of the four examined acoustic properties, the normalized amplitude is associated with respiratory and airflow control during speech articulation. Previous studies have shown that the amplitude of aspiration noise or frication noise can be used to differentiate the voiced-voiceless contrast (Repp, 1979; Nirgianaki, 2014), sibilant from non-sibilant fricatives (Nirgianaki, 2014), and place of articulation in fricatives (Behrens and Blumstein, 1988; Jongman et al., 2000). Due to the lack of extra airflow, unaspirated sounds usually show a lower amplitude than aspirated sounds sharing the same place and manner (Li and Gu, 2016). In the current study, the children with CIs produced the unaspirated affricates with a lower amplitude than the aspirated affricates and fricatives. Consistent with the findings of lower amplitude in the Mandarin alveolar fricative than in the alveolopalatal and retroflex fricatives reported in previous studies (Lee et al., 2014), the children with CIs produced the alveolar fricative and affricates with lower amplitude than the sounds of the other two places.
Of the other two features, amplitude rise time plays a key role in differentiating fricatives and affricates of the same place (Dorman et al., 1980; Howell and Rosen, 1983; Kluender and Walsh, 1992; Mahmoodzade and Bijankhan, 2007; Mitani et al., 2006). In particular, fricatives are characterized by a much longer rise time than affricates (Howell and Rosen, 1983; Mahmoodzade and Bijankhan, 2007). In the present study, the amplitude rise time of the frication noise in the fricatives was significantly longer than that in the unaspirated and aspirated affricates. This result indicated that the children with CIs could reliably produce the manner contrast in their speech production. In addition, as Mandarin affricates are characterized by an aspirated vs unaspirated contrast, the extra airflow in aspirated affricates causes a longer amplitude rise time in aspirated affricates than in unaspirated affricates. This pattern was also manifested in the production of children with CIs. Referring to the perception data in Yang et al. (2023), the adult listeners' confusion about the obstruent consonants produced by the CI children was predominantly manifested in the three-way place contrast, but rarely across different manners or between unaspirated and aspirated consonants. The perception and acoustic data, together, indicate that children with CIs could reliably distinguish the manner feature and aspiration feature in their production of sibilant consonants.
For the last examined feature of spectral peak, our data showed that the retroflex sounds were produced with a significantly lower spectral peak than the sounds of the other two places in the children with CIs. The comparison between the NH and CI groups revealed that the CI children produced significantly lower spectral peaks for the alveolar and alveolopalatal sounds although they produced retroflex sounds with spectral peaks similar to the NH peers, which resulted in less distinctive spectral peaks among the three places. The perception data in Yang et al. (2023) revealed that the CI children's productions of Mandarin sibilants at the three places, especially the alveolar sounds, were identified with very low accuracy and demonstrated considerable confusion with the retroflex sounds. These findings together suggested that even though there was a statistically significant difference in the spectral peak among the three places, the difference was not great enough to avoid perceptual confusion. In previous studies that examined the influence of frequency compression on fricative recognition by Mandarin-speaking NH listeners and listeners with hearing loss (Chen et al., 2020; Qi et al., 2021; Yang et al., 2018), the authors found that when the spectral features of the high-frequency sounds, alveolar /s/ and alveolopalatal /ɕ/, were modified by frequency lowering and became less distinct from the retroflex /ʂ/, both NH and hearing-impaired listeners showed increased difficulty and more confusion in recognizing and identifying the Mandarin fricatives, especially the alveolar /s/.
In general, the acoustic analyses of the speech data revealed that the children with CIs produced manner and aspiration contrasts for Mandarin fricatives and affricates. Compared with the NH targets, the children with CIs showed NH-like patterns on consonant duration, normalized amplitude, and frication noise amplitude rise time. However, the spectral peaks of the alveolar and alveolopalatal sounds in the CI children were significantly lower than those of the NH children and were less distinct from those of the retroflex sounds than in the NH peers. The reduced distinction of spectral peak among the three places of Mandarin sibilants produced by the children with CIs might have resulted in the lowered intelligibility of their high-frequency consonants.
The findings of the present study provide valuable information for the development of CI processor technology and bear important clinical implications for speech training and aural rehabilitation for children with CIs. The CI participants in the present study were prelingually deafened and most of them received implantation at a young age. Our data revealed that even after a relatively long period (group average of 5.04 years) of CI device use, they did not produce NH-like spectral features for place contrasts, and their production of high-frequency sounds was poorly recognized by NH listeners (Yang et al., 2023). This is largely due to the low spectral resolution of high-frequency information in current CI processing. In order to improve CI users' perception of high-frequency sounds, a higher spectral resolution should be used to better deliver the fine-grained spectral features for place information. Clinically, because it is challenging for CI children to establish an accurate prototype through auditory exposure to high-frequency sounds, traditional aural and perceptual training may not be effective in improving CI children's production of sibilant fricatives and affricates. New techniques that can visualize the articulatory gestures of sibilants should be adopted to help CI users develop appropriate articulatory targets for the high-frequency sounds.
Although informative, the present study bears several limitations. First, we had a relatively small pool of CI participants. Because too few CI participants had received implantation at a later age or had a short period of CI device use, we could not examine whether and how the acoustic properties of sibilants change in children with CIs. For future studies, a cross-sectional study with a larger sample size or a longitudinal study with a relatively small sample size should be conducted to keep track of the acoustic development in the pediatric CI population. Another caveat is that we used designed speech materials and played audio prompts to elicit speech samples from both NH and CI participants, which did not reflect their speech production in a natural setting. For future studies, speech samples should also be collected from connected or spontaneous speech in a more natural setting such as play or conversation. The comparison of speech acoustics in isolation vs connected speech in a controlled vs natural setting will help us understand whether and how speech properties are affected by other factors such as speaking style, phonetic environment, etc. in children with CIs. This type of data can also help researchers and clinicians better understand the speech abilities of pediatric CI users in everyday life.
Figure 1 presents the durations of the tested fricatives and affricates in children with NH and CIs (see Appendix B for the numeric values of all tested acoustic measures). The children with CIs, in both the CA-matched and HA-matched subgroups, approximated the NH controls in the temporal features for all tested sibilant fricatives and affricates. Of the nine sibilants, the fricatives were longer than the affricates and the aspirated affricates were longer than the unaspirated affricates in both NH and CI children. An LMM was implemented to compare the nine sibilants between the NH and each CI subgroup. As for the NH vs CA comparison, the results revealed no group difference or group-related interactions but a significant main effect of type [F(2, 962.9) = 534.40, p < 0.001] and a significant place-by-type interaction [F(4, 956.8) = 17.35, p < 0.001]. With regard to the NH vs HA comparison,

FIG. 2. (Color online) Box plots showing consonant duration of Mandarin fricatives and affricates in children with CIs derived from the subject mean duration aggregated across vowel contexts. The box spans the 25th to 75th percentiles and the horizontal line in the box shows the median of the data. The whiskers show the range of the data. The symbols show individual data points. Each symbol represents the average duration aggregated across vowel contexts of each child.

FIG. 3. (Color online) Mean and standard error of normalized amplitude of Mandarin fricatives and affricates in the normal-hearing (NH) children, chronological-age-matched (CA) CI, and hearing-age-matched (HA) CI children.
FIG. 4. (Color online) Box plots showing normalized amplitude of Mandarin fricatives and affricates in children with CIs derived from the subject mean amplitude aggregated across vowel contexts. The box spans the 25th to 75th percentiles and the horizontal line in the box shows the median of the data. The whiskers show the range of the data. The symbols show individual data points. Each symbol represents the average amplitude aggregated across vowel contexts of each child.

FIG. 5. (Color online) Mean and standard error of amplitude rise time of the frication noise in Mandarin fricatives and affricates in the normal-hearing (NH) children, chronological-age-matched (CA) CI, and hearing-age-matched (HA) CI children.
FIG. 6. (Color online) Box plots showing amplitude rise time of the frication noise in Mandarin fricatives and affricates in children with CIs derived from the subject mean rise time aggregated across vowel contexts. The box spans the 25th to 75th percentiles and the horizontal line in the box shows the median of the data. The whiskers show the range of the data. The symbols show individual data points. Each symbol represents the average rise time aggregated across vowel contexts of each child.

FIG. 7. (Color online) Mean and standard error of spectral peak of the frication noise in Mandarin fricatives and affricates in the normal-hearing (NH) children, chronological-age-matched (CA) CI, and hearing-age-matched (HA) CI children.

FIG. 8. (Color online) Box plots showing spectral peak of the frication noise in Mandarin fricatives and affricates in children with CIs derived from the subject mean spectral peak aggregated across vowel contexts. The box spans the 25th to 75th percentiles and the horizontal line in the box shows the median of the data. The whiskers show the range of the data. The symbols show individual data points. Each symbol represents the average spectral peak aggregated across vowel contexts of each child.
for the descriptive information) with reference to the NH children. The chronological-age-matched subgroup (CA) included 26 children whose chronological age fell into the age range of the NH controls. The hearing-age-matched subgroup (HA) included 26 children whose electrical hearing age (i.e., duration of CI use) fell into the age range of the NH controls. We regarded the length of electrical hearing as the CI children's hearing age because all CI children tested in this study were prelingually deafened. The pre-implantation acoustic experience from hearing aids, if any, played a limited role. Note that some children with CIs were both chronological-age-matched and hearing-age-matched with the NH children. Therefore, they were assigned to both groups. The independent-samples t-test on age revealed no significant difference between the NH and CA (p > 0.05) or between the NH and HA groups (p > 0.05). All children, including children with CIs and NH, had both parents speaking Mandarin in everyday life. None of the NH children was reported as having cognitive, language, or speech impairments. None of the children with CIs was reported as having additional disabilities other than hearing problems.

TABLE I. Demographic and audiological characteristics of CA (chronological-age-matched) and HA (hearing-age-matched) subgroups of the children with CIs.