A reproducible method to measure the intelligibility of communication systems is required to assess their efficiency. The current study seeks to develop a French version of the Modified Rhyme Test (MRT) [House, Williams, Hecker, and Kryter (1965). J. Acoust. Soc. Am. 37, 158–166], an intelligibility test composed of 50 six-word lists, originally developed for military applications and now widely used. An evaluation of the authors’ French MRT was carried out, reproducing the original experimental conditions used by House and colleagues. Very similar results were found between the original MRT and the French MRT, validating the latter for further use and dissemination.
1. Introduction
The assessment of speech intelligibility of radio communication systems is crucial to prevent communication errors and, therefore, potential accidents. Two kinds of methods are commonly used: behavioral methods and acoustical measurement methods. When both methods can be applied, their results are highly correlated (Griffin, 1992; Yu et al., 2010; Brammer et al., 2011; Anderson and Kalb, 1987). However, acoustical methods cannot be used with radio communication systems that use bone conduction devices because no reliable transfer function is available for these devices (Pellieux et al., 2005). In military environments, bone conduction headphones are widely used because they are compatible with nonlinear earplugs, which preserves good situational awareness. They can also be combined with bone conduction microphones (Manning et al., 2017). Therefore, behavioral methods are crucial for measuring speech intelligibility in military environments.
The Modified Rhyme Test (MRT) was first developed to assess speech intelligibility in military communication systems (House et al., 1965). During an MRT task, the listener has to recognize a spoken word among a list of six written monosyllabic words that differ in either their initial or their final consonant. These consonant sounds are easily distorted during radio communication, which makes the MRT particularly relevant in this context. The MRT is, indeed, the most popular behavioral method for assessing speech intelligibility in military environments (Letowski and Scharine, 2017; MIL-STD-1474E, 2015). It is also widely used in civilian applications (Brandewie and Zahorik, 2011; Brammer et al., 2017; Barker and Coyne, 2017) and has been chosen by the American National Standards Institute (ANSI/ASA S3.2, 1989) as the standard method for assessing speech intelligibility in communication systems.
In addition, the MRT has several characteristics that explain its popularity, in particular for military applications. First, it is a closed-set speech test, which reduces procedural learning. Second, in background noise, the variation of MRT scores is highly correlated with the variation of the Speech Transmission Index (STI), an acoustical metric (Brammer et al., 2011; Anderson and Kalb, 1987).
Because of those advantages, several adaptations of the MRT in other languages have been suggested in the literature: Turkish (Arıöz and Günel, 2015), Czech (Tihelka and Matoušek, 2004), Spanish (Ball, 2011), and Catalan (Alías and Triviño, 2007).
In the present study, we developed a French version of the original MRT (hereafter called Fr-MRT), and conducted a behavioral assessment of it. To assess the Fr-MRT, we varied, based on the above-mentioned literature, three main parameters: the Signal-to-Noise Ratio (SNR), the talker, and a potential learning effect. We then compared the results obtained for the Fr-MRT with STI results, in order to contrast the behavioral results with an objective measurement of speech intelligibility.
The results obtained with the Fr-MRT were similar to those of the original MRT, validating the Fr-MRT for further use and dissemination. Table 1 provides the 50 lists of 6 words composing the Fr-MRT.
List | Word #1 | Word #2 | Word #3 | Word #4 | Word #5 | Word #6 |
---|---|---|---|---|---|---|
01 | Bouche | Bouge | Boude | Boule | Bourre | Bouffe |
02 | Bête | Bêche | Baisse | Bègue | Belle | Benne |
03 | Gagne | Gare | Gaffe | Gage | Gale | Gamme |
04 | Vigne | Vide | Ville | Vice | Vive | Vise |
05 | Phare | Fâche | Femme | Fane | Face | Phase |
06 | Fiche | Fige | Figue | File | Fine | Fils |
07 | Dort | Dogue | Dock | Donne | Dope | Dote |
08 | Dire | Digue | Digne | Dîne | Dix | Dite |
09 | Pire | Pige | Pique | Pile | Pipe | Pisse |
10 | Peigne | Père | Pêche | Pèle | Peine | Pèse |
11 | Coque | Colle | Cogne | Cosse | Cote | Corps |
12 | Jure | Juche | Juge | Jules | Jupe | Jute |
13 | Cours | Coule | Coude | Couve | Coupe | Couse |
14 | Tanche | Tende | Tangue | Tank | Tempe | Tente |
15 | Teigne | Terre | Thèse | Telle | Thème | Teck |
16 | Tort | Toge | Toque | Tome | Tonne | Top |
17 | Mère | Messe | Mêle | Même | Mène | Mette |
18 | Sourd | Souche | Soude | Soûle | Soupe | Soute |
19 | Sage | Sache | Sac | Sale | Sape | Sas |
20 | Lange | Lande | Langue | Lampe | Lance | Lente |
21 | Lave | Lâche | Lasse | Laque | Lame | Latte |
22 | Ruche | Rude | Rhume | Rune | Russe | Ruse |
23 | Rogne | Roche | Robe | Rock | Rome | Rote |
24 | Chaude | Chauffe | Chaume | Chausse | Chauve | Chose |
25 | Manche | Mange | Mande | Mangue | Manque | Menthe |
26 | Ronde | Fonde | Monde | Ponde | Sonde | Tonde |
27 | Beurre | Cœur | Leurre | Meurt | Peur | Sœur |
28 | Rôle | Gnôle | Khôl | Môle | Pôle | Tôle |
29 | Feigne | Daigne | Baigne | Peigne | Saigne | Teigne |
30 | Chère | Gère | Fer | Guère | Mer | Père |
31 | Jette | Bête | Dette | Fête | Cette | Tête |
32 | Saine | Chêne | Gêne | Laine | Rêne | Naine |
33 | Dame | Femme | Gamme | Rame | Lame | Pâme |
34 | Jappe | Cape | Zappe | Nappe | Pape | Tape |
35 | Rage | Gage | Cage | Nage | Page | Sage |
36 | Bave | Gave | Cave | Lave | Rave | Pave |
37 | Chiche | Riche | Fiche | Biche | Miche | Niche |
38 | Bise | Dise | Guise | Lise | Mise | Vise |
39 | Bile | Cil | Fil | Mille | Pile | Ville |
40 | Douche | Couche | Louche | Mouche | Souche | Touche |
41 | Rousse | Douce | Gousse | Mousse | Pouce | Tousse |
42 | Choque | Roque | Coque | Loque | Moque | Toque |
43 | Comme | Gomme | Nomme | Pomme | Somme | Tomme |
44 | Rente | Sente | Vente | Fente | Lente | Pente |
45 | Rance | Chance | Danse | Lance | Panse | Tance |
46 | Jure | Cure | Dure | Mure | Pure | Sûre |
47 | Bois | Quoi | Doigt | Moi | Noix | Loi |
48 | Crin | Grain | Frein | Train | Brin | Drain |
49 | Douer | Nouer | Louer | Vouer | Rouer | Jouer |
50 | Bail | Caille | Faille | Maille | Paille | Taille |
2. Creation of the word lists
The MRT (House et al., 1965) consists of 50 lists of American English monosyllabic words. Each list is composed of 6 consonant-vowel-consonant (CVC) words [with a few consonant-vowel (CV) and vowel-consonant (VC) words] with an identical central vowel. For a given list, the six words begin with, or end with, the same consonant. Thus, in each list, only one consonant phoneme differs between the six words. In the original MRT, 21 consonants, one semi-consonant, and 10 vowels were used. This is to be compared with a total of 22 to 24 consonants, 2 semi-consonants, and 15 to 19 vowels in American English (Bizzocchi, 2017). The French language is composed of 20 consonants (with 3 semi-vowels) and 14 vowels (Calliope and Fant, 1989).
The word lists of the Fr-MRT were chosen using the “Lexique3” French database (New et al., 2004), which provides for each word the occurrence frequency and a measure of the word familiarity. Using the database, monosyllabic CVC words with identical final VC or initial CV were automatically gathered. We obtained more than 120 different lists. In order to select 50 lists with as little bias as possible, we first removed the lists that did not have at least 6 words, and second, we ranked the lists on the basis of the occurrence frequency statistics, so that we kept only words that were as familiar as possible. In order to reduce the number of lists and the number of words within each list, we tried to obtain a frequency of consonant occurrence in the lists [see Fig. 1(A)] similar to that of natural spoken French (Combescure, 1981). Finally, as in the original MRT, half of the 50 lists contrasted the initial consonant and the other half the final consonant. The 4 final lists (lists 47–50; with the initial consonant varying) were designed to target specificities of the French language: they are composed of monosyllabic non-CVCs, with 2 semi-vowels. The lists of the Fr-MRT are presented in Table 1. Figure 1(B) compares the word occurrence frequency in each corpus language for the Fr-MRT (New et al., 2004) and the original MRT (OANC, 2015).
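The automatic gathering step described above can be illustrated with a minimal sketch. It assumes each word is available as an (onset consonant, vowel, coda consonant) triple; the toy lexicon, the informal phoneme symbols, and the function name `candidate_lists` are hypothetical and do not reproduce the Lexique3 format or the authors' actual procedure. Frequency ranking and consonant balancing are not shown.

```python
from collections import defaultdict

# Hypothetical toy lexicon: spelling -> (onset consonant, vowel, coda consonant).
# The phoneme symbols are informal placeholders, not Lexique3 transcriptions.
LEXICON = {
    "bouche": ("b", "u", "S"),
    "bouge": ("b", "u", "Z"),
    "boude": ("b", "u", "d"),
    "boule": ("b", "u", "l"),
    "bourre": ("b", "u", "R"),
    "bouffe": ("b", "u", "f"),
    "douche": ("d", "u", "S"),
    "couche": ("k", "u", "S"),
    "louche": ("l", "u", "S"),
    "mouche": ("m", "u", "S"),
    "souche": ("s", "u", "S"),
    "touche": ("t", "u", "S"),
}


def candidate_lists(lexicon, min_size=6):
    """Group CVC words that share an initial CV (final consonant varies)
    or a final VC (initial consonant varies); keep groups of >= min_size words."""
    groups = defaultdict(set)
    for word, (onset, vowel, coda) in lexicon.items():
        groups[("final_varies", onset, vowel)].add(word)
        groups[("initial_varies", vowel, coda)].add(word)
    return {key: sorted(words) for key, words in groups.items()
            if len(words) >= min_size}


lists = candidate_lists(LEXICON)
for key, words in sorted(lists.items()):
    print(key, words)
```

With this toy input, only two groups reach six words: one sharing the initial CV of "bouche" and one sharing its final VC. A group with more than six members would still need to be trimmed, for example by word familiarity.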
3. Validation of the lists
3.1 Participants
Twenty-six normal-hearing listeners (8 females and 18 males; mean age 26 years) took part in the present study. All participants were native French speakers. All had audiometric thresholds below or equal to 20 dB hearing level for all octave frequencies from 125 to 8000 Hz. They all gave written informed consent to participate in this experiment. The study was approved by the ethics committee of Occitany Region Regional Health Agency (IDRCB 2017-A00859–44). For practical reasons, listeners were recruited and tested at two sites (the French forces biomedical research institute, in the Paris region, and the French-German research center of Saint Louis). The procedure and the materials were identical at both sites.
3.2 Stimuli
Recordings of the word lists. Two male native French speakers (talker 1 and talker 2) without a history of vocal pathology were chosen for the recordings. The two speakers were from two different regions (“Pays de la Loire,” west of France, and “Lorraine,” east of France). However, none of the participants in the validation experiment noticed or commented on the accent of the talkers.
The words were pronounced using a carrier sentence, as in the original MRT, with the target word embedded in the middle of the sentence, in order to prevent the effect of sentence-final intonation. The carrier sentence was “Le mot X doit être indiqué” (“The word X should be indicated”).
The recordings were digitized with a Teac LX10 at a 48 kHz sampling frequency with 16-bit resolution using a Brüel & Kjaer 1/2 in. microphone (type 4189) with a B&K preamplifier 2669 followed by a B&K conditioner 5935 (Brüel & Kjaer Sound & Vibration Measurement, Naerum, Denmark). The recordings took place in an audiometric booth (IAC acoustics). The microphone was placed at a distance of 60 cm from the talker's mouth. All 300 sentences (50 lists of 6 words) were displayed on a computer screen placed in front of the talker and appeared one after the other every 3 s. It took an average of 2 s to pronounce a sentence. The experimenter visually checked through a window in the audiometric booth that the talker kept focusing on his task and kept a constant distance between himself and the microphone. Moreover, the signal waveform of the recordings was monitored in real time by the experimenter on a screen outside the audiometric booth.
Presentation of the stimuli. In the validation experiment, each target word was presented in noise, at a different SNR.
Six different SNRs were used, ranging from −16 to +4 dB in 4 dB steps (−16, −12, −8, −4, 0, and +4 dB). The masking noise signal was similar to the noise signal used in the original MRT (House et al., 1965): a uniform spectrum up to 500 Hz followed by a decay of 9 dB per octave. The noise spectrum was close to those of military environments (aircraft) (Gee, 2005). This was consistent with the objective of using the Fr-MRT for military radio communication systems. The target word was presented at 70 dB sound pressure level (a Brüel & Kjær type 4157 ear simulator was used for calibration). The target word and the noise signal were presented simultaneously through BeyerDynamic DT 770 headphones.
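As an illustration of the noise specification and of the SNR conditions, the following minimal sketch computes the relative noise spectrum level at a given frequency and the gain to apply to a noise signal to reach a target SNR while the speech level stays fixed. It assumes an RMS-based SNR computed over the whole signals, which the text does not specify; all function names are hypothetical.

```python
import math


def noise_level_db(freq_hz, corner_hz=500.0, slope_db_per_octave=9.0):
    """Relative spectrum level: flat up to the corner frequency,
    then decaying by 9 dB per octave."""
    if freq_hz <= corner_hz:
        return 0.0
    return -slope_db_per_octave * math.log2(freq_hz / corner_hz)


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def noise_gain_for_snr(speech, noise, snr_db):
    """Linear gain to apply to the noise so that
    20*log10(rms(speech) / rms(gain * noise)) equals snr_db.
    Assumption: SNR is defined on broadband RMS levels."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))


# The six SNR conditions of the experiment, in dB.
SNRS_DB = list(range(-16, 5, 4))

# Toy signals, for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
for snr in SNRS_DB:
    print(snr, round(noise_gain_for_snr(speech, noise, snr), 2))
```

Scaling the noise rather than the speech matches the description above, in which the target word is always presented at the same level.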
3.3 Procedure
Listeners sat in front of a computer in an audiometric booth. A trial began with the presentation of six written words on the screen. After the six words were displayed, the sentence with the target word was played through the headphones. Then, the listener had to indicate the word he/she heard, without time limit. The experiment was divided into two identical parts, one for each talker (talker 1, talker 2), with a 5-min break in between. The order of the two talkers was counterbalanced across participants. For each talker, the listener performed six blocks of 50 trials, each trial corresponding to one list of the Fr-MRT. The six blocks corresponded to the six SNRs; they were presented in a random order.
Before data collection, the listeners performed a short training of ten-trial sets at 0 dB SNR. The experiment lasted approximately 60 min.
Finally, in order to assess a potential learning effect, a subgroup of 16 listeners repeated exactly the same experiment 30 days later.
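The session structure described in this section (two talkers in counterbalanced order, six SNR blocks in random order, 50 trials per block) can be sketched as follows. The counterbalancing rule based on listener index, the random order of lists within a block, and the random choice of the target word are assumptions made for illustration; the text does not state how these were handled.

```python
import random

SNRS_DB = [-16, -12, -8, -4, 0, 4]
N_LISTS = 50
WORDS_PER_LIST = 6


def build_session(listener_index, seed=None):
    """Return a list of trials as (talker, snr_db, list_id, target_index)."""
    rng = random.Random(seed)
    talkers = ["talker 1", "talker 2"]
    # Assumption: alternate the talker order from one listener to the next.
    if listener_index % 2 == 1:
        talkers.reverse()
    trials = []
    for talker in talkers:
        block_order = SNRS_DB[:]
        rng.shuffle(block_order)  # six SNR blocks in random order
        for snr in block_order:
            list_order = list(range(1, N_LISTS + 1))
            rng.shuffle(list_order)  # assumption: list order randomized
            for list_id in list_order:
                # Assumption: the target word is drawn at random in the list.
                target_index = rng.randrange(WORDS_PER_LIST)
                trials.append((talker, snr, list_id, target_index))
    return trials


session = build_session(listener_index=0, seed=1)
# 2 talkers x 6 SNR blocks x 50 lists = 600 trials per session
print(len(session))
```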
3.4 Comparison between behavioral data and objective measurements
For each SNR, an acoustical measurement of speech intelligibility was computed and compared to the MRT results. This acoustical measurement was the STI (Houtgast and Steeneken, 1971; Houtgast and Steeneken, 1982). The STI is a numeric measure of speech intelligibility, varying from 0 (bad) to 1 (excellent). The STI of the headphone device used for the Fr-MRT listening tests was measured for the six SNRs. The noise was the same as the one used to estimate the performance of the Fr-MRT lists. A microphone was placed at the ear canal entrance of an artificial head and connected to the STI meter (AL1 Acoustilyzer from NTI). The STI measurement uses a reference signal provided by NTI to simulate the modulations of speech in seven octave bands. The artificial head was placed in the audiometric booth. The noise and the reference signal provided by the STI meter were played simultaneously through the headphones placed on the artificial head using a 4157 B&K ear coupler.
The Fr-MRT scores and the STI obtained for each SNR were compared to the data of a previous study (Brammer et al., 2011). In that study, the relationship between STI and MRT was presented as a curve. We first extracted from this curve the 6 MRT scores corresponding to the 6 STI values obtained for each of the SNRs of our study. Then, we compared the original MRT scores to the Fr-MRT scores measured during the first session (Day 1) corresponding to the same STI.
3.5 Statistical analysis
The effects of talker, SNR, and site of data collection were assessed using a repeated measures analysis of variance (ANOVA) on data measured during the first session (Day 1) with talker (talker 1 vs talker 2) and SNR (from −16 dB to +4 dB in 4 dB steps) as within-subject factors, and site of data collection as a between-subject factor. Before conducting the ANOVA, the Fr-MRT scores were transformed from percent correct to rationalized arcsine units to control for ceiling effects (Sherbecoe and Studebaker, 2004).
The learning effect was assessed using a repeated measures ANOVA in the subset of participants who completed the two sessions, with session (Day 1 vs Day 30), talker, and SNR as factors. In the case of violation of sphericity, a Greenhouse-Geisser correction was applied.
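For reference, the rationalized arcsine transform is commonly written as in the sketch below, which follows Studebaker's widely used formulation for a score of X correct responses out of N items. Whether the authors used exactly this variant, and whether they applied it per block of 50 trials, is an assumption; the function name `rau` is hypothetical.

```python
import math


def rau(n_correct, n_items):
    """Rationalized arcsine units for n_correct out of n_items.

    theta = asin(sqrt(X / (N + 1))) + asin(sqrt((X + 1) / (N + 1)))
    RAU   = (146 / pi) * theta - 23
    """
    theta = (math.asin(math.sqrt(n_correct / (n_items + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_items + 1))))
    return (146.0 / math.pi) * theta - 23.0


# Assumption: one score per block of 50 trials.
for correct in (0, 25, 45, 50):
    print(correct, round(rau(correct, 50), 2))
```

The transform stays close to the percent-correct scale in the middle of the range and stretches the scale near 0% and 100%, which is why it is used when scores approach the ceiling.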
4. Results
4.1 Speech intelligibility as a function of SNR and talkers
As expected, the Fr-MRT scores measured at Day 1 increased monotonically with increasing SNR [F(4.1, 97.2) = 1101, p < 0.0001]. The difference in Fr-MRT scores between the two talkers was not significant [F(1, 24) = 0.55, p = 0.46], nor was the interaction between talker and SNR [F(4.1, 97.2) = 0.20, p = 0.9]. No effect of the site of data collection was observed [F(1, 24) = 1.49, p = 0.23], nor any interaction between site and talker [F(1, 24) = 1.83, p = 0.18], between site and SNR [F(4.1, 97.2) = 0.71, p = 0.89], or among the three factors [F(4.1, 97.2) = 0.39, p = 0.82]. Because of the absence of talker and site effects, the Fr-MRT scores were averaged across talkers and sites and used as the dependent variable in the subsequent analyses. Figure 2(A) presents the results collapsed across sites.
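The fractional degrees of freedom reported above are what a Greenhouse-Geisser correction produces: both the effect and error degrees of freedom are multiplied by the sphericity estimate ε. The short sketch below shows the arithmetic for the SNR factor. The value ε = 0.81 is not reported in the text; it is inferred here from the reported degrees of freedom and should be read as an assumption.

```python
def gg_corrected_df(n_levels, between_error_df, epsilon):
    """Greenhouse-Geisser corrected degrees of freedom for a within-subject
    factor with n_levels levels; between_error_df is the error df of the
    between-subject part (number of listeners minus number of groups)."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * between_error_df
    return df_effect, df_error


N_SNR_LEVELS = 6
N_LISTENERS = 26
N_SITES = 2
EPSILON = 0.81  # assumption: inferred from the reported df, not stated

df_effect, df_error = gg_corrected_df(
    N_SNR_LEVELS, N_LISTENERS - N_SITES, EPSILON)
print(f"df_effect = {df_effect:.2f}, df_error = {df_error:.2f}")
```

Under this assumption, the uncorrected values (5 and 120) become about 4.05 and 97.2, in line with the reported F(4.1, 97.2) after rounding.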
4.2 Learning effect
As for the MRT, there was no learning effect for the Fr-MRT: scores did not significantly change between Day 1 and Day 30 [see Fig. 2(B)].
4.3 Performance of the lists with initial consonant vs lists with final consonants
Finally, we analyzed the performance as a function of the position of the varying consonant (initial vs final), in order to investigate potential differences between the different lists of the Fr-MRT. Figure 3 shows the performance averaged across listeners and talkers during the first session (Day 1) for each list, according to the position of the consonant (initial vs final). The repeated measures ANOVA on the list performance, with SNR as a within factor and the position of the consonant as a between factor, showed no effect of the position of the consonant [F(1, 48) = 2.85; p = 0.09], nor any interaction between SNR and the position of the consonant [F(4.1, 194.4) = 1.67; p = 0.14].
4.4 Comparison with the original MRT study
The Fr-MRT and the MRT scores showed similar behavior across SNRs (see Fig. 4). Moreover, the average MRT score fell within the interquartile range of the Fr-MRT scores for the three most adverse SNRs (−16, −12, and −8 dB). For the other SNRs, the average Fr-MRT and MRT scores (Table III, part A; House et al., 1965) were also close.
4.5 Relationship with the STI
Table 2 shows the STI and Fr-MRT scores obtained at each SNR, as well as the MRT score extracted from Brammer et al. (2011) corresponding to each STI value. The MRT scores fell close to the Fr-MRT scores for each condition.
SNR | STI | Fr-MRT score (% correct), mean (SD) | MRT score (% correct) extracted from Brammer et al. (2011) |
---|---|---|---|
−16 dB | 0.05 | 31.3 (6.2) | 30.9 |
−12 dB | 0.1 | 54.0 (6.7) | 50.2 |
−8 dB | 0.25 | 75.9 (4.8) | 76.2 |
−4 dB | 0.38 | 92.1 (3.3) | 87.6 |
0 dB | 0.50 | 97.0 (2.0) | 93.3 |
+4 dB | 0.62 | 98.6 (1.58) | 95.6 |
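The closeness of the two sets of scores in Table 2 can be quantified with a short calculation. The sketch below uses only the values listed in the table; the summary statistics (signed difference per SNR and mean absolute difference) are an illustrative choice, not an analysis reported by the authors.

```python
# (SNR in dB, STI, Fr-MRT mean score in %, MRT score in % extracted
#  from Brammer et al., 2011), copied from Table 2.
TABLE_2 = [
    (-16, 0.05, 31.3, 30.9),
    (-12, 0.10, 54.0, 50.2),
    (-8, 0.25, 75.9, 76.2),
    (-4, 0.38, 92.1, 87.6),
    (0, 0.50, 97.0, 93.3),
    (4, 0.62, 98.6, 95.6),
]

# Signed difference (Fr-MRT minus MRT) at each SNR, in percentage points.
differences = {snr: fr - mrt for snr, _, fr, mrt in TABLE_2}
mean_abs_difference = (sum(abs(d) for d in differences.values())
                       / len(differences))
largest_gap_snr = max(differences, key=lambda snr: abs(differences[snr]))

for snr, diff in differences.items():
    print(f"{snr:+d} dB: Fr-MRT - MRT = {diff:+.1f} points")
print(f"mean absolute difference = {mean_abs_difference:.1f} points")
print(f"largest gap at {largest_gap_snr:+d} dB")
```

On these values, the two scores differ by less than 5 percentage points at every SNR, with the largest gap at −4 dB.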
5. Discussion
The goal of the present study was to design a French version of the MRT (here called Fr-MRT) to assess radio communication systems with French-speaking users. As for the original MRT (House et al., 1965), the Fr-MRT showed no learning effect (same performance 30 days after the first testing). Moreover, the variation of the Fr-MRT scores as a function of the SNR was very similar to that of the MRT. Finally, the relationship between the Fr-MRT and the STI was also very similar to the one between the MRT and the STI (Brammer et al., 2011). This is an important point because STI measurements are not possible with unconventional microphones for voice recording, such as bone conduction microphones or microphones integrated in earplugs. Hence, the Fr-MRT makes it possible to estimate intelligibility for those specific microphone devices with French speakers. The same reasoning could be applied in the future to the development of the test in other languages.
The previous adaptations of the MRT to languages other than English differed in their approach from the current study and, as a result, few data are available against which to compare our results. The Czech (Tihelka and Matoušek, 2004) and the Catalan (Alías and Triviño, 2007) MRTs did not provide any behavioral data. The Turkish MRT reported data from hearing-impaired listeners only, without estimating either a learning effect or an effect of the SNR (Arıöz and Günel, 2015). The Spanish MRT assessed scores in normal-hearing listeners but without estimating either a learning effect or an effect of the SNR (Ball, 2011). Here, we reproduced the experimental conditions of the original MRT (House et al., 1965) in order to compare our results with the original ones. In addition, French shares most of its phonemes with English. That characteristic probably helped to build a Fr-MRT with properties similar to those of the original MRT.
Furthermore, the absence of a learning effect has interesting practical consequences: it allows native listeners to be tested with minimal familiarization, as underlined with the original MRT (House et al., 1965).
Future work will employ female talkers to account for the increased number of military servicewomen.
Finally, we can easily imagine that French–English bilingual listeners could help to characterize the effects of a radio communication system on human voice features by comparing the MRT and Fr-MRT scores.