This paper reports on the concurrent use of electroglottography (EGG) and electromagnetic articulography (EMA) in the acquisition of EMA trajectory data for running speech. Static and dynamic intersensor distances, standard deviations, and coefficients of variation associated with inter-sample distances were compared in two conditions: with and without EGG present. Results indicate that measurement discrepancies between the two conditions are within the EMA system's measurement uncertainty. Therefore, potential electromagnetic interference from EGG does not seem to cause differences of practical importance on EMA trajectory behaviors, suggesting that simultaneous EMA and EGG data acquisition is a viable laboratory procedure for speech research.
1. Introduction
A limitation faced by researchers and clinicians alike in the examination of speech production is that no single instrument is capable of simultaneously examining all of the articulatory subsystems involved in speech production. Researchers have circumvented the limitations of individual measurement systems through the simultaneous use of multiple instruments with complementary strengths in terms of the nature of the data acquired, principally within their respective temporal and spatial resolutions. Much of the work in this area has focused on instrumental pairings that facilitate the examination of either oral articulation (only) or laryngeal activity (only) during speech, including but not limited to electropalatography and ultrasound (Zharkova, 2008), ultrasound and electromagnetic articulography (EMA) (Aron et al., 2016), EMA and electropalatography (Hoole et al., 1993; Rouco and Recasens, 1996), laryngoscopy and laryngeal ultrasound (Moisik et al., 2014), and high-speed laryngeal imaging and electroglottography (EGG) (Pulakka et al., 2004). However, there is still substantial room for growth in terms of research combining instruments that permit the simultaneous measurement of both supralaryngeal and laryngeal activity. Articulatory research examining these two production systems simultaneously is not only rare but frequently relies on relatively invasive and difficult to implement methods like electromyography and laryngeal transillumination (Löfqvist and Yoshioka, 1984; Perkell et al., 1992; Hoole and Bombien, 2014).
In this study we investigate the acceptability of the concurrent use of two articulatory measurement systems, electromagnetic articulography (EMA) (Perkell et al., 1992; Zierdt et al., 1999) and electroglottography (EGG), in the collection of speech data. The simultaneous use of EMA and EGG would facilitate both linguistic research (Löfqvist and Yoshioka, 1980, 1984; Romero, 1999; Hoole and Bombien, 2014) and clinical research (Gracco et al., 1992; Max and Gracco, 2005), given its potential contributions to our understanding of laryngeal and supralaryngeal coordination in speech. EMA is a point-tracking system designed for the measurement of speech-related articulatory movement. EMA systems use the principle of electromagnetic induction to track the movement of small (∼3 mm) sensor coils through a weak magnetic field; as explained by Gafos (2006), transmitter coils in the EMA system generate electromagnetic fields that pass through and induce voltages in the sensor coils. The precise voltage induced depends on the position of the sensor with respect to the transmitter coils and can therefore be used to calculate the change in the position of the sensor over time. However, given that EGG systems typically use two flat metal electrodes to measure electrical impedance across the larynx over time, their simultaneous use with EMA may disrupt the pre-defined magnetic field comprising the EMA measurement space and thus potentially produce erroneous EMA sensor trajectory data (Nixon et al., 1998; Northern Digital Inc., 2016).
One previous study has observed that interference effects of simultaneous EGG and EMA use did not exceed the position estimation errors caused by the system itself. Specifically, Vonderhaar (2016) found that the presence of EGG electrodes does not cause measurement interference for sensors adhered to a static rigid body when stationary or following a repetitive path of motion. While these results are extremely encouraging for simultaneous EGG and EMA use, the question remains as to whether this lack of interference carries over to EMA data acquisition with natural speech recordings.
2. Methods
2.1 EMA-EGG system setup
The NDI Wave EMA system (Northern Digital Inc., Waterloo, Ontario, Canada) and a dual-channel Glottal Enterprises EGG device (model EG2-PCX) were used in this study. These systems operate using the same basic principles and components as other generally available systems, and, of relevance due to the specific aims of this study, the electrodes used by the EG2-PCX system are similar in size and shape to those used by other EGG models.¹
Speech recordings were shared between the two systems. EMA sensor movement trajectories were sampled at 400 Hz, and speech audio was recorded with a Sennheiser e 865-S microphone at a sampling rate of 44.1 kHz. The EGG signal was sampled at 16 kHz from two EGG electrodes (3.4 cm in diameter) placed on the throat at the level of the larynx, with optimal electrode placement determined by inspecting the larynx position indicator on the EGG device and by external visual and tactile inspection of the thyroid cartilage.
2.2 Data collection and analysis
EMA data were recorded with simultaneous audio for three healthy adult participants (1 female, 2 male) in two conditions: with simultaneous EGG recording (ON condition) and without EGG (ABS condition). Articulatory sensors were placed on the lower lip (LL), the tongue tip (TT), and the lower incisor (JAW), with static head reference sensors placed on the gum above the upper incisor (UI) and behind each ear on the mastoid process (REF-R and REF-L). In each condition, participants were first recorded twice holding a rigid acrylic biteplate with three sensors (right, left, and center) attached between their teeth, which served to record the occlusal plane. Next, participants were recorded while reading three repetitions of a list containing 12 two-word sentences and two sustained vowels, one repetition voiced and one whispered, for a total of 42 analyzable utterances per speaker per condition (252 utterances across speakers and conditions). The variation in segmental material and phonation in the stimuli was included so as to engender a range of supralaryngeal and glottal postures in the collected data.
As previous work has already provided strong evidence that the simultaneous use of EGG and EMA does not significantly impact the recorded EGG signal (Vonderhaar, 2016), collected EGG data were visually checked to ensure that there were no obvious defects due to EMA presence, but no further analyses were conducted on the EGG data. Our attention was focused on any effects of the presence of EGG data acquisition on EMA data and trajectories.
Prior to analysis, EMA data from each repetition in the ABS and ON conditions were manually segmented into utterances, with audio recordings used to facilitate the segmentation of the kinematic data. Between-sensor Euclidean distances were calculated for all static cases (biteplate and head reference sensors) and dynamic cases (articulatory sensors) at each sampled time point to obtain intersensor distance measurements (Fig. 1); the mean and standard deviation of these measurements were used in the assessment of the EMA data. Intersensor distances for each biteplate sensor comparison were averaged across the entire duration of the biteplate-capture to obtain their mean values, while for the head reference and articulatory sensors, distances were averaged within each elicited utterance. For head reference sensor comparisons, the mean intersensor distances were additionally averaged over utterances from all repetitions. Standard deviations for the intersensor distance measurements were calculated across all samples within an utterance, and were also averaged across all utterances for the head reference sensor comparisons.
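The distance computation described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names, the synthetic positions, and the choice of the sample standard deviation are assumptions made for illustration only.

```python
import math
import statistics


def intersensor_distances(track_a, track_b):
    """Euclidean distance between two sensors at each sampled time point.

    track_a, track_b: equal-length sequences of (x, y, z) positions in mm.
    """
    if len(track_a) != len(track_b):
        raise ValueError("sensor tracks must have the same number of samples")
    return [math.dist(p, q) for p, q in zip(track_a, track_b)]


def summarize(distances):
    """Mean and sample standard deviation (assumed estimator) of a distance series."""
    return statistics.mean(distances), statistics.stdev(distances)


# Synthetic positions (not study data): four samples for two sensors.
ref_r = [(0.0, 0.0, 0.0)] * 4
ui = [(3.0, 4.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0), (6.0, 8.0, 0.0)]

dist = intersensor_distances(ref_r, ui)   # one distance per time sample
mean_dist, sd_dist = summarize(dist)      # summary over the whole segment
```

In practice the summary would be taken over one biteplate capture or one utterance, and then averaged across utterances where the text calls for it.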
Fig. 1. (Color online) Schematic representation of intersensor distance measurements. Colored dots represent individual sensor positions within the generated EMA field (X, Y, and Z coordinates in millimeters from the coordinate system origin). Dotted arrows orient the schematic with respect to anatomical landmarks. (a) Head reference sensor placement within the 3-D EMA field. Solid arrows represent the measured intersensor distances used in all analyses (REF-R∼UI, REF-L∼UI, and REF-R∼REF-L). (b) Articulatory sensor placement within the 3-D EMA field. Solid arrows represent the measured intersensor distances used in all analyses (TT∼LL, JAW∼LL, and TT∼JAW).
Motivated by the expectation that distances between static sensors would remain constant, the mean measurement error of the signal between the ABS and ON conditions was calculated for all static cases (Berry, 2011; Vonderhaar, 2016). Measurement error was calculated by first taking the absolute difference between the mean intersensor distances in the ABS and ON conditions for each static sensor pair, and then adding this difference to the standard deviation in the ON condition. As the positional accuracy specification of the NDI Wave system is ±0.5 mm (Berry, 2011), a measurement error of 0.5 mm or less would indicate that simultaneous EGG and EMA use is not causing interference greater than that expected from the system itself. Measurement error was not used as an evaluation metric for dynamic sensor comparisons, as the distance between the moving articulatory sensors does not remain constant across recordings.
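As a worked illustration of this error measure, the short Python sketch below applies it to two biteplate comparisons for speaker S1 reported in Table 1. The function and constant names are hypothetical, and taking the absolute value of the difference is an interpretation that is consistent with the tabulated error values rather than something stated explicitly.

```python
WAVE_SPEC_MM = 0.5  # positional accuracy specification of the NDI Wave (Berry, 2011)


def measurement_error(mean_abs, mean_on, sd_on):
    """|mean(ABS) - mean(ON)| + SD(ON), all in mm.

    The absolute value is an assumed reading of "difference".
    """
    return abs(mean_abs - mean_on) + sd_on


def within_spec(error_mm, spec_mm=WAVE_SPEC_MM):
    """True when the error does not exceed the system specification."""
    return error_mm <= spec_mm


# Values from Table 1, speaker S1 (rounded to two decimals, as in the table).
err_rc = round(measurement_error(37.51, 37.43, 0.08), 2)  # R~C pair
err_rl = round(measurement_error(28.97, 27.70, 0.07), 2)  # R~L pair

rc_ok = within_spec(err_rc)
rl_ok = within_spec(err_rl)
```

Here the R∼C comparison falls within the specification, whereas the R∼L comparison does not, matching the exception noted in the results.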
For dynamic cases, the coefficient of variation (COV) associated with the intersensor distance variability was calculated for each pair of sensors and compared across conditions. This quantity reflects a global instantaneous uncertainty in the estimation of sample-to-sample sensor positions by the EMA system, and therefore reflects any potential increased measurement variability due to interference in the ON condition.
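For reference, the COV is conventionally the standard deviation divided by the mean. The sketch below shows that textbook definition in Python on a synthetic distance series; how the values were aggregated over samples and utterances in this study is not specified here, so the function name, the use of the sample standard deviation, and the example data are all assumptions.

```python
import statistics


def coefficient_of_variation(distances):
    """Sample standard deviation divided by the mean (dimensionless)."""
    mean = statistics.mean(distances)
    if mean == 0:
        raise ValueError("COV is undefined for a zero mean")
    return statistics.stdev(distances) / mean


# Synthetic intersensor distances in mm (not study data).
series = [36.0, 37.0, 38.0]
cov = coefficient_of_variation(series)  # SD = 1.0, mean = 37.0
```

Because the COV normalizes variability by the mean distance, it allows sensor pairs with different typical separations to be compared on the same scale.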
3. Results
3.1 Static case: Behavior of fixed biteplate and head reference sensor trajectories
Results for the biteplate and head-reference sensor measurements (i.e., static EMA data) obtained for each speaker are shown in Table 1 and Table 2, respectively. For the biteplate measurements, measurement error was within the positional accuracy specification of the NDI Wave system (±0.5 mm) for almost all comparisons across speakers, suggesting that these differences fall within the inherent measurement uncertainty of the EMA system. The one exception was the comparison of the distance between the left and right biteplate sensors for speaker S1 (error = 1.34 mm), which displayed an unusually high measurement error between the ABS and ON conditions.
Table 1. Comparison of intersensor distances for biteplate sensors in ABS and ON conditions for all speakers (R∼C=distance between right and center sensors; L∼C=distance between left and center sensors; R∼L=distance between right and left sensors). Mean and SD values are averaged across two recordings of the biteplate capture for each speaker. All values are in mm. Error values above the NDI system specification are italicized.
Speaker | Sensor pair | ABS mean | ABS SD | ON mean | ON SD | Error
---|---|---|---|---|---|---
S1 | R∼C | 37.51 | 0.06 | 37.43 | 0.08 | 0.16
S1 | L∼C | 38.29 | 0.07 | 37.98 | 0.07 | 0.38
S1 | R∼L | 28.97 | 0.05 | 27.70 | 0.07 | *1.34*
S2 | R∼C | 37.32 | 0.05 | 37.71 | 0.07 | 0.46
S2 | L∼C | 37.81 | 0.05 | 37.80 | 0.07 | 0.08
S2 | R∼L | 29.21 | 0.05 | 29.43 | 0.06 | 0.28
S3 | R∼C | 37.47 | 0.06 | 37.45 | 0.06 | 0.08
S3 | L∼C | 38.41 | 0.06 | 38.42 | 0.07 | 0.08
S3 | R∼L | 29.61 | 0.05 | 29.38 | 0.05 | 0.28
Table 2. Comparison of intersensor distances for head-reference sensors in ABS and ON conditions for all speakers (R∼UI=distance between REF-R and UI sensors; L∼UI=distance between REF-L and UI sensors; R∼L=distance between REF-R and REF-L sensors). Mean and SD values are averaged across three repetitions of the stimuli. All values are in mm. Error values above the NDI system specification are italicized.
Speaker | Sensor pair | ABS mean | ABS SD | ON mean | ON SD | Error
---|---|---|---|---|---|---
S1 | R∼UI | 130.8 | 0.07 | 130.9 | 0.06 | 0.16
S1 | L∼UI | 128.6 | 0.09 | 128.8 | 0.12 | 0.32
S1 | R∼L | 131.1 | 0.03 | 131.3 | 0.07 | 0.27
S2 | R∼UI | 147.2 | 0.4 | 149.4 | 0.3 | *2.5*
S2 | L∼UI | 149.1 | 0.25 | 150.8 | 0.29 | *1.99*
S2 | R∼L | 143.7 | 0.16 | 144.5 | 0.18 | *0.98*
S3 | R∼UI | 130.48 | 0.17 | 130.5 | 0.12 | 0.14
The intersensor distance data for the head reference sensors are shown in Table 2. For speaker S3, only the REF-R∼UI comparison could be evaluated, as REF-L became dislodged and had to be replaced between the ABS and ON trials. For most comparisons across speakers the error is again below the error specification for the Wave system. Abnormally large (greater than 1.00 mm) error values were observed in S2's REF-R∼UI and REF-L∼UI comparisons. Because an unusually high standard deviation was observed for the distances calculated using the UI sensor for this speaker in both the ABS and the ON conditions (REF-R∼UI: ABS = 0.402 mm, ON = 0.299 mm; REF-L∼UI: ABS = 0.252 mm, ON = 0.292 mm), it is suspected that a loosened UI sensor attachment may be responsible for the high measurement error. Because the adhesive used to attach sensors to the speech articulators is temporary, it is not uncommon for sensors to loosen or detach over the course of an experiment. That said, the extreme variation observed here means that we cannot discount the possibility that EGG presence may exacerbate measurement variability due to insecure sensor attachment, even in light of the lack of evidence for significant interference on securely attached sensors (discussed further in Sec. 4).
3.2 Dynamic case: Behavior of articulatory sensor trajectories
The global mean and standard deviation of the intersensor distance measurements for each pair of articulatory sensors across repetitions are shown in Table 3 (left). For dynamic cases, measurement error was not used as an evaluation metric, as the distance between the moving articulatory sensors does not remain constant across recordings. A separate two-way repeated measures analysis of variance (ANOVA) was run for each intersensor comparison for each speaker to compare the effect of Condition (ABS or ON) and Repetition (1, 2, or 3) on the mean intersensor distance for each stimulus item. The decision to analyze each speaker's data separately was based on the potential for random between-subject variation resulting from the complete removal and reapplication of the EGG and EMA setup for each subject's data collection session. A significant main effect of Condition was observed only for the S3 TT∼JAW comparison (F[1,64] = 11.21, p = 0.001). For all other intersensor comparisons, no significant differences were observed between the ABS and ON conditions.
Table 3. (Left) Comparison of intersensor distances (in mm) for articulatory sensors in ABS and ON conditions for all speakers. Mean and SD values are averaged across three repetitions of the stimuli. All p-values are for the factor Condition within a two-way repeated measures ANOVA with [1, 64] degrees of freedom. Italicized values are significant at the p=0.05 level. (Right) Comparison of COV for articulatory intersensor distance comparisons across ABS and ON conditions by speaker. (TT∼LL=distance between TT and LL sensors; JAW∼LL=distance between JAW and LL sensors; TT∼JAW=distance between TT and JAW sensors).
Speaker | Sensor pair | Distance ABS, mean (SD) | Distance ON, mean (SD) | Distance p | COV ABS | COV ON | COV p
---|---|---|---|---|---|---|---
S1 | TT∼LL | 36.74 (0.86) | 36.7 (0.66) | 0.73 | 0.024 | 0.017 | 0.125
S1 | JAW∼LL | 15.75 (1.45) | 15.64 (1.38) | 0.19 | 0.028 | 0.029 | 0.679
S1 | TT∼JAW | 25.49 (1.62) | 25.7 (1.65) | 0.07 | 0.06 | 0.051 | 0.142
S2 | TT∼LL | 34.69 (1.83) | 34.72 (1.79) | 0.09 | 0.024 | 0.028 | 0.106
S2 | JAW∼LL | 14.34 (1.06) | 14.41 (1.22) | 0.2 | 0.019 | 0.021 | 0.172
S2 | TT∼JAW | 23.32 (2.32) | 23.5 (2.06) | 0.85 | 0.04 | 0.033 | 0.256
S3 | TT∼LL | 36.35 (1.95) | 35.67 (1.92) | 0.27 | 0.028 | 0.025 | 0.223
S3 | JAW∼LL | 17.62 (2.16) | 17.96 (1.7) | 0.76 | 0.049 | 0.051 | 0.273
S3 | TT∼JAW | 23.58 (1.46) | 22.3 (1.48) | *0.001* | 0.052 | 0.056 | 0.627
The coefficients of variation (COVs) for all intersensor trajectories are given by speaker in Table 3 (right). Statistical analyses of COV across conditions were conducted using a modified signed-likelihood ratio test developed specifically for the comparison of multiple COVs (Krishnamoorthy and Lee, 2014; Marwick and Krishnamoorthy, 2018). No significant difference was observed for any comparison, indicating that overall trajectory variabilities were not significantly different between the ABS and ON conditions.
4. Discussion
The comparison of static sensor measurements did not reveal significant measurement abnormality in the ON condition, corroborating the results of more limited previous research on concurrent EGG and EMA use (Vonderhaar, 2016). Further, as in the static case, no significant interference effect was observed on intersensor distance measurements or on the COV for dynamic sensor comparisons, suggesting that there is no significant effect of EGG presence on EMA sensor trajectory measurements in the acquisition of speech data. Additionally, no systematic variation in either intersensor distance measurements or COV was observed based on sensor position (i.e., TT, LL, and JAW), indicating that no position-dependent effects were observed as a consequence of proximity to the EGG electrodes. This suggests that the possible dispersion of EGG electric currents inside the vocal tract does not affect the physical characteristics of the sensors attached to the surface of articulators. Taken as a whole, these results indicate that the presence of EGG (at least the Glottal Enterprises system) is unlikely to have adverse effects on the collection and consequent analysis of EMA (at least NDI Wave system) data for speech research.
Although no systematic evidence for significant EGG interference was observed in the collection of EMA data, based on S2's head reference sensor data we cannot rule out the possibility that EGG presence could exacerbate measurement variability due to insecure sensor attachment. To avoid measurement outliers, extra caution should be taken to ensure that sensors are, and remain, securely attached and that subjects are carefully positioned within the usable near field of the Wave system (Berry, 2011), as deficits in these factors are likely to magnify variability and measurement error. That said, on the whole, the observation that the presence of the EGG in the vicinity of the electromagnetic field generated by the EMA system did not significantly affect the measurements of moving sensors provides strong support for the validity of using EMA and EGG systems simultaneously in speech research. This in turn allows high-quality articulator kinematic information to be complemented with glottal information during running speech.
Acknowledgments
This research was supported by NIH grant “Speech Prosody and Articulatory Dynamics in Spoken Language” (Grant No. R01 DC003172; D.B.).
¹ The NDI Wave system differs from the other two commonly used 3-D EMA systems, the Carstens AG500 and AG501 (Carstens Medizinelektronik GmbH), in terms of the number of transmitter coils used to generate the electromagnetic field, their operating frequency, and their position relative to the measurement field. For a more thorough comparison of these systems, see Savariaux et al. (2017).