This work examines the coupling between the acoustic and visual components of speech as it evolves through time. Previous work has shown a consistent correspondence between face motion and spectral acoustics, and between fundamental frequency (F0) and rigid body motion of the head [Yehia et al. (2002), JPHON, 30, 555-568]. Although these correspondences have been estimated both for sentences and for running speech, the analyses have not taken into account the temporal structure of speech. As a result, the role of temporal organization in multimodal speech cannot be assessed. The current study is a first effort to correct this deficit. We have developed an algorithm, based on recursive correlation, that computes the correlation between measurement domains (e.g., head motion and F0) as a time-varying function. Using this method, regions of high or low correlation, or of rapid transition (e.g., from high to low), can be associated with visual and auditory events. This analysis of the time-varying coupling of multimodal events has implications for speech planning and synchronization between speaker and listener.
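The abstract does not specify the recursive-correlation algorithm itself, but the idea of computing correlation as a time-varying function can be sketched with recursively updated (exponentially weighted) statistics. The forgetting factor `alpha`, the function name, and all implementation details below are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def recursive_correlation(x, y, alpha=0.05):
    """Sketch of a time-varying correlation between two signals
    (e.g., a head-motion component and F0), updated recursively
    with an exponential forgetting factor `alpha` (an assumed
    parameter, not taken from the study)."""
    # Running means, variances, and covariance; small epsilon
    # initialization avoids division by zero at the first samples.
    mx = my = 0.0
    vx = vy = cxy = 1e-12
    r = np.zeros(len(x))
    for t, (xt, yt) in enumerate(zip(x, y)):
        # Recursively update the running means.
        mx += alpha * (xt - mx)
        my += alpha * (yt - my)
        dx, dy = xt - mx, yt - my
        # Recursively update running (co)variances.
        vx += alpha * (dx * dx - vx)
        vy += alpha * (dy * dy - vy)
        cxy += alpha * (dx * dy - cxy)
        # Instantaneous correlation estimate at time t.
        r[t] = cxy / np.sqrt(vx * vy)
    return r
```

On perfectly coupled signals the estimate approaches +1 (or -1 for anti-phase coupling), and regions where the two domains decouple show up as drops toward zero, which is the kind of time-varying profile the abstract describes.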
