Models have long been used to understand the relation of anatomical structure and articulatory movement to the acoustics and perception of speech. Realized as speech synthesizers or artificial talkers, such models simplify and emulate the speech production system. One type of simplification is to view speech production as a set of simultaneously imposed modulations of the airway system. Specifically, the vibratory motion of the vocal folds modulates the glottal airspace, while slower movements of the tongue, jaw, lips, and velum modulate the shape of the pharyngeal and oral cavities and the coupling to the nasal system. The precise timing of these modulations produces an acoustic wave from which listeners extract phonetic and talker-specific information. The first aim of the presentation will be to review two historical models of speech production that exemplify a system in which structure is modulated with movement to produce intelligible speech. The second aim is to describe theoretical aspects of a computational model that allows for simulation of speech based on precise spatio-temporal modulations of an airway structure. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by a speaker and perceived by a listener.
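The two-timescale view described above can be illustrated with a minimal numerical sketch. This is not the computational model discussed in the presentation; it simply shows, under assumed values (an 8 kHz sample rate, a 120 Hz fundamental, a 3 Hz articulatory cycle), how a fast glottal modulation and a slow articulatory modulation can be imposed simultaneously on one signal:

```python
import numpy as np

# Illustrative sketch only: speech production viewed as two simultaneous
# modulations of the airway system, acting on different timescales.
fs = 8000                          # assumed sample rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)    # one second of time

# Fast modulation: vocal-fold vibration opens and closes the glottal
# airspace at voice-pitch rates (here a half-rectified 120 Hz oscillation).
f0 = 120.0
glottal_area = np.maximum(np.sin(2 * np.pi * f0 * t), 0.0)

# Slow modulation: tongue, jaw, and lip movements reshape the cavities on
# a roughly syllabic timescale (here a 3 Hz raised cosine from 0.2 to 1.0).
articulation = 0.6 + 0.4 * np.cos(2 * np.pi * 3.0 * t)

# The resulting signal reflects both modulations imposed at once; its
# precise relative timing is what carries the phonetic information.
source = glottal_area * articulation
```

In a full simulation the slow modulation would act on an area function of the vocal tract rather than as a simple gain, but the sketch captures the separation of timescales that the abstract describes.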