This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between “lip activity” and “speech activity” and then to use visual speech information as a voice activity detector (VAD). To this end, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although they were located in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This setup was used to capture a separate audio signal for each speaker and to synchronously monitor each speaker’s lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of environmental noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
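As a rough illustration of the general idea (not the authors’ actual detector), the sketch below thresholds a smoothed measure of frame-to-frame lip-parameter motion to separate silence from audible activity. The function name `visual_vad`, the 50 frames/s rate, the smoothing window, and the threshold are illustrative assumptions; the paper defines its own single visual parameter from the measured lip shapes.

```python
import numpy as np

def visual_vad(lip_param, fps=50, win_s=0.2, threshold=0.01):
    """Toy visual voice-activity detector (illustrative only).

    lip_param : 1-D array of a lip-shape parameter (e.g., inner-lip area)
                sampled at `fps` frames per second.
    Returns a boolean array: True where the lips are judged "active"
    (speech or other audible event), False where they are still (silence).
    """
    # Frame-to-frame lip movement: silent stretches tend to show little motion.
    motion = np.abs(np.diff(lip_param, prepend=lip_param[0]))

    # Smooth over a short window so brief pauses inside words are not
    # mistaken for silence.
    win = max(1, int(win_s * fps))
    smoothed = np.convolve(motion, np.ones(win) / win, mode="same")

    # A single threshold on the smoothed motion gives the activity decision.
    return smoothed > threshold


if __name__ == "__main__":
    # Synthetic example: still lips, then an oscillating "speech" burst.
    t = np.arange(0, 4, 1 / 50)
    lips = np.where((t > 1) & (t < 3), 0.5 + 0.2 * np.sin(2 * np.pi * 4 * t), 0.5)
    activity = visual_vad(lips)
    print(f"active frames: {activity.sum()} / {activity.size}")
```

Because the decision relies only on the video stream, a detector of this kind is unaffected by the acoustic noise conditions, which is the property the paper exploits.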
February 01 2009
A study of lip movements during spontaneous dialog and its application to voice activity detection
David Sodoyer, Bertrand Rivet, Laurent Girin, Christophe Savariaux, and Jean-Luc Schwartz
GIPSA-lab, Department of Speech and Cognition, UMR 5126 CNRS, Grenoble-INP, Université Stendhal, Université Joseph Fourier, 46 Avenue Félix Viallet, 38031 Grenoble, France

Christian Jutten
GIPSA-lab, Department of Images and Signal, UMR 5126 CNRS, Grenoble-INP, Université Stendhal, Université Joseph Fourier, 46 Avenue Félix Viallet, 38031 Grenoble, France
J. Acoust. Soc. Am. 125, 1184–1196 (2009)
Article history
Received: June 19, 2007
Accepted: November 14, 2008
Citation
David Sodoyer, Bertrand Rivet, Laurent Girin, Christophe Savariaux, Jean-Luc Schwartz, Christian Jutten; A study of lip movements during spontaneous dialog and its application to voice activity detection. J. Acoust. Soc. Am. 1 February 2009; 125 (2): 1184–1196. https://doi.org/10.1121/1.3050257