Large-scale massively multimodal modelling of non-verbal behaviour in spontaneous dialogue
The aim is to provide a large-scale kinematic database of human conversational behaviour, based on motion capture, and to build statistical models of multimodal non-verbal behaviour in dialogue.
The models will be implemented and demonstrated using virtual avatars, capable of producing appropriate real-time non-verbal behaviours.
Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not yet fully explored or understood. While there have been extensive studies on many aspects of face-to-face interaction, these have traditionally been of a qualitative nature, relying on limited hand-annotated corpora. The scientific goals of the proposed project are: 1) To provide data for research on human interaction behaviour in a way that has previously been infeasible. We propose a large-scale database of quality-assured kinematic gesture data (upper-body gestures, head movements, and facial features such as mouth and eyebrow motion) obtained by optical motion capture and image processing of 60 hours of spontaneous human-human dialogue. The data will be made available to the research community and will constitute a world-unique resource enabling gesture and interaction research on a large number of topics in linguistics, cognitive science, computer science and human communication engineering. 2) To use state-of-the-art statistical approaches and machine learning techniques to analyse the data and to build quantitative, predictive models of multimodal non-verbal behaviour in dialogue. The models will be implemented and demonstrated using virtual avatars capable of producing appropriate real-time non-verbal behaviours. Such models will be applicable in virtual tutors and assistants, spoken dialogue systems, computer game characters and social robots.
Group: Speech Communication and Technology
Jonas Beskow (Project leader)
Funding: VR (2010-4646)
Duration: 2011-01-01 - 2013-12-31
(2014). Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions. Computer Speech & Language, 28(2), 607-618. [pdf]
(2011). Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation. In Salvi, G., Beskow, J., Engwall, O., & Al Moubayed, S. (Eds.), Proceedings of AVSP2011 (pp. 103-106). [pdf]