Timing of intonation and gestures in spoken communication

The goal of the project is to understand the timing relationships between intonation and gesture in spontaneous speech. This will be investigated through semi-automatic extraction of co-speech gestures from a large and varied dataset (audio, video, motion capture) and analysis of the function and synchronization of speech and gesture.

The melody of speech, or intonation, plays a crucial role in spoken interaction. By altering the speech melody, speakers can highlight important words and phrases, making them prominent and more meaningful. Speakers also use changing melodies and rhythms to signal when it is time for the other speaker to talk (turn-taking) and to give feedback (such as mm or uhuh). The exact timing of these melodies is controlled with considerable precision by the speaker: pitch movements occur at particular places in relation to syllables. Body and facial gestures regularly accompany the speech melody and often serve the same function as intonation, but until now we have not been able to measure the timing of these gestures with the same precision as intonation.

The aim of this research project is to measure precisely the timing relationship between speech melodies and gestures using a large database of recorded conversations in Swedish. The participants have been recorded with high-quality audio, video, and motion-capture equipment in a specially designed studio. The results will have implications for our understanding of how speech and gestures are planned and coordinated in the brain, and will also enable better modeling of speech and gestures in speech applications such as robots and avatars.
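As a minimal illustration of the kind of timing measurement described above, the sketch below pairs each gesture apex with its nearest pitch peak and reports the signed offset between them. The function name, the pairing rule, and the event times are invented for this sketch and are not taken from the project's corpus or tools.

```python
# Hypothetical sketch (not the project's actual method): measure the
# signed timing offset between gesture apexes and nearby pitch peaks.

def timing_offsets(pitch_peaks, gesture_apexes, max_lag=0.5):
    """For each gesture apex (seconds), find the nearest pitch peak
    within max_lag seconds and return the signed offset apex - peak.
    Positive offsets mean the gesture apex follows the pitch peak."""
    offsets = []
    for apex in gesture_apexes:
        nearest = min(pitch_peaks, key=lambda p: abs(apex - p))
        if abs(apex - nearest) <= max_lag:
            offsets.append(apex - nearest)
    return offsets

# Invented example times: three pitch peaks, three head-nod apexes;
# the last apex has no pitch peak within 0.5 s and is discarded.
peaks = [1.20, 2.45, 3.80]
apexes = [1.15, 2.50, 5.00]
print(timing_offsets(peaks, apexes))
```

A real analysis would of course extract the pitch peaks from an F0 track and the gesture apexes from annotated motion-capture trajectories, but the alignment step reduces to this kind of nearest-event pairing.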


David House (Project leader)
Jonas Beskow
Simon Alexanderson
Jens Edlund

Funding: RJ (Bank of Sweden Tercentenary Foundation)

Duration: 2012-08 - 2017-01

Related publications:


Alexanderson, S., House, D., & Beskow, J. (2016). Automatic Annotation of Gestural Units in Spontaneous Face-to-Face Interaction. In Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (pp. 15-19). Tokyo, Japan.

House, D., & Alexanderson, S. (2016). Temporal domains of co-speech gestures and speech prosody. In Seventh Conference of the International Society for Gesture Studies (p. 365). Paris, France.

Zellers, M., House, D., & Alexanderson, S. (2016). Prosody and hand gesture at turn boundaries in Swedish. In Proceedings of Speech Prosody 2016 (pp. 832-835). Boston, USA.

House, D., Alexanderson, S., & Beskow, J. (2015). On the temporal domain of co-speech gestures: syllable, phrase or talk spurt? In Lundmark Svensson, M., Ambrazaitis, G., & van de Weijer, J. (Eds.), Proceedings of Fonetik 2015 (pp. 63-68). Lund University, Sweden.

Zellers, M., & House, D. (2015). Parallels between hand gestures and acoustic prosodic features in turn-taking. In 14th International Pragmatics Conference (pp. 454-455). Antwerp, Belgium.

Alexanderson, S., Beskow, J., & House, D. (2014). Automatic speech/non-speech classification using gestures in dialogue. In The Fifth Swedish Language Technology Conference. Uppsala, Sweden.

Alexanderson, S., House, D., & Beskow, J. (2013). Aspects of co-occurring syllables and head nods in spontaneous dialogue. In Proceedings of the 12th International Conference on Auditory-Visual Speech Processing (AVSP 2013). Annecy, France.

Alexanderson, S., House, D., & Beskow, J. (2013). Extracting and analysing co-speech head gestures from motion-capture data. In Eklund, R. (Ed.), Proceedings of Fonetik 2013 (pp. 1-4). Linköping University, Sweden.

Alexanderson, S., House, D., & Beskow, J. (2013). Extracting and analyzing head movements accompanying spontaneous dialogue. In Proceedings of the Tilburg Gesture Research Meeting. Tilburg University, The Netherlands.

Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09