Classifying and deploying pauses for flow control in conversational systems

The project investigates how dialogue systems can employ pauses and fillers to help users better understand the structure of system utterances.

Spoken dialogue systems allow users to interact with machines through speech. Although speech interfaces are supposedly easy to use, users find themselves deliberately restricting their behaviour, in terms of both lexical choice and timing. In this project, we aim to improve the system behaviour that leads to the second of these restrictions, timing. We address three aspects: (1) earlier detection of user floor-yielding cues; (2) system deployment of filled pauses to signal that the system has recognised it is its turn to speak, even before it knows what to say; and (3) system deployment of pauses to help users parse system output. Our goal is to test the impact of these three modifications in the context of a dialogue system. We believe that successful modelling of pauses will limit the need for users to restrict their behaviour and will help users better understand the structure of system utterances. Both of these functions will reduce users' cognitive workload, fulfilling the original promise of spoken language interfaces.

Group: Speech Communication and Technology

Anna Hjalmarsson (Project leader)
José David Lopes

Funding: VR (2011-6152)

Duration: 2012-01 - 2016-12

Keywords: dialogue, pauses, fillers, turn-taking, interaction

Related publications:


Laskowski, K., & Hjalmarsson, A. (2015). An information-theoretic framework for automated discovery of prosodic cues to conversational structure. In ICASSP. Brisbane, Australia.


Skantze, G., & Hjalmarsson, A. (2013). Towards Incremental Speech Generation in Conversational Systems. Computer Speech & Language, 27(1), 243-262.

Skantze, G., Hjalmarsson, A., & Oertel, C. (2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial. Metz, France. (*)

(*) Nominated for Best Paper Award at SIGdial 2013

Strömbergsson, S., Hjalmarsson, A., Edlund, J., & House, D. (2013). Timing responses to questions in dialogue. In Proceedings of Interspeech 2013 (pp. 2584-2588). Lyon, France.


Edlund, J., & Hjalmarsson, A. (2012). Is it really worth it? Cost-based selection of system responses to speech-in-overlap. In Proc. of the IVA 2012 workshop on Realtime Conversational Virtual Agents (RCVA 2012). Santa Cruz, CA, USA.

Hjalmarsson, A., & Oertel, C. (2012). Gaze direction as a Back-Channel inviting Cue in Dialogue. In Proc. of the IVA 2012 workshop on Realtime Conversational Virtual Agents (RCVA 2012). Santa Cruz, CA, USA.

Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09