Classifying and deploying pauses for flow control in conversational systems
The project investigates how dialogue systems can employ pauses and fillers to help users better understand system utterance structure.
Spoken dialogue systems allow users to interact with machines through speech. Although speech interfaces are supposedly easy to
use, users find themselves deliberately restricting their behaviour, in terms of both lexical choice and timing. In
this project, we aim to improve the system behaviour that leads to the second of these restrictions, timing. We address three aspects:
(1) earlier detection of user floor-yielding cues; (2) system deployment of filled pauses to signal that it has recognised that it is the
system's turn to speak, even before it knows what to say; and (3) system deployment of pauses to help users parse system output.
Our goal is to test the impact of these three modifications in the context of a dialogue system. We believe that
successful modelling of pauses will limit the need for users to restrict their behaviour, and will help users to better understand system
utterance structure. Both of these functions will reduce users' cognitive workload, fulfilling the original promise behind
spoken language interfaces.
Group: Speech Communication and Technology
Anna Hjalmarsson (Project leader)
José David Lopes
Funding: VR (2011-6152)
Duration: 2012-01 - 2016-12
Keywords: dialogue, pauses, fillers, turn-taking, interaction
(2015). An information-theoretic framework for automated discovery of prosodic cues to conversational structure. In ICASSP. Brisbane, Australia.
(2013). Towards Incremental Speech Generation in Conversational Systems. Computer Speech & Language, 27(1), 243-262.
(2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGDial. Metz, France. (*)
(*) Nominated for Best Paper Award at SIGdial 2013
(2013). Timing responses to questions in dialogue. In Proceedings of Interspeech 2013 (pp. 2584-2588). Lyon, France.
(2012). Is it really worth it? Cost-based selection of system responses to speech-in-overlap. In Proc. of the IVA 2012 workshop on Realtime Conversational Virtual Agents (RCVA 2012). Santa Cruz, CA, USA.
(2012). Gaze direction as a Back-Channel inviting Cue in Dialogue. In Proc. of the IVA 2012 workshop on Realtime Conversational Virtual Agents (RCVA 2012). Santa Cruz, CA, USA.