Modelling utterance generation in conversational dialogue systems

The GenDial project spans research in several disciplines, including computer science, human-human and human-machine spoken interaction, and language technology. Our long-term research goal is to develop a new class of conversational spoken dialogue systems that to a large extent follow the principles of human-human interaction. A natural part of human conversation is to choose what we say, when we say it and how we say it depending on our intentions, our conversational partners and the current state of the dialogue. For machines to be perceived as natural conversational partners, their output needs to be consistent with such adaptive behaviour. This proposal addresses a challenge that has so far been largely neglected: the utterance generation process. We will focus on how this process could be made context dependent and how such a generation model should be integrated into a spoken dialogue system. The project plan includes: data analysis of human-human and human-machine interaction; development of speech generation models; integration of the models into our existing multimodal dialogue system platform; and evaluation of the models both in isolation and as part of a conversational dialogue system. The proposed research on utterance generation will strengthen knowledge in an area that has so far attracted limited research effort, despite the fact that relevant and well-formed system output is of great importance for the perception and acceptability of spoken dialogue systems.

Group: Speech Communication and Technology

Rolf Carlson (Project leader)
Anna Hjalmarsson
Gabriel Skantze

Funding: VR

Duration: 2008 - 2010

Related publications:


Hjalmarsson, A. (2010). Human interaction as a model for spoken dialogue system behaviour. Doctoral dissertation. [abstract] [pdf]

Schlangen, D., Baumann, T., Buschmeier, H., Buss, O., Kopp, S., Skantze, G., & Yaghoubzadeh, R. (2010). Middleware for Incremental Processing in Conversational Agents. In Proceedings of SIGdial. Tokyo, Japan. [pdf]

Skantze, G., & Hjalmarsson, A. (2010). Towards Incremental Speech Generation in Dialogue Systems. In Proceedings of SIGdial (pp. 1-8). Tokyo, Japan. (*) [abstract] [pdf]

(*) Best Paper Award at SIGdial 2010


Schlangen, D., & Skantze, G. (2009). A general, abstract model of incremental dialogue processing. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09). Athens, Greece. [abstract] [pdf]

Skantze, G., & Gustafson, J. (2009). Attention and interaction control in a human-human-computer dialogue setting. In Proceedings of SIGdial 2009. London, UK. [abstract] [pdf]

Skantze, G., & Schlangen, D. (2009). Incremental dialogue processing in a micro-domain. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09). Athens, Greece. [abstract] [pdf]


Hjalmarsson, A., & Edlund, J. (2008). Human-likeness in utterance generation: effects of variability. In Perception in Multimodal Dialogue Systems - Proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. (pp. 252-255). Berlin/Heidelberg: Springer. [abstract] [pdf]

Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09