Seminar at Speech, Music and Hearing:

Human interaction as a model for spoken dialogue system behaviour

Doctoral thesis defence in sal F3

Anna Hjalmarsson
Faculty opponent: David Schlangen, Potsdam

This thesis is a step towards the long-term and high-reaching objective of building dialogue systems whose behaviour is similar to that of a human dialogue partner. The aim is not to build a machine with the same conversational skills as a human being, but rather to build a machine that is human enough to encourage users to interact with it accordingly. The behaviours in focus are cue phrases, hesitations and turn-taking cues. These behaviours serve several important communicative functions such as providing feedback and managing turn-taking. Thus, if dialogue systems could use interactional cues similar to those of humans, these systems could be more intuitive to talk to. A major part of this work has been to collect, identify and analyze the target behaviours in human-human interaction in order to gain a better understanding of these phenomena. Another part has been to reproduce these behaviours in a dialogue system context and explore listeners’ perceptions of these phenomena in empirical experiments.

The thesis is divided into two parts. The first part serves as an overall background. The issues and motivations of humanlike dialogue systems are discussed. This part also includes an overview of research on human language production and spoken language generation in dialogue systems.

The next part presents the data collections, data analyses and empirical experiments that this thesis is concerned with. The first study presented is a listening test that explores human behaviour as a model for dialogue systems. The results show that a version based on human behaviour is rated as more humanlike, polite and intelligent than a constrained version with less variability. Next, the DEAL dialogue system is introduced. DEAL is used as a platform for the research presented in this thesis. The domain of the system is a trade domain and the target audience is second language learners of Swedish who want to practice conversation. Furthermore, a data collection of human-human dialogues in the DEAL domain is presented. Analyses of cue phrases in these data are provided as well as an experimental study of turn-taking cues. The results from the turn-taking experiment indicate that turn-taking cues realized with diphone synthesis affect the expectations of a turn change in a way similar to the corresponding human version.

Finally, an experimental study that explores the use of talkspurt-initial cue phrases in an incremental version of DEAL is presented. The results show that the incremental version had shorter response times and was rated as more efficient, more polite and better at indicating when to speak than a non-incremental implementation of the same system.

10:00 - 13:00
Friday September 3, 2010

The seminar is held in sal F3, Lindstedtsvägen 26.

