SAMINK

Incremental processing in multimodal conversational systems

The aim of this project is to understand how conversational systems can engage in spoken face-to-face interaction in real-time.

Conversational systems allow humans to interact with machines by means of spoken language. This project addresses one of the limitations of such systems: they typically wait until the user has finished speaking before processing the spoken input. Because the system does not continuously interpret what the user is saying, it is very hard for it to accurately decide when to take the turn, which may result in interruptions and delayed responses. Humans, in contrast, interpret what is being said incrementally. Incremental processing will allow conversational systems not only to find more suitable places to take the turn, but also to give and receive continuous feedback in the form of backchannels (e.g., “okay”, “mhm”). The project targets conversational systems in a face-to-face setting, using an animated face and head tracking (so-called multimodal systems). Such a setting also allows feedback in the form of facial expressions and head movements (such as nods).

The project has four main goals. First, we will build a general model and test-bed for incremental dialogue processing by improving our existing dialogue system framework. Second, we will implement at least two prototype systems in specific domains using this framework. Third, within these prototype systems, we will build data-driven models of turn-taking and feedback behaviour. Finally, the prototype systems will be tested in user studies.
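To make the idea concrete, the sketch below shows a minimal, hypothetical incremental decision loop: instead of waiting for end-of-utterance silence, the system reacts to every partial recognition result and chooses between waiting, producing a backchannel, or taking the turn. This is only an illustration under assumed inputs; the feature names and thresholds (pause_ms, is_complete, pitch_falling) are invented here and do not reflect the project's actual data-driven models or the IrisTK framework.

    # Minimal sketch of an incremental turn-taking / feedback decision loop.
    # NOT the project's implementation; features and thresholds are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class IncrementalUnit:
        """A partial speech recognition result with simple timing/prosody features."""
        text: str            # current (possibly revised) hypothesis of what was said
        pause_ms: int         # silence since the last word, in milliseconds
        is_complete: bool     # e.g. a parser judging the utterance syntactically complete
        pitch_falling: bool   # crude prosodic cue for utterance finality

    def decide(unit: IncrementalUnit) -> str:
        """Return 'WAIT', 'BACKCHANNEL' or 'TAKE_TURN' for each incremental unit."""
        if unit.pause_ms < 200:
            return "WAIT"          # user is still speaking
        if unit.is_complete and unit.pitch_falling and unit.pause_ms > 500:
            return "TAKE_TURN"     # likely end of the user's turn
        if unit.pause_ms > 300 and not unit.is_complete:
            return "BACKCHANNEL"   # acknowledge ("mhm", a nod) and let the user continue
        return "WAIT"

    # Example: a stream of growing partial hypotheses from the recognizer
    stream = [
        IncrementalUnit("take the next", 100, False, False),
        IncrementalUnit("take the next left", 350, False, False),
        IncrementalUnit("take the next left after the church", 600, True, True),
    ]
    for unit in stream:
        print(unit.text, "->", decide(unit))

In the project itself, such decisions are not hand-coded rules but are learned from human-human and human-machine dialogue data, which is what the data-driven models of turn-taking and feedback described above aim to provide.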

Group: Speech Communication and Technology

Staff:
Gabriel Skantze

Funding: VR (2011-6237)

Duration: 2012-01-01 - 2015-12-31

Related publications:

2014

Meena, R., Boye, J., Skantze, G., & Gustafson, J. (2014). Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 2-11). Philadelphia, PA, USA. [abstract] [pdf]

Meena, R., Boye, J., Skantze, G., & Gustafson, J. (2014). Using a Spoken Dialogue System for Crowdsourcing Street-level Geographic Information. In the 2nd Workshop on Action, Perception and Language, SLTC 2014. Uppsala, Sweden. [abstract] [pdf]

Meena, R., Skantze, G., & Gustafson, J. (2014). Data-driven Models for timing feedback responses in a Map Task dialogue system. Computer Speech and Language, 28(4), 903-922. [abstract] [pdf]

Skantze, G., Hjalmarsson, A., & Oertel, C. (2014). User Feedback in Human-Robot Dialogue: Task Progression and Uncertainty. In Proceedings of the HRI Workshop on Timing in Human-Robot Interaction. Bielefeld, Germany.

Skantze, G., Hjalmarsson, A., & Oertel, C. (2014). Turn-taking, Feedback and Joint Attention in Situated Human-Robot Interaction. Speech Communication, 65, 50-66. [abstract] [pdf]

2013

Johansson, M., Skantze, G., & Gustafson, J. (2013). Head Pose Patterns in Multiparty Human-Robot Team-Building Interactions. In International Conference on Social Robotics - ICSR 2013. Bristol, UK. [abstract] [pdf]

Meena, R., Skantze, G., & Gustafson, J. (2013). A Data-driven Model for Timing Feedback in a Map Task Dialogue System. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 375-383). Metz, France. (*) [abstract] [pdf]

(*) Nominated for Best Paper Award at SIGdial 2013

Meena, R., Skantze, G., & Gustafson, J. (2013). The Map Task Dialogue System: A Test-bed for Modelling Human-Like Dialogue. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) (pp. 366-368). Metz, France. [abstract] [pdf]

Skantze, G., & Hjalmarsson, A. (2013). Towards Incremental Speech Generation in Conversational Systems. Computer Speech & Language, 27(1), 243-262. [abstract] [pdf]

Skantze, G., Hjalmarsson, A., & Oertel, C. (2013). Exploring the effects of gaze and pauses in situated human-robot interaction. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). Metz, France. (*) [abstract] [pdf]

(*) Nominated for Best Paper Award at SIGdial 2013

Skantze, G., Oertel, C., & Hjalmarsson, A. (2013). User feedback in human-robot interaction: Prosody, gaze and timing. In Proceedings of Interspeech. [abstract] [pdf]

2012

Skantze, G. (2012). A Testbed for Examining the Timing of Feedback using a Map Task. In Proceedings of the Interdisciplinary Workshop on Feedback Behaviors in Dialog. Portland, OR. (*) [abstract] [pdf]

(*) Selected for keynote presentation

Skantze, G., & Al Moubayed, S. (2012). IrisTK: a statechart-based toolkit for multi-party face-to-face interaction. In Proceedings of ICMI. Santa Monica, CA. [pdf]






