Incremental processing in multimodal conversational systems
The aim of this project is to understand how conversational systems can engage in spoken face-to-face interaction in real time.
Conversational systems allow humans to interact with machines by means of spoken language. This project addresses one of the limitations such systems currently have: they typically wait for the user to finish speaking before processing the spoken input. Since the computer does not continuously interpret what the user is saying, it is very hard for it to decide accurately when to take the turn, which may result in interruptions and delayed responses. Humans, in contrast, interpret what is being said incrementally. Incremental processing will allow conversational systems not only to find more suitable places to take the turn, but also to give and receive continuous feedback in the form of backchannels (e.g., “okay”, “mhm”).
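To make the idea concrete, the following is a minimal sketch (not the project's actual implementation) of how an incremental system might consume recognised words one at a time and continuously decide whether to stay silent, emit a backchannel, or take the turn. All names, the pause thresholds, and the completion test are illustrative assumptions; a real system would use incremental speech recognition, parsing, and prosodic analysis instead.

```python
# Illustrative sketch of incremental turn-taking decisions.
# Assumptions (not from the project itself): words arrive with end
# timestamps, and fixed silence thresholds stand in for real
# prosodic/syntactic models.

from dataclasses import dataclass, field

@dataclass
class IncrementalState:
    words: list = field(default_factory=list)  # words heard so far
    last_word_end: float = 0.0                 # end time of latest word (s)

BACKCHANNEL_PAUSE = 0.3  # short pause: acknowledge with "mhm" / a nod
TURN_TAKE_PAUSE = 0.8    # longer pause at a plausible completion point

def utterance_seems_complete(words):
    # Placeholder for an incremental interpretation model; here we only
    # require that something has been said at all.
    return len(words) > 0

def decide(state, now):
    """Return 'wait', 'backchannel', or 'take_turn' at time `now` (s)."""
    silence = now - state.last_word_end
    if silence >= TURN_TAKE_PAUSE and utterance_seems_complete(state.words):
        return "take_turn"
    if silence >= BACKCHANNEL_PAUSE:
        return "backchannel"
    return "wait"

def feed_word(state, word, end_time):
    """Incrementally add one recognised word as it arrives."""
    state.words.append(word)
    state.last_word_end = end_time
```

The key contrast with a non-incremental pipeline is that `decide` can be polled continuously (e.g., every frame) while the user is still speaking, rather than only after an end-of-utterance event.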
The project will target conversational systems in a face-to-face setting, using an animated face and head tracking (so-called multimodal systems). Such a setting also allows feedback in the form of facial expressions and head movements (such as nods).
The project has four main goals. First, we will build a general model and test-bed for incremental dialogue processing, by improving our existing dialogue system framework. Second, we will implement at least two prototype systems within specific domains using this framework. Third, within these prototype systems, we will build data-driven models of turn-taking and feedback behaviour. Finally, the prototype systems will be tested in user studies.
Group: Speech Communication and Technology
Funding: VR (2011-6237)
Duration: 2012-01-01 - 2015-12-31