Situated AV-Interaction with Robots

Strategic Research Area ICT - The Next Generation

Documents

TNG kick-off presentation

Situated Audio Visual Interaction with Robots is a project within the Strategic Research Area ICT - The Next Generation.

The goal of the project is to build a research platform which combines spoken dialogue technology with visual object recognition in a robot. It will enable research on:

  • true mixed-initiative, collaborative human-robot interaction
  • cognitive modeling of visual scenes
  • learning by audiovisual interaction with humans

With the help of a human, the robot should be able to:
  • acquire new knowledge
  • learn new skills
  • adapt its actions to the assisted user
  • adapt to a possibly changing environment
The project's core directions, audiovisual interaction and robotics, form a challenging and rapidly developing research area.

KTH is in the unique position of having successful groups that complement each other through their backgrounds in the respective research disciplines. The groups are engaged in robotics-related EU and VR projects such as IURO, GRASP, PACO-PLUS and CogX.

Groups and Researchers

Department of Speech, Music and Hearing (TMH), KTH

Computer Vision and Active Perception Lab (CVAP), KTH

Automatic Control Lab (ACCESS), KTH

Interaction Design and Innovation (IDI), SICS

Video demonstration

Publications

Al Moubayed, S., Beskow, J., Blomberg, M., Granström, B., Gustafson, J., Mirnig, N., & Skantze, G. (2012). Talking with Furhat - multi-party interaction with a back-projected robot head. In Proceedings of Fonetik'12. Gothenburg, Sweden. [abstract] [pdf]

Al Moubayed, S., Beskow, J., Skantze, G., & Granström, B. (2012). Furhat: A Back-projected Human-like Robot Head for Multiparty Human-Machine Interaction. In Esposito, A., Esposito, A., Vinciarelli, A., Hoffmann, R., & Müller, V. C. (Eds.), Cognitive Behavioural Systems. Lecture Notes in Computer Science (pp. 114-130). Springer.

Al Moubayed, S., & Skantze, G. (2012). Perception of Gaze Direction for Situated Interaction. In Proc. of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction. The 14th ACM International Conference on Multimodal Interaction ICMI. Santa Monica, CA, USA. [abstract] [pdf]

Al Moubayed, S., Skantze, G., Beskow, J., Stefanov, K., & Gustafson, J. (2012). Multimodal Multiparty Social Interaction with the Furhat Head. In Proc. of the 14th ACM International Conference on Multimodal Interaction ICMI. Santa Monica, CA, USA. (*) [abstract] [pdf]

(*) Outstanding Demo Award at ICMI 2012

Al Moubayed, S., Beskow, J., Granström, B., Gustafson, J., Mirnig, N., Skantze, G., & Tscheligi, M. (2012). Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space. In Proc of LREC Workshop on Multimodal Corpora. Istanbul, Turkey. [pdf]

Edlund, J., Heldner, M., & Gustafson, J. (2012). Who am I speaking at? - perceiving the head orientation of speakers from acoustic cues alone. In Proc. of LREC Workshop on Multimodal Corpora 2012. Istanbul, Turkey. [abstract] [pdf]

Edlund, J., Heldner, M., & Gustafson, J. (2012). On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. In Proc. of Interspeech 2012. Portland, Oregon, US. [abstract] [pdf]

Oertel, C., Wlodarczak, M., Edlund, J., Wagner, P., & Gustafson, J. (2012). Gaze Patterns in Turn-Taking. In Proc. of Interspeech 2012. Portland, Oregon, US. [abstract] [pdf]

Skantze, G. (2012). A Testbed for Examining the Timing of Feedback using a Map Task. In Proceedings of the Interdisciplinary Workshop on Feedback Behaviors in Dialog. Portland, OR. (*) [abstract] [pdf]

(*) Selected for keynote presentation

Skantze, G., Al Moubayed, S., Gustafson, J., Beskow, J., & Granström, B. (2012). Furhat at Robotville: A Robot Head Harvesting the Thoughts of the Public through Multi-party Dialogue. In Proceedings of IVA-RCVA. Santa Cruz, CA. [pdf]

Al Moubayed, S., & Skantze, G. (2011). Turn-taking Control Using Gaze in Multiparty Human-Computer Dialogue: Effects of 2D and 3D Displays. In Proceedings of AVSP. Florence, Italy. [pdf]

Johnson-Roberson, M., Bohg, J., Skantze, G., Gustafson, J., Carlson, R., Rasolzadeh, B., & Kragic, D. (2011). Enhanced Visual Scene Understanding through Human-Robot Dialog. In IEEE/RSJ International Conference on Intelligent Robots and Systems. [pdf]

Johnson-Roberson, M., Bohg, J., Kragic, D., Skantze, G., Gustafson, J., & Carlson, R. (2010). Enhanced Visual Scene Understanding through Human-Robot Dialog. In Proceedings of AAAI 2010 Fall Symposium: Dialog with Robots. Arlington, VA. [pdf]