With the help of a human, the robot should be able to:
- acquire new knowledge
- learn new skills
- adapt its actions to the assisted user
- adapt to a possibly changing environment
KTH is unique in having successful groups whose backgrounds in the respective research disciplines complement one another. The groups are engaged in robotics-related EU and VR projects such as IURO, GRASP, PACO-PLUS and CogX.
Al Moubayed, S., Beskow, J., Blomberg, M., Granström, B., Gustafson, J., Mirnig, N., & Skantze, G. (2012). Talking with Furhat - multi-party interaction with a back-projected robot head. In Proceedings of Fonetik'12. Gothenburg, Sweden. [abstract][pdf]
Abstract: This is a condensed presentation of some recent work on a back-projected robotic head for multi-party interaction in public settings. We will describe some of the design strategies and give some preliminary analysis of an interaction database collected at the Robotville exhibition at the London Science Museum.
Al Moubayed, S., Beskow, J., Skantze, G., & Granström, B. (2012). Furhat: A Back-projected Human-like Robot Head for Multiparty Human-Machine Interaction. In Esposito, A., Esposito, A. M., Vinciarelli, A., Hoffmann, R., & Müller, V. C. (Eds.), Cognitive Behavioural Systems. Lecture Notes in Computer Science (pp. 114-130). Springer.
Al Moubayed, S., & Skantze, G. (2012). Perception of Gaze Direction for Situated Interaction. In Proc. of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction, at the 14th ACM International Conference on Multimodal Interaction ICMI. Santa Monica, CA, USA. [abstract][pdf]
Abstract: Accurate human perception of robots' gaze direction is crucial for the design of natural and fluent situated multimodal face-to-face interaction between humans and machines. In this paper, we present an experiment that quantifies, with 18 test subjects, the effects of different gaze cues synthesized using the Furhat back-projected robot head on the accuracy of the spatial direction of gaze as perceived by humans. The study first quantifies the accuracy of the perceived gaze direction in a human-human setup, and compares that to the use of synthesized gaze movements in different conditions: viewing the robot's eyes frontally or from a 45-degree side view. We also study the effect of 3D gaze by controlling both eyes to indicate the depth of the focal point, the use of gaze or head pose, and the use of static or dynamic eyelids. The findings of the study are highly relevant to the design and control of robots and animated agents in situated face-to-face interaction.
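To make the accuracy measure in such a study concrete, here is a minimal sketch, not the paper's analysis code, of scoring perceived gaze direction as the mean absolute angular error between the azimuth the robot actually gazed at and the azimuth each subject reported; all values below are fabricated.

```python
# Illustrative sketch (assumed setup, not the paper's code): mean absolute
# angular error between true and perceived gaze directions in one condition.
import numpy as np

# Fabricated responses: true and perceived gaze azimuths in degrees,
# e.g. for the frontal-view condition.
true_azimuth      = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
perceived_azimuth = np.array([-17.5,  -8.0, 1.5,  7.0, 24.0])

mae = np.mean(np.abs(perceived_azimuth - true_azimuth))
print(f"Mean absolute gaze error: {mae:.1f} degrees")  # 2.6 degrees here
```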
Al Moubayed, S., Skantze, G., Beskow, J., Stefanov, K., & Gustafson, J. (2012). Multimodal Multiparty Social Interaction with the Furhat Head. In Proc. of the 14th ACM International Conference on Multimodal Interaction ICMI. Santa Monica, CA, USA. (*) [abstract][pdf]
(*) Outstanding Demo Award at ICMI 2012
Abstract: We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is a human-like interface that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations.
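As a hint of what coordinating such input signals involves, the following is a minimal hypothetical sketch, not the Furhat system's actual code or API, of fusing face tracking with microphone (sound-direction) tracking to decide whom the head should look at; all names and thresholds are invented.

```python
# Hypothetical sketch of multiparty attention control: look at the tracked
# face whose direction best matches the incoming sound. Classes, names, and
# thresholds are invented; this is not the Furhat system's actual API.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Face:
    user_id: int
    azimuth_deg: float  # horizontal angle of the face, as seen from the robot

def select_gaze_target(faces: List[Face], sound_azimuth_deg: float,
                       tolerance_deg: float = 15.0) -> Optional[Face]:
    """Pick the face closest to the sound direction, i.e. the likely speaker."""
    if not faces:
        return None
    best = min(faces, key=lambda f: abs(f.azimuth_deg - sound_azimuth_deg))
    if abs(best.azimuth_deg - sound_azimuth_deg) <= tolerance_deg:
        return best       # the current speaker
    return faces[0]       # no match: fall back to any visible interlocutor

faces = [Face(user_id=1, azimuth_deg=-30.0), Face(user_id=2, azimuth_deg=25.0)]
speaker = select_gaze_target(faces, sound_azimuth_deg=22.0)
print(f"Gaze at user {speaker.user_id}")  # user 2: closest to the audio source
```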
Al Moubayed, S., Beskow, J., Granström, B., Gustafson, J., Mirnig, N., Skantze, G., & Tscheligi, M. (2012). Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space. In Proc of LREC Workshop on Multimodal Corpora. Istanbul, Turkey. [pdf]
Edlund, J., Heldner, M., & Gustafson, J. (2012). On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. In Proc. of Interspeech 2012. Portland, Oregon, US. [abstract][pdf]
Abstract: The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the speaker's head pose, which is meaningful in a number of ways. The feature most often implicated for detection of sound source orientation is the inter-aural level difference - a feature that is assumed to be more easily exploited in anechoic chambers than in everyday surroundings. We expand here on our previous studies and compare detection of speaker orientation within and outside of the anechoic chamber. Our results show that listeners find the task easier, rather than harder, in everyday surroundings, which suggests that inter-aural level differences are not the only feature at play.
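To make the inter-aural level difference feature concrete, here is a minimal sketch, not taken from the paper, of how ILD can be computed from a two-channel (binaural) recording; the signals below are synthetic.

```python
# Illustrative sketch (not from the paper): computing the inter-aural level
# difference (ILD) from a binaural recording. A speaker turned away from one
# ear will, all else equal, yield a lower level in that channel.
import numpy as np

def ild_db(left: np.ndarray, right: np.ndarray, eps: float = 1e-12) -> float:
    """RMS level difference in dB between the two ear signals."""
    rms_l = np.sqrt(np.mean(left ** 2) + eps)
    rms_r = np.sqrt(np.mean(right ** 2) + eps)
    return 20.0 * np.log10(rms_l / rms_r)

# Fabricated example: the right channel is attenuated by half (about -6 dB),
# as might happen when the speaker's head is turned toward the left ear.
t = np.linspace(0, 1, 16000)
left = np.sin(2 * np.pi * 220 * t)
right = 0.5 * left
print(f"ILD: {ild_db(left, right):.1f} dB")  # approximately 6.0 dB
```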
Edlund, J., Heldner, M., & Gustafson, J. (2012). Who am I speaking at? - perceiving the head orientation of speakers from acoustic cues alone. In Proc. of LREC Workshop on Multimodal Corpora 2012. Istanbul, Turkey. [abstract][pdf]
Abstract: The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. We describe in passing some preliminary findings that led us onto this line of investigation, and in detail a study in which we extend an experiment design intended to measure perception of gaze direction to test instead for perception of sound source orientation. The results corroborate those of previous studies, and further show that people are very good at performing this skill outside of studio conditions as well.
Oertel, C., Wlodarczak, M., Edlund, J., Wagner, P., & Gustafson, J. (2012). Gaze Patterns in Turn-Taking. In Proc. of Interspeech 2012. Portland, Oregon, US. [abstract][pdf]
Abstract: This paper investigates gaze patterns in turn-taking. We focus on the difference between speaker changes resulting in gaps and overlaps. We also investigate gaze patterns around backchannels and around silences not involving speaker changes.
Skantze, G. (2012). A Testbed for Examining the Timing of Feedback using a Map Task. In Proceedings of the Interdisciplinary Workshop on Feedback Behaviors in Dialog. Portland, OR. (*) [abstract][pdf]
(*) Selected for keynote presentation
Abstract: In this paper, we present a fully automated spoken dialogue system that can perform the Map Task with a user. By implementing a trick, the system can convincingly act as an attentive listener, without any speech recognition. An initial study is presented where we let users interact with the system and recorded the interactions. Using this data, we have then trained a Support Vector Machine on the task of identifying appropriate locations to give feedback, based on automatically extractable prosodic and contextual features. 200 ms after the end of the user's speech, the model may identify response locations with an accuracy of 75%, as compared to a baseline of 56.3%.
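The classifier described in this abstract lends itself to a compact illustration. The sketch below is a hypothetical reconstruction assuming scikit-learn; the feature names, values, and labels are invented for illustration and are not the paper's actual feature set or implementation.

```python
# Hypothetical sketch of the kind of classifier described above: an SVM
# deciding, 200 ms after the end of a user utterance, whether the system
# should give backchannel feedback. All features and data are fabricated.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: automatically extractable prosodic/contextual features for one
# candidate response location:
# [final pitch slope (Hz/s), final intensity (dB), pause length (s), word count]
X = np.array([
    [-42.0, 58.1, 0.45, 7],   # falling pitch, clear pause -> good spot
    [ 15.3, 62.4, 0.08, 3],   # rising pitch, tiny gap     -> keep listening
    [-30.5, 55.0, 0.60, 9],
    [  5.0, 61.2, 0.12, 2],
])
y = np.array([1, 0, 1, 0])    # 1 = give feedback here, 0 = do not

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# At run time, extract the same features 200 ms after the user falls silent:
candidate = np.array([[-38.2, 57.5, 0.50, 6]])
if clf.predict(candidate)[0] == 1:
    print("Respond now (e.g., 'mm-hm').")
else:
    print("Hold the floor; the user is likely to continue.")
```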
Skantze, G., Al Moubayed, S., Gustafson, J., Beskow, J., & Granström, B. (2012). Furhat at Robotville: A Robot Head Harvesting the Thoughts of the Public through Multi-party Dialogue. In Proceedings of IVA-RCVA. Santa Cruz, CA. [pdf]
Al Moubayed, S., & Skantze, G. (2011). Turn-taking Control Using Gaze in Multiparty Human-Computer Dialogue: Effects of 2D and 3D Displays. In Proceedings of AVSP. Florence, Italy. [pdf]
Johnson-Roberson, M., Bohg, J., Skantze, G., Gustafson, J., Carlson, R., Rasolzadeh, B., & Kragic, D. (2011). Enhanced Visual Scene Understanding through Human-Robot Dialog. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). [pdf]
Johnson-Roberson, M., Bohg, J., Kragic, D., Skantze, G., Gustafson, J., & Carlson, R. (2010). Enhanced Visual Scene Understanding through Human-Robot Dialog. In Proceedings of AAAI 2010 Fall Symposium: Dialog with Robots. Arlington, VA. [pdf]