Joakim Gustafson

I am a professor and head of the Division of Speech, Music and Hearing at KTH. I started working with spoken dialogue systems in 1992 as part of the Waxholm project. In 1998 I led the development of the August project, which had the aim of building a synthetic August Strindberg that members of the public could chat with. Between 1998 and 2002 I worked on the AdApt system, a multimodal spoken dialogue system for browsing apartments for sale in Stockholm. The topic of my PhD thesis was iterative development of multimodal dialogue systems.

Between 2000 and 2007 I was a senior researcher at Telia Research. My main focus was on research projects, but I also acted as expert support in commercial launches of speech applications. In 2001 I worked in the TänkOm project, where visitors at the telecom museum could enter an apartment of the future and interact with the animated agent Pixie. Between 2002 and 2005 I led the development of Telia's part of the EU-funded project NICE, whose goal was to build a speech-enabled computer game in which children could interact with animated 3D game characters.

In my current position I am doing long-term research on the design and development of multimodal systems and social robots, as well as interactional analysis of spontaneous spoken dialogue phenomena. I am currently heading the VR-funded project CONNECTED (context-aware speech synthesis for conversational AI), which develops a spontaneous speech synthesizer that uses breath and disfluencies as implicit control of the manner of speaking. We also want our voices to be used in Augmentative and Alternative Communication. I am also co-PI of the VR-funded project STANCE (Perception of speaker stance - using spontaneous speech synthesis to explore the contribution of prosody, context and speaker) and the RJ-funded project CAPTivating (Comparative Analysis of Public speaking with Text-to-speech). For examples of our spontaneous speech synthesis, see TTS examples.

Another interesting project is the PTS-funded project WikiSpeech2, which will make open-source speech synthesis available through Wikimedia's server architecture and has received worldwide attention. In 2010 I started working on human-robot interaction. In the SRA ICT-TNG-funded project SAVIR we collaborated with RPL to develop a robot that talked with humans in order to interpret visual scenes, see video. In the EU project IURO, we developed a mobile robot that could ask pedestrians in Munich for directions, see video. In this project the social robot FurHat was developed. I was recently the technical coordinator and work package leader in the EU project BabyRobot, where we developed social robot applications for children. Our goal was to build a robot that is able to achieve joint attention with the user, see video. I currently collaborate with RPL in two HRI-related projects funded by SSF: FACT, where we will develop collaborative robots for manufacturing, and EACare, where we will build a robot companion that supports elderly people's mental wellbeing and develop methods for detecting early signs of dementia. Finally, I have a KTH-techlab-funded collaborative project, AAIS (Advanced Adaptive Intelligent Systems), where the aim is to develop social robots that can assist the elderly in everyday tasks, e.g. giving situation-dependent cooking directions.

I am examiner and lecturer in the speech technology course. I am a member of the Editorial Boards of the journal Speech Communication and the International Journal of Human-Computer Studies, and Review Editor on the Editorial Board of Human-Robot Interaction, a specialty section of Frontiers in Robotics and AI. I was Technical Program co-Chair of Interspeech 2017 and spoken dialogue Area Chair of Interspeech 2015, 2016, 2018, and 2019. As of September 2021 I am a member of the Board of ISCA (International Speech Communication Association).

List of Publications

Department of Speech, Music and Hearing, KTH

SE-100 44 Stockholm


Visiting address:
Lindstedtsvägen 24
Tel +46 8 790 8965   Fax -7854
Mob +46 70 206 08 17