Joakim Gustafson
◆ Location: Stockholm, Sweden  —  KTH Royal Institute of Technology

Professor & Head of Department

Department of Speech, Music and Hearing

Background

I am a professor and head of the Department of Speech, Music and Hearing at KTH Royal Institute of Technology, working on conversational AI, speech synthesis, social robotics, and multimodal human-computer interaction. I have built spoken dialogue systems since 1992 — from Waxholm, to August (a synthetic August Strindberg chatting with the public at Stockholm Cultural Centre for six months in 1998), to AdApt, a multimodal apartment-browsing system. My PhD thesis focused on iterative development of multimodal dialogue systems. From 2000 to 2007 I was a senior researcher at Telia Research, including leading KTH’s work on the EU project NICE — a speech-enabled computer game where children interacted with animated 3D characters.

Back at KTH from 2007, my research expanded into human-robot interaction. In SAVIR I collaborated with RPL to build a robot that interpreted visual scenes through dialogue. In the EU project IURO we built a mobile robot asking pedestrians for directions in Munich — work that led to the social robot Furhat. I was also technical coordinator on BabyRobot (EU), developing social robot applications for children. A parallel research strand is conversational spontaneous speech synthesis. I headed the VR project CONNECTED, which produced a TTS system controlled implicitly through breath and disfluencies, and was co-PI on STANCE and CAPTivating, using that TTS to study how prosody, fillers, and voice quality shape perceived speaker stance.

My current theme is AI for good. EmpowerMe used LLMs to help people with cognitive disabilities handle official correspondence; FoodTalk explored voice-based cooking assistants. Two 2026 projects — SAInt and Kitchen Companion — continue that work, targeting in-home elderly assistance at the KTH Interaction and Robotics Lab. On the AAC front, RAPPORT and Personalized Voices form a joint programme on giving people back their voice.

Specializations

01 / CONVERSATIONAL AI

Conversational Speech Synthesis

Spontaneous speech synthesizers using breath, disfluencies, and prosody as implicit controls of speaking style. Context-aware TTS for dialogue systems. Turn-taking cues and voice quality in conversational AI.

02 / ROBOTICS

Social Robotics

Designing robots that engage in natural, multimodal conversation and function as social companions — from joint-attention robots for children to humanoid robots supporting domestic activities for the elderly.

03 / IMPACT

AI for Good & Accessibility

Using LLMs, AI voices and humanoid robots to empower people with cognitive disabilities, improve AAC devices, help elderly users maintain independence, and support social connection through intelligent voice assistants.

◆ Completed Projects

FoodTalk Vinnova — 2024–2025
Investigated how intelligent assistants can promote sustainable cooking; elderly participants cooked with an AI assistant in the Interaction and Robotics Lab.
EmpowerMe PTS — 2023–2025
Used large language models to help people with cognitive disabilities understand and respond to official correspondence from authorities.
STANCE VR/HS — 2021–2025
Perceptual studies of speaker stance using spontaneous speech synthesis developed in CONNECTED.
CAPTivating RJ — 2021–2025
Comparative analysis of public speaking with text-to-speech; perceptual studies using the spontaneous TTS developed in CONNECTED.
CONNECTED VR/NT — 2020–2025
Context-aware speech synthesis for conversational AI. Developed a spontaneous speech synthesiser using breath and disfluencies as implicit control of speaking manner.
AAIS Digital Futures — 2020–2025
Advanced Adaptive Intelligent Systems. Developed social robots to assist elderly in everyday tasks, including situation-dependent cooking directions.
OpenUp Vinnova — 2020–2021
Facilitating the reopening of visitor spaces in the wake of the COVID-19 pandemic.
FACT SSF — 2016–2021
Research on spoken interaction and social robotics in collaborative manipulation tasks, such as assembling an IKEA stool.
EACare SSF — 2016–2021
A project on multimodal detection of early signs of dementia, based on a massively multimodal collection of memory tests conducted by doctors at the KI memory clinic.
Wikispeech PTS — 2016–2021
Made Wikipedia pages accessible through speech synthesis and built infrastructure for users to donate their voices.
BabyRobot EU — 2016–2018
Robots that analyse and track human behaviour using audio-visual monitoring, targeting typically developing and autistic spectrum children.
InkSynt VR/NT — 2014–2018
Research on how to design a system for incremental text-to-speech conversion.
GetHomeSafe EU — 2011–2014
Developed a system for safe information access and spoken communication while driving. KTH focused on human-like behaviour while stopping and resuming a spoken interaction.
SamSynt VR/NT — 2010–2013
Research on how to introduce interactional phenomena like backchannels into speech synthesis.
SAVIR SRA/TNG — 2010–2013
Investigated how a robot can improve visual scene understanding by engaging in spoken dialogue with a human.
IURO EU — 2010–2012
Built a robot that autonomously navigated urban environments and asked human passers-by for route directions.
MonAMI EU — 2006–2009
Mainstreamed accessibility in consumer goods and services through advanced technologies for independent living and equal participation.
NICE EU — 2002–2005
Speech-enabled computer game in which children interacted with animated 3D game characters using a mix of spoken Swedish and pointing with a gyro mouse into the 3D scene on a 3×4 m screen.
TänkOm TeliaResearch — 2002–2003
Visitors to the Telecom Museum could interact in spoken Swedish with the animated agent Pixie in an apartment of the future. The system collected a large corpus of human-computer interactions daily for two years.
AdApt CTT — 1999–2002
Laid the foundation for advanced multimodal spoken dialogue systems: users cooperated with an animated agent in Swedish to find and discuss apartments for sale in Stockholm.
August CTT — 1998–1999
Animated conversational agent inspired by August Strindberg, available daily at the Culture Centre in Stockholm during Stockholm’s year as European Cultural Capital.
Gulan CTT — 1996–1998
A system for teaching spoken dialogue systems technology, putting a fully functioning Swedish multimodal dialogue system into students’ hands as an instructional aid.
ONOMASTICA EU — 1993–1996
EU project on multilingual pronunciation lexicons and name recognition, resulting in a Licentiate thesis on Swedish name pronunciation for speech synthesis.
Waxholm HSFR/NUTEK — 1992–1995
Pioneering Swedish multimodal spoken dialogue system for information about boat traffic in the Stockholm archipelago, combining new dialogue management and parsing modules with TMH’s audio-visual speech synthesis and speech recognition.

Teaching & Service

Community Roles

  • Treasurer — ISCA Board
  • Technical Program Co-Chair — Interspeech 2017
  • Spoken Dialogue Area Chair — Interspeech 2015, 2016, 2018, 2019

Contact

Postal Address

Department of Speech, Music and Hearing

KTH Royal Institute of Technology

SE-100 44 Stockholm, Sweden

Visiting Address

Lindstedtsvägen 24

Stockholm, Sweden