Joakim Gustafson
◆ Location: Stockholm, Sweden  —  KTH Royal Institute of Technology

Professor & Head of Department

Department of Speech, Music and Hearing

Background

I am a professor and head of the Department of Speech, Music and Hearing at KTH Royal Institute of Technology, working on conversational AI, speech synthesis, social robotics, and multimodal human-computer interaction. I have built spoken dialogue systems since 1992 — from Waxholm, to August (a synthetic August Strindberg chatting with the public at Stockholm Cultural Centre for six months in 1998), to AdApt, a multimodal apartment-browsing system. My PhD thesis focused on iterative development of multimodal dialogue systems. From 2000 to 2007 I was a senior researcher at Telia Research, including leading KTH’s work on the EU project NICE — a speech-enabled computer game where children interacted with animated 3D characters.

Back at KTH from 2007, my research expanded into human-robot interaction. In SAVIR I collaborated with RPL to build a robot that interpreted visual scenes through dialogue. In the EU project IURO we built a mobile robot asking pedestrians for directions in Munich — work that led to the social robot Furhat. I was also technical coordinator on BabyRobot (EU), developing social robot applications for children. A parallel research strand is conversational spontaneous speech synthesis. I headed the VR project CONNECTED, which produced a TTS system controlled implicitly through breath and disfluencies, and was co-PI on STANCE and CAPTivating, using that TTS to study how prosody, fillers, and voice quality shape perceived speaker stance.

My current theme is AI for good. EmpowerMe used LLMs to help people with cognitive disabilities handle official correspondence; FoodTalk explored voice-based cooking assistants. Two 2026 projects — SAInt and Kitchen Companion — continue that work, targeting in-home elderly assistance at the KTH Interaction and Robotics Lab. On the AAC front, RAPPORT and Personalized Voices form a joint programme on giving people back their voice.

Specializations

01 / CONVERSATIONAL AI

Conversational Speech Synthesis

Spontaneous speech synthesizers using breath, disfluencies, and prosody as implicit controls of speaking style. Context-aware TTS for dialogue systems. Turn-taking cues and voice quality in conversational AI.

02 / ROBOTICS

Social Robotics

Designing robots that engage in natural, multimodal conversation and function as social companions — from joint-attention robots for children to humanoid robots supporting domestic activities for the elderly.

03 / IMPACT

AI for Good & Accessibility

Using LLMs, AI voices and humanoid robots to empower people with cognitive disabilities, improve AAC devices, help elderly users maintain independence, and support social connection through intelligent voice assistants.

◆ Completed Projects

FoodTalk Vinnova — 2024–2025
Investigated how intelligent assistants can promote sustainable cooking; elderly participants cooked with an AI assistant in the Interaction and Robotics Lab.
EmpowerMe PTS — 2023–2025
Used large language models to help people with cognitive disabilities understand and respond to official correspondence from authorities.
STANCE VR/HS — 2021–2025
Perceptual studies of speaker stance using spontaneous speech synthesis developed in CONNECTED.
CAPTivating RJ — 2021–2025
Comparative analysis of public speaking with text-to-speech; perceptual studies using the spontaneous TTS developed in CONNECTED.
CONNECTED VR/NT — 2020–2025
Context-aware speech synthesis for conversational AI. Developed a spontaneous speech synthesiser using breath and disfluencies as implicit control of speaking manner.
AAIS Digital Futures — 2020–2025
Advanced Adaptive Intelligent Systems. Developed social robots to assist elderly in everyday tasks, including situation-dependent cooking directions.
OpenUp Vinnova — 2020–2021
Facilitating the reopening of visitor spaces in the wake of the COVID-19 pandemic.
FACT SSF — 2016–2021
Research on spoken interaction and social robotics in collaborative manipulation tasks, such as assembling an IKEA stool.
EACare SSF — 2016–2021
A project on multimodal detection of early signs of dementia, based on a massively multimodal collection of memory tests conducted by doctors at the KI memory clinic.
Wikispeech PTS — 2016–2021
Made Wikipedia pages accessible through speech synthesis and built infrastructure for users to donate their voices.
BabyRobot EU — 2016–2018
Robots that analyse and track human behaviour using audio-visual monitoring, targeting typically developing and autistic spectrum children.
InkSynt VR/NT — 2014–2018
Research on how to design a system for incremental text-to-speech conversion.
GetHomeSafe EU — 2011–2014
Developed a system for safe information access and spoken communication while driving. KTH focused on human-like behaviour while stopping and resuming a spoken interaction.
SamSynt VR/NT — 2010–2013
Research on how to introduce interactional phenomena like backchannels into speech synthesis.
SAVIR SRA/TNG — 2010–2013
Investigated how a robot can improve visual scene understanding by engaging in spoken dialogue with a human.
IURO EU — 2010–2012
Built a robot that autonomously navigated urban environments and asked human passers-by for route directions.
MonAMI EU — 2006–2009
Mainstreamed accessibility in consumer goods and services through advanced technologies for independent living and equal participation.
NICE EU — 2002–2005
Speech-enabled computer game in which children interacted with animated 3D game characters using a mix of spoken Swedish and pointing with a gyro mouse into the 3D scene on a 3×4 m screen.
TänkOm TeliaResearch — 2002–2003
Visitors to the Telecom Museum could interact in spoken Swedish with the animated agent Pixie in an apartment of the future. The system collected a large corpus of human-computer interactions daily for two years.
AdApt CTT — 1999–2002
Laid the foundation for advanced multimodal spoken dialogue systems: users cooperated with an animated agent in Swedish to find and discuss apartments for sale in Stockholm.
August CTT — 1998–1999
Animated conversational agent inspired by August Strindberg, available daily at the Culture Centre in Stockholm during Stockholm’s year as European Cultural Capital.
Gulan CTT — 1996–1998
A system for teaching spoken dialogue systems technology, putting a fully functioning Swedish multimodal dialogue system into students’ hands as an instructional aid.
ONOMASTICA EU — 1993–1996
EU project on multilingual pronunciation lexicons and name recognition, resulting in a Licentiate thesis on Swedish name pronunciation for speech synthesis.
Waxholm HSFR/NUTEK — 1992–1995
Pioneering Swedish multimodal spoken dialogue system for information about boat traffic in the Stockholm archipelago, combining new dialogue management and parsing modules with TMH’s audio-visual speech synthesis and speech recognition.

Teaching & Service

Community Roles

  • Treasurer — ISCA Board
  • Technical Program Co-Chair — Interspeech 2017
  • Spoken Dialogue Area Chair — Interspeech 2015, 2016, 2018, 2019

Contact

Postal Address

Department of Speech, Music and Hearing

KTH Royal Institute of Technology

SE-100 44 Stockholm, Sweden

Visiting Address

Lindstedtsvägen 24

Stockholm, Sweden