Situated Agentic Intelligence
Humanoid robots that understand context, learn from interaction, and provide proactive verbal and physical support - bridging the care gap for older adults and people with special needs.
Enabling older adults and people with disabilities to live safely at home with 24/7 robotic support.
Robots that infer needs and offer help at the right moment, not just responding to commands.
Rigorous user studies with real participants, guided by comprehensive ethical protocols.
Sweden's population aged 80 and over will grow by 50% in the next decade, while the number of professional caregivers is declining. SAInt addresses this structural challenge with agentic robotics.
The convergence of physically capable humanoid robots - such as 1X Neo, Figure 03, Tesla Optimus, Boston Dynamics Atlas, Rainbow Robotics RB-Y1 and Unitree H1 - with large language and vision models creates the first genuine technical foundation for domestic robotic assistance.
SAInt (Situated Agentic Intelligence) is a five-year project funded by Promobilia. It moves beyond conventional assistive robotics, which relies on explicit command structures, toward agentic partners capable of contextual understanding, goal inference through natural communication, and proactive collaboration integrated into daily activities.
The SAInt robot is designed not to replace human care, but to augment it, providing a continuous safety net between scheduled visits and empowering individuals to live on their own terms.
Continuous support between care visits, ensuring safety and independence at home around the clock.
Understanding the user's physical situation, routine deviations, and cognitive state through multimodal perception.
Dialogue through speech, gesture, and expression — no specialist commands or technical knowledge required.
All research follows rigorous ethical protocols with GDPR-compliant data handling and adaptive safety constraints.
SAInt integrates three interdependent work packages, each led by a specialist PI, to build a complete agentic robotic system operating in real domestic environments.
Lead: Prof. Joakim Gustafson, TMH
This work advances beyond reactive systems toward architectures that understand user goals, task states, and cognitive conditions, determining when and how to provide verbal and physical assistance. It also develops Swedish robot voices with the prosody and style controls needed to engage in both chatty and instructional interaction with elderly users.
Grounded in over 30 years of expertise in multimodal spoken dialogue systems and human-robot interaction.
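To make the contrast between chatty and instructional registers concrete, here is a minimal sketch of register-dependent prosody control rendered as standard SSML markup. The preset values, class names, and the idea of fixed presets are our illustrative assumptions, not the project's actual voice interface.

```python
from dataclasses import dataclass

@dataclass
class ProsodyStyle:
    """Illustrative prosody preset for one interaction register."""
    rate: str      # SSML speaking-rate value
    pitch: str     # SSML pitch offset in semitones
    pause_ms: int  # pause inserted after the utterance

# Hypothetical presets: a lively register for small talk and a slower,
# clearly segmented register for step-by-step instructions.
STYLES = {
    "chatty": ProsodyStyle(rate="105%", pitch="+2st", pause_ms=150),
    "instructional": ProsodyStyle(rate="85%", pitch="-1st", pause_ms=600),
}

def to_ssml(text: str, mode: str) -> str:
    """Wrap an utterance in standard SSML prosody markup for a register."""
    s = STYLES[mode]
    return (f'<speak><prosody rate="{s.rate}" pitch="{s.pitch}">{text}'
            f'</prosody><break time="{s.pause_ms}ms"/></speak>')

# A Swedish instruction step, spoken slowly with a clear pause after it:
print(to_ssml("Nu fyller vi kastrullen med vatten.", "instructional"))
```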
Lead: Prof. Danica Kragic, RPL
Multimodal representation learning for real-time human action and environment modeling in complex domestic settings. Multi-sensor fusion creates accurate 3D scene representations; lightweight neural fields enable learning from small, unstructured datasets and bridge simulation to reality.
Builds on deep expertise in computer vision, dual-arm manipulation, and data-driven grasp synthesis.
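As a concrete illustration of the multi-sensor fusion step, the sketch below back-projects calibrated depth views into a single world-frame point cloud with NumPy. It assumes pinhole intrinsics and metric depth; the function names are ours, and the learned neural fields described above would sit on top of geometry fused in roughly this way.

```python
import numpy as np

def backproject(depth, K, T_world_cam):
    """Lift one metric depth image into a world-frame point cloud.

    depth: (H, W) depth in meters; K: (3, 3) pinhole intrinsics;
    T_world_cam: (4, 4) camera-to-world pose.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # pixel -> camera-frame rays (z = 1)
    pts_cam = rays * depth.reshape(-1, 1)  # scale rays by measured depth
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    return (pts_h @ T_world_cam.T)[:, :3]  # express in the shared world frame

def fuse_views(views):
    """Merge per-camera clouds into one 3D scene representation."""
    return np.concatenate([backproject(d, K, T) for d, K, T in views])
```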
Lead: Asst. Prof. Noémie Jaquier, RPL
Motor intelligence enabling safe, compliant, and dexterous physical actions in direct collaboration with users. It integrates machine learning and control theory: multi-task learning combined with robust sim-to-real transfer and adaptive autonomy controllers that dynamically adjust control authority.
Grounded in Riemannian geometry and physics-based inductive bias for data-efficient robot learning.
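The phrase "dynamically adjust control authority" can be illustrated with the classic shared-control arbitration pattern sketched below, where a single blending weight decides how much of the commanded motion comes from the robot versus the user. The update rule, gain, and cap are illustrative assumptions, not SAInt's actual controller.

```python
import numpy as np

def blend(u_human, u_robot, alpha):
    """Linear arbitration: alpha in [0, 1] is the robot's control authority."""
    return alpha * np.asarray(u_robot) + (1.0 - alpha) * np.asarray(u_human)

def update_authority(alpha, goal_confidence, user_active,
                     gain=0.2, alpha_max=0.9):
    """Raise authority when goal inference is confident and the user is
    passive; hand authority back whenever the user actively intervenes.
    Rule, gain, and cap are illustrative, not the project's controller."""
    target = 0.0 if user_active else goal_confidence * alpha_max
    return alpha + gain * (target - alpha)  # smooth first-order adaptation

# Example: the user grabs the shared object, so authority decays toward them.
alpha = 0.8
for _ in range(5):
    alpha = update_authority(alpha, goal_confidence=0.95, user_active=True)
print(round(alpha, 3))  # ~0.262 after five steps
```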
Structured in increasing complexity, each scenario builds on the previous, forming an iterative research pathway from passive observation to full collaborative autonomy; a minimal code sketch of this progression follows the list.
The robot passively observes a human performing a task to build a task model, using teleoperation to position cameras on both the human and the environment.
The robot actively learns by asking the human to repeat activities or provide explanations, building its understanding of task-fulfilling actions.
The robot follows a predefined plan, guiding the human with dialogue and gestures, waiting for the user to verbally inform completion before proceeding to the next step.
The robot gives instructions and dynamically adapts its execution plan based on continuous interaction, detecting the human's actions and cognitive state.
A full role-sharing scenario where human and robot negotiate who does what; this is the most autonomous and flexible mode of assistance.
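The sketch below renders the five scenarios as a gated state machine: the robot enters the next mode only after the current one has been validated. The mode names and the gating flag are our illustrative assumptions; what counts as validation is itself part of the research.

```python
from enum import IntEnum

class Mode(IntEnum):
    """The five SAInt scenarios, ordered by increasing autonomy."""
    OBSERVE = 1       # passive observation builds a task model
    ACTIVE_LEARN = 2  # asks for repetitions and explanations
    GUIDED = 3        # fixed plan; waits for verbal step confirmation
    ADAPTIVE = 4      # replans from detected actions and cognitive state
    ROLE_SHARING = 5  # human and robot negotiate who does what

def advance(mode: Mode, milestone_validated: bool) -> Mode:
    """Enter the next scenario only once the current one is validated."""
    if milestone_validated and mode is not Mode.ROLE_SHARING:
        return Mode(mode + 1)
    return mode

assert advance(Mode.OBSERVE, True) is Mode.ACTIVE_LEARN
```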
Three KTH researchers whose complementary expertise in spoken dialogue, computer vision, and geometric robot learning forms the scientific backbone of SAInt.
Gustafson's research, spanning more than 30 years, covers multimodal spoken dialogue systems, human-robot interaction, generative AI, and adaptive speech synthesis. He has led 25 externally funded projects and served as work-package leader in five EU projects. Technical Program Chair of Interspeech 2017; Treasurer of ISCA.
Kragic's research covers robotics, computer vision, and machine learning, with particular depth in dual-arm manipulation and data-driven grasp synthesis. Her honors include the 2007 IEEE Robotics and Automation Society Early Academic Career Award, an ERC Starting Grant (2012), and a Distinguished Professor Grant from the Swedish Research Council (2019). She is a member of the Royal Swedish Academy of Sciences and the Royal Swedish Academy of Engineering Sciences.
Jaquier's research develops data-efficient and theoretically sound learning algorithms leveraging differential geometry and physics-based inductive bias, enabling robots to learn and adapt with near-human efficiency. PhD from EPFL (2020); postdoctoral experience at KIT's H²T Lab and Stanford Robotics Lab.
A one-of-a-kind shared facility for research, education, and innovation in generative AI, human-machine interaction, and robotics.
KTH IRL is a unique infrastructure jointly operated by the Department of Speech, Music and Hearing (TMH) and the Department of Robotics, Perception and Learning (RPL). It provides everything needed to conduct human-robot interaction research, from motion-capture data collection through to real-world deployment in domestic environments.
The lab houses over 25 faculty members, around 25 postdocs and close to 100 PhD students across both departments, making it one of Europe's densest concentrations of expertise in social robotics, spoken dialogue, computer vision, and robot learning. For SAInt, it provides the essential physical substrate: smart kitchens, interaction spaces, and a full fleet of humanoid robots.
SAInt will expand the existing humanoid robot fleet to five platforms, enabling parallel development and simultaneous testing across multiple research threads.
KTH IRL hosts six dedicated research environments, each purpose-built for a distinct aspect of human-robot interaction research.
The primary development and testing environment for robotic manipulation research, housing several dual-arm robotic platforms for dexterous manipulation studies. Through SAInt, this space has been expanded with two humanoid robots from Rainbow Robotics and Unitree Robotics. During the project the fleet will grow to five humanoid robots - both legged and wheeled platforms from leading vendors - enabling parallel research threads across WP2 and WP3.
A fully equipped, sensor-rich domestic kitchen designed for studying AI-assisted daily life. Instrumented with overhead cameras, two Meta Aria gaze-tracking glasses, embedded displays, and smart appliances, it enables realistic research on cooking assistance, food preparation, and domestic task support. It is the key environment for SAInt's core scenarios - the robot can assist, instruct, or collaborate on meal preparation in a genuine home-like setting.
A professional full-body motion capture studio with more than 25 OptiTrack infrared tracking cameras mounted on the ceiling and walls. It also offers three head-mounted Tobii gaze trackers, two pairs of Manus gloves, Meta Quest VR headsets, a multichannel speaker system, and wireless sound recording equipment. A Peel Capture system and Tentacle timecode devices keep all input channels synchronized. The studio is used for capturing high-fidelity human movement, gesture, and dance for AI training datasets.
A unique theatre-style space with tiered red velvet seating for up to 50 people, equipped with wireless clickers for audience response studies. Purpose-built for studying group dynamics, audience attention, and social perception in larger gatherings. Enables research into how robots and AI systems should behave in front of groups, panels, and public audiences.
Houses a fleet of custom-built aerial robots and ground vehicles, including multiple quadrotor UAV designs, a Boston Dynamics Spot, and ground-based autonomous systems. Equipped with onboard compute, custom sensor arrays, and dedicated workshop space for hardware development. Supports research in autonomous navigation, aerial perception, and multi-robot coordination.
Home of CloudGripper (cloudgripper.org), a remotely operated, transparent gripper enclosure that enables cloud-based robot teleoperation and large-scale data collection over the internet. Researchers worldwide can operate the hardware and collect manipulation data without physical presence, dramatically scaling up dataset generation for robot learning research.
TMH investigates how humans use voice, music, and sound to communicate and interact. Faculty expertise covers multimodal dialogue systems, generative models for speech and gesture, social robotics, expressive speech synthesis, and face-to-face interaction modelling. TMH leads the interaction and dialogue research in SAInt.
RPL bridges machine learning, computer vision, and robotics to create autonomous systems that perceive and act in the physical world. Faculty expertise includes dexterous manipulation, geometric robot learning, human-robot interaction, formal methods, and autonomous navigation. RPL leads perception and control research in SAInt.
SAInt welcomes collaborations with care facilities, patient advocacy organizations, industrial partners, and international research groups. We are also recruiting PhD students and postdocs.
We are currently hiring three PhD students and two postdocs; links to the admission pages will be posted here.