Situated Agentic Intelligence
Humanoid robots that understand context, learn from interaction, and provide proactive verbal and physical support - bridging the care gap for older adults and people with special needs.
Enabling older adults and people with disabilities to live safely at home with 24/7 robotic support.
Robots that infer needs and offer help at the right moment, not just responding to commands.
Rigorous user studies with real participants, guided by comprehensive ethical protocols.
Sweden's population aged 80 and over will grow by 50% in the next decade, while the number of professional caregivers is declining. SAInt addresses this structural challenge with agentic robotics.
The convergence of physically capable humanoid robots - such as 1X Neo, Figure 03, Tesla Optimus, Boston Dynamics Atlas, Rainbow Robotics RB-Y1 and Unitree H1 - with large language and vision models creates the first genuine technical foundation for domestic robotic assistance.
SAInt (Situated Agentic Intelligence) is a five-year project funded by Promobilia. It moves beyond conventional assistive robotics, which relies on explicit command structures, toward agentic partners capable of contextual understanding, goal inference through natural communication, and proactive collaboration integrated into daily activities.
The SAInt robot is designed not to replace human care, but to augment it, providing a continuous safety net between scheduled visits and empowering individuals to live on their own terms.
Continuous support between care visits, ensuring safety and independence at home around the clock.
Understanding the user's physical situation, routine deviations, and cognitive state through multimodal perception.
Dialogue through speech, gesture, and expression — no specialist commands or technical knowledge required.
All research follows rigorous ethical protocols with GDPR-compliant data handling and adaptive safety constraints.
SAInt integrates three interdependent work packages, each led by a specialist PI, to build a complete agentic robotic system operating in real domestic environments.
Lead: Prof. Joakim Gustafson, TMH
This work advances beyond reactive systems toward architectures that understand user goals, task states, and cognitive conditions, determining when and how to provide verbal and physical assistance. It also develops Swedish robot voices with the prosody and style controls needed to engage in both chatty and instructional interaction with elderly users.
Grounded in over 30 years of expertise in multimodal spoken dialogue systems and human-robot interaction.
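To make the contrast between chatty and instructional registers concrete, here is a minimal sketch of register-dependent prosody control rendered as standard SSML markup. The preset values, class names, and the idea of fixed presets are our illustrative assumptions, not the project's actual voice interface.

```python
from dataclasses import dataclass

@dataclass
class ProsodyStyle:
    """Illustrative prosody preset for one interaction register."""
    rate: str      # SSML speaking-rate value
    pitch: str     # SSML pitch offset in semitones
    pause_ms: int  # pause inserted after the utterance

# Hypothetical presets: a lively register for small talk and a slower,
# clearly segmented register for step-by-step instructions.
STYLES = {
    "chatty": ProsodyStyle(rate="105%", pitch="+2st", pause_ms=150),
    "instructional": ProsodyStyle(rate="85%", pitch="-1st", pause_ms=600),
}

def to_ssml(text: str, mode: str) -> str:
    """Wrap an utterance in standard SSML prosody markup for a register."""
    s = STYLES[mode]
    return (f'<speak><prosody rate="{s.rate}" pitch="{s.pitch}">{text}'
            f'</prosody><break time="{s.pause_ms}ms"/></speak>')

# A Swedish instruction step, spoken slowly with a clear pause after it:
print(to_ssml("Nu fyller vi kastrullen med vatten.", "instructional"))
```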
Lead: Prof. Danica Kragic, RPL
Multimodal representation learning for real-time human action and environment modeling in complex domestic settings. Multi-sensor fusion creates accurate 3D scene representations; lightweight neural fields enable learning from small, unstructured datasets and bridge simulation to reality.
Builds on deep expertise in computer vision, dual-arm manipulation, and data-driven grasp synthesis.
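As a concrete illustration of the multi-sensor fusion step, the sketch below back-projects calibrated depth views into a single world-frame point cloud with NumPy. It assumes pinhole intrinsics and metric depth; the function names are ours, and the learned neural fields described above would sit on top of geometry fused in roughly this way.

```python
import numpy as np

def backproject(depth, K, T_world_cam):
    """Lift one metric depth image into a world-frame point cloud.

    depth: (H, W) depth in meters; K: (3, 3) pinhole intrinsics;
    T_world_cam: (4, 4) camera-to-world pose.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # pixel -> camera-frame rays (z = 1)
    pts_cam = rays * depth.reshape(-1, 1)  # scale rays by measured depth
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    return (pts_h @ T_world_cam.T)[:, :3]  # express in the shared world frame

def fuse_views(views):
    """Merge per-camera clouds into one 3D scene representation."""
    return np.concatenate([backproject(d, K, T) for d, K, T in views])
```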
Lead: Asst. Prof. Noémie Jaquier, RPL
Motor intelligence enabling safe, compliant, and dexterous physical actions in direct collaboration with users. It integrates machine learning and control theory: multi-task learning combined with robust sim-to-real transfer and adaptive autonomy controllers that dynamically adjust control authority.
Grounded in Riemannian geometry and physics-based inductive bias for data-efficient robot learning.
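The phrase "dynamically adjust control authority" can be illustrated with the classic shared-control arbitration pattern sketched below, where a single blending weight decides how much of the commanded motion comes from the robot versus the user. The update rule, gain, and cap are illustrative assumptions, not SAInt's actual controller.

```python
import numpy as np

def blend(u_human, u_robot, alpha):
    """Linear arbitration: alpha in [0, 1] is the robot's control authority."""
    return alpha * np.asarray(u_robot) + (1.0 - alpha) * np.asarray(u_human)

def update_authority(alpha, goal_confidence, user_active,
                     gain=0.2, alpha_max=0.9):
    """Raise authority when goal inference is confident and the user is
    passive; hand authority back whenever the user actively intervenes.
    Rule, gain, and cap are illustrative, not the project's controller."""
    target = 0.0 if user_active else goal_confidence * alpha_max
    return alpha + gain * (target - alpha)  # smooth first-order adaptation

# Example: the user grabs the shared object, so authority decays toward them.
alpha = 0.8
for _ in range(5):
    alpha = update_authority(alpha, goal_confidence=0.95, user_active=True)
print(round(alpha, 3))  # ~0.262 after five steps
```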
Structured in increasing complexity, each scenario builds on the previous, forming an iterative research pathway from passive observation to full collaborative autonomy; a minimal code sketch of this progression follows the list.
The robot passively observes a human performing a task to build a task model, using teleoperation to position cameras on both the human and the environment.
The robot actively learns by asking the human to repeat activities or provide explanations, building its understanding of task-fulfilling actions.
The robot follows a predefined plan, guiding the human with dialogue and gestures, waiting for the user to verbally inform completion before proceeding to the next step.
The robot gives instructions and dynamically adapts its execution plan based on continuous interaction, detecting the human's actions and cognitive state.
A full role-sharing scenario where human and robot negotiate who does what; this is the most autonomous and flexible mode of assistance.
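The sketch below renders the five scenarios as a gated state machine: the robot enters the next mode only after the current one has been validated. The mode names and the gating flag are our illustrative assumptions; what counts as validation is itself part of the research.

```python
from enum import IntEnum

class Mode(IntEnum):
    """The five SAInt scenarios, ordered by increasing autonomy."""
    OBSERVE = 1       # passive observation builds a task model
    ACTIVE_LEARN = 2  # asks for repetitions and explanations
    GUIDED = 3        # fixed plan; waits for verbal step confirmation
    ADAPTIVE = 4      # replans from detected actions and cognitive state
    ROLE_SHARING = 5  # human and robot negotiate who does what

def advance(mode: Mode, milestone_validated: bool) -> Mode:
    """Enter the next scenario only once the current one is validated."""
    if milestone_validated and mode is not Mode.ROLE_SHARING:
        return Mode(mode + 1)
    return mode

assert advance(Mode.OBSERVE, True) is Mode.ACTIVE_LEARN
```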
Three KTH researchers whose complementary expertise in spoken dialogue, computer vision, and geometric robot learning forms the scientific backbone of SAInt.
Gustafson's research, spanning more than 30 years, covers multimodal spoken dialogue systems, human-robot interaction, generative AI, and adaptive speech synthesis. He has led 25 externally funded projects and served as work-package leader in five EU projects. Technical Program Chair of Interspeech 2017; Treasurer of ISCA.
Kragic's research covers robotics, computer vision, and machine learning, with particular depth in dual-arm manipulation and data-driven grasp synthesis. Her honors include the 2007 IEEE Robotics and Automation Society Early Academic Career Award, an ERC Starting Grant (2012), and a Distinguished Professor Grant from the Swedish Research Council (2019). She is a member of the Royal Swedish Academy of Sciences and the Royal Swedish Academy of Engineering Sciences.
Jaquier's research develops data-efficient and theoretically sound learning algorithms leveraging differential geometry and physics-based inductive bias, enabling robots to learn and adapt with near-human efficiency. PhD from EPFL (2020); postdoctoral experience at KIT's H²T Lab and Stanford Robotics Lab.
A one-of-a-kind shared facility for research, education, and innovation in generative AI, human-machine interaction, and robotics.
KTH IRL is a unique infrastructure jointly operated by the Department of Speech, Music and Hearing (TMH) and the Department of Robotics, Perception and Learning (RPL). It provides everything needed to conduct human-robot interaction research, from motion-capture data collection through to real-world deployment in domestic environments.
The lab houses over 25 faculty members, around 25 postdocs and close to 100 PhD students across both departments, making it one of Europe's densest concentrations of expertise in social robotics, spoken dialogue, computer vision, and robot learning. For SAInt, it provides the essential physical substrate: smart kitchens, interaction spaces, and a full fleet of humanoid robots.
SAInt will expand the existing humanoid robot fleet to five platforms, enabling parallel development and simultaneous testing across multiple research threads.
KTH IRL hosts six dedicated research environments, each purpose-built for a distinct aspect of human-robot interaction research.
The primary development and testing environment for robotic manipulation research, housing several dual-arm robotic platforms for dexterous manipulation studies. Through SAInt, this space has been expanded with two humanoid robots from Rainbow Robotics and Unitree Robotics. During the project the fleet will grow to five humanoid robots - both legged and wheeled platforms from leading vendors - enabling parallel research threads across WP2 and WP3.
A fully equipped, sensor-rich domestic kitchen designed for studying AI-assisted daily life. Instrumented with overhead cameras, two Meta Aria gaze-tracking glasses, embedded displays, and smart appliances, it enables realistic research on cooking assistance, food preparation, and domestic task support. It is the key environment for SAInt's core scenarios - the robot can assist, instruct, or collaborate on meal preparation in a genuine home-like setting.
A professional full-body motion capture studio with more than 25 OptiTrack infrared tracking cameras mounted on the ceiling and walls. It also offers three head-mounted Tobii gaze trackers, two pairs of Manus gloves, Meta Quest VR headsets, a multichannel speaker system, and wireless sound recording equipment. A Peel Capture system and Tentacle timecode devices keep all input channels synchronized. The studio is used for capturing high-fidelity human movement, gesture, and dance for AI training datasets.
A unique theatre-style space with tiered red velvet seating for up to 50 people, equipped with wireless clickers for audience response studies. Purpose-built for studying group dynamics, audience attention, and social perception in larger gatherings. Enables research into how robots and AI systems should behave in front of groups, panels, and public audiences.
Houses a fleet of custom-built aerial robots and ground vehicles, including multiple quadrotor UAV designs, a Boston Dynamics Spot, and ground-based autonomous systems. Equipped with onboard compute, custom sensor arrays, and dedicated workshop space for hardware development. Supports research in autonomous navigation, aerial perception, and multi-robot coordination.
Home of CloudGripper (cloudgripper.org), a remotely operated, transparent gripper enclosure that enables cloud-based robot teleoperation and large-scale data collection over the internet. Researchers worldwide can operate the hardware and collect manipulation data without physical presence, dramatically scaling up dataset generation for robot learning research.
TMH investigates how humans use voice, music, and sound to communicate and interact. Faculty expertise covers multimodal dialogue systems, generative models for speech and gesture, social robotics, expressive speech synthesis, and face-to-face interaction modelling. TMH leads the interaction and dialogue research in SAInt.
RPL bridges machine learning, computer vision, and robotics to create autonomous systems that perceive and act in the physical world. Faculty expertise includes dexterous manipulation, geometric robot learning, human-robot interaction, formal methods, and autonomous navigation. RPL leads perception and control research in SAInt.
SAInt welcomes collaborations with care facilities, patient advocacy organizations, industrial partners, and international research groups. We are also recruiting PhD students and postdocs.
We are currently hiring three PhD students and two postdocs; links to the admission pages will be posted here.