Robot Helper

About this research line

We research learning and sensing for robot helpers capable of manipulating challenging objects, from clothing to glassware, in real-world homes and workspaces. Our end goal is a robot helper for your home.

Current robots handle rigid, predictable objects in structured environments. Homes and workspaces are the opposite: varied objects, changing conditions, human activity everywhere. We develop methods that enable collaborative robots to learn manipulation tasks autonomously and data-efficiently by leveraging human demonstrations, differentiable physics, and multi-modal sensing. Our target objects are the ones that make robotics genuinely hard: deformable clothing, transparent glassware, fragile biocomposites, and reflective surfaces.

From structured to unstructured environments

The path from industrial robot to household helper is a path of generalisation. We frame this along three axes: task diversity (from one repetitive motion to heterogeneous tasks), object diversity (from a single known object to varied unknown instances), and environment diversity (from a controlled cell to a dynamic, human-occupied space). Each of our demo platforms pushes the boundary along one or more of these axes.

How we learn

We explore multiple learning paradigms and deliberately push the boundaries of generalisation. Rather than optimising a single technique for one task on one object, we develop methods that transfer across object categories, task types, and environments. A robot trained on towels should also handle T-shirts it has never seen. A pouring skill learned in bright light should work by candlelight. A grasping strategy for rigid bottles should adapt to fragile biological samples. This drive towards broad generalisation shapes every methodological choice we make.

Human demonstrations are our primary source of prior knowledge: show the robot a task once, such as picking a pen from a cup, wiping a whiteboard, or slicing food, and it extracts the essential motion and force profile. For example, in a citizen science project at De Krook in Ghent, 800 participants taught our robot to fold laundry, generating 8.5 hours of diverse folding demonstrations across varied textiles, lighting, and folding styles that fuel our learning algorithms.
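
To give an idea of what "extracting the essential motion and force profile" can look like in practice, here is a minimal sketch that time-normalises, smooths, and resamples a single recorded end-effector trajectory and wrist wrench. The function and variable names are illustrative, and a real pipeline would add segmentation, alignment across demonstrations, and outlier handling.

```python
import numpy as np
from scipy.signal import savgol_filter

def extract_profile(timestamps, positions, wrenches, n_samples=200):
    """Distill one kinesthetic demonstration into a reference motion/force profile.

    timestamps: (T,) seconds; positions: (T, 3) end-effector positions;
    wrenches: (T, 6) wrist force/torque. Names and filter settings are illustrative.
    """
    # Normalise time to [0, 1] so demonstrations of different speed align.
    t = (timestamps - timestamps[0]) / (timestamps[-1] - timestamps[0])
    t_ref = np.linspace(0.0, 1.0, n_samples)

    # Smooth out sensor noise and hand tremor, then resample on a fixed grid.
    positions = savgol_filter(positions, window_length=15, polyorder=3, axis=0)
    wrenches = savgol_filter(wrenches, window_length=15, polyorder=3, axis=0)
    motion = np.stack([np.interp(t_ref, t, positions[:, d]) for d in range(3)], axis=1)
    force = np.stack([np.interp(t_ref, t, wrenches[:, d]) for d in range(6)], axis=1)
    return motion, force
```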

On the technical side, we match the learning paradigm to the task’s structure. Differentiable simulation enables gradient-based optimisation of motion primitives, validated by our competition-winning cloth folding system. Reinforcement learning with tactile reward allowed a dual-arm robot to learn cloth folding from scratch in the real world within 60 episodes (approximately 8 hours) on a single CPU core. We co-optimise robot body and brain, and transfer policies from simulation to reality using domain randomisation and body randomisation to close the sim-to-real gap. Imitation learning, classic control, and differentiable programming each play a role depending on the task, the object, and the available data.
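
The sketch below illustrates the core idea of gradient-based optimisation of a motion primitive through a differentiable rollout. A one-line point-mass update stands in for a real differentiable cloth simulator, and the arc parametrisation, loss, and variable names are purely illustrative.

```python
import torch

# Toy stand-in for a differentiable cloth simulator: the grasped cloth corner
# is modelled as a point dragged towards the gripper with some lag.
def rollout(waypoints, corner0, drag=0.3):
    corner = corner0
    for wp in waypoints:
        corner = corner + drag * (wp - corner)  # differentiable update
    return corner

# Motion primitive: a circular-arc lift-and-place parametrised by goal and apex height.
def arc_waypoints(start, goal, apex_height, n=20):
    s = torch.linspace(0.0, 1.0, n).unsqueeze(1)
    flat = (1 - s) * start + s * goal                 # straight-line component
    lift = apex_height * torch.sin(torch.pi * s)      # arc component in z
    return flat + lift * torch.tensor([0.0, 0.0, 1.0])

start = torch.tensor([0.3, 0.0, 0.0])
target = torch.tensor([0.0, 0.0, 0.0])                # where the corner should land
goal = torch.tensor([-0.05, 0.0, 0.0], requires_grad=True)
apex = torch.tensor(0.15, requires_grad=True)

opt = torch.optim.Adam([goal, apex], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    corner = rollout(arc_waypoints(start, goal, apex), corner0=start)
    loss = torch.sum((corner - target) ** 2)
    loss.backward()                                   # gradients flow through the rollout
    opt.step()
```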

Sensing for manipulation

Humans feel how hard to grip a glass without thinking about it. Robots currently lack this tactile feedback, which limits their ability to handle fragile, deformable, or irregular objects. We develop five sensing modalities in-house (piezoresistive, capacitive, magnetic, infrared, and laser interferometry) and integrate them into custom fingertips and grippers. Our Halberd ecosystem enables plug-and-play integration of tactile sensors on Robotiq grippers: no external power or data cables, Arduino-compatible, and fully open-source.
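
From the host's point of view, a plug-and-play tactile sensor ultimately presents itself as a simple data stream. The snippet below sketches how such a stream could be read from an Arduino-compatible board over USB serial; the port name and the comma-separated frame format are assumptions for illustration, not the actual Halberd protocol.

```python
import numpy as np
import serial  # pyserial

# Hypothetical wire format: each line is one comma-separated frame of taxel
# readings, e.g. "512,498,731,...". Port name and frame layout are assumptions.
PORT = "/dev/ttyACM0"
ROWS, COLS = 4, 4

with serial.Serial(PORT, baudrate=115200, timeout=1.0) as link:
    while True:
        line = link.readline().decode("ascii", errors="ignore").strip()
        values = line.split(",")
        if len(values) != ROWS * COLS:
            continue  # skip partial or corrupted frames
        frame = np.array(values, dtype=float).reshape(ROWS, COLS)
        contact = frame.mean()  # crude aggregate signal, e.g. for grip-force control
        print(f"mean taxel reading: {contact:.1f}")
```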

Our current ambition is the next generation of multimodal fingertips that embed as many sensor modalities as possible into the smallest possible form factor, with low-latency and high-bandwidth communication. We also develop open-source sensing tools, including a magnetic tactile sensor with automatic calibration and a smart textile with a piezoresistive sensor grid that provides a reward signal for reinforcement learning. Our sensing hardware designs, along with the airo-mono software stack that underpins all our manipulation research, are freely available on our open-source page.
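
To give an idea of how a piezoresistive textile can provide a reward signal, the sketch below turns a single pressure-grid frame into a scalar that favours large, even contact patches. This is an illustrative stand-in, not the reward function used in our published work.

```python
import numpy as np

def fold_reward(pressure_grid, contact_threshold=0.05):
    """Illustrative scalar reward from a smart-textile pressure grid.

    pressure_grid: (H, W) normalised taxel readings in [0, 1].
    Rewards large, even contact patches, which loosely correlates with a flat,
    well-folded cloth lying on the textile. Not the published reward.
    """
    contact = pressure_grid > contact_threshold
    if not contact.any():
        return 0.0
    coverage = contact.mean()                          # fraction of taxels touched
    uniformity = 1.0 - pressure_grid[contact].std()    # penalise wrinkled, uneven contact
    return float(coverage * np.clip(uniformity, 0.0, 1.0))
```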

What our robots can do

We validate our methods on real-world tasks that stress-test perception, manipulation, and generalisation.

Cloth manipulation. We won the international Cloth Manipulation Competition at IROS 2022 and ICRA 2023, and organised the ICRA 2024 edition in Yokohama with 11 participating teams. Our system combines infrared tactile edge-tracing (UnfoldIR), a deep learning keypoint detector trained entirely on synthetic data, and optimised dual-arm circular arc trajectories. It folds unseen garments in under two minutes.
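
The geometric core of the folding motion is simple: mirror a detected corner keypoint across the fold line to obtain the place point, then sweep the gripper along a circular arc between grasp and place. The sketch below shows that geometry in isolation; it omits the keypoint detector, dual-arm coordination, and grasp planning of the full system, and the function names are illustrative.

```python
import numpy as np

def mirror_across_fold_line(point, line_point, line_dir):
    """Reflect a 2D cloth keypoint across the fold line (top-down view)."""
    d = np.asarray(line_dir, dtype=float)
    d /= np.linalg.norm(d)
    v = np.asarray(point, dtype=float) - np.asarray(line_point, dtype=float)
    return np.asarray(line_point, dtype=float) + 2.0 * np.dot(v, d) * d - v

def arc_trajectory(grasp_xy, place_xy, table_height, n_waypoints=25):
    """Semicircular arc in the vertical plane from grasp point to place point."""
    grasp = np.array([*grasp_xy, table_height])
    place = np.array([*place_xy, table_height])
    radius = np.linalg.norm(place - grasp) / 2.0
    assert radius > 1e-6, "grasp and place points coincide"
    centre = (grasp + place) / 2.0
    along = (place - grasp) / (2.0 * radius)            # unit vector grasp -> place
    angles = np.linspace(np.pi, 0.0, n_waypoints)       # sweep up and over the cloth
    return centre + radius * (np.outer(np.cos(angles), along)
                              + np.outer(np.sin(angles), [0.0, 0.0, 1.0]))
```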

Robot butler. Our robot butler Mobi autonomously fetches bottles from a fridge, detects glasses and estimates their volume, pours drinks to the correct level, and discards empty bottles. Trained on over 200 glasses in conditions ranging from bright sunlight to candlelight, Mobi currently achieves a failure rate of only 0.7% on previously unseen glassware. The system was featured on VRT NWS.
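
For intuition on pouring to the correct level, the sketch below estimates the missing liquid volume from three semantic keypoints (rim, base, liquid surface) under a cylindrical-glass approximation. The approximation and parameter names are illustrative and deliberately simpler than the actual perception pipeline.

```python
import numpy as np

def pour_volume_ml(rim_xyz, base_xyz, liquid_xyz, inner_diameter_m, target_fill=0.8):
    """Estimate how much to pour (in millilitres) from three glass keypoints.

    The keypoints are 3D points (e.g. obtained from detections); the
    cylindrical-glass assumption and names are illustrative only.
    """
    height = np.linalg.norm(np.asarray(rim_xyz) - np.asarray(base_xyz))
    liquid_height = np.linalg.norm(np.asarray(liquid_xyz) - np.asarray(base_xyz))
    cross_section = np.pi * (inner_diameter_m / 2.0) ** 2   # m^2
    missing_height = max(target_fill * height - liquid_height, 0.0)
    return cross_section * missing_height * 1e6             # m^3 -> ml

# Example: a 12 cm tall glass, 6 cm diameter, currently filled to 3 cm.
print(pour_volume_ml([0, 0, 0.12], [0, 0, 0.0], [0, 0, 0.03], 0.06))
```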

Lab assistant. In collaboration with biotech partners, we deploy collaborative robots for sample handling in wet lab environments: centrifuge loading, pipetting, and transfers of living, non-uniform, fragile biological samples. This is a hard stress test for physical AI, where every error risks contamination or sample destruction.

The code, datasets, and hardware designs behind these demos are publicly available. Our open-source page provides access to our manipulation software stack, cloth folding datasets, tactile sensor designs, and more.

A realistic perspective

The humanoid robot that does all your household chores is a moonshot, not a near-term product. Humanoid form factors are expensive, mechanically complex, and largely redundant for most tasks. The robotics field lacks the volume of manipulation data that language models enjoy for text, and physical errors have physical consequences: a misgripped glass shatters, a dropped sample is lost. What is realistic today: task-specific collaborative robots in workspaces, handling well-defined object classes. Over the next decade, we work towards broader generalisation, building the perception, manipulation, and learning capabilities for more tasks, more objects, and more environments, one step at a time. We also explained this perspective in layman’s terms in De Morgen.

Active researchers

Related publications

SPILL: size, pose, and internal liquid level estimation of transparent glassware for robotic bartending

Louis Adriaens, Thomas Lips, Mathieu De Coster, Andreas Verleysen, Francis wyffels
In IEEE Robotics and Automation Letters, 2025
Abstract
Robotic perception of transparent objects presents unique challenges due to their refractive properties, lack of texture, and limitations of conventional RGB-D sensors in capturing reliable depth information. These challenges significantly hinder robotic manipulation capabilities in real-world settings such as household assistance, hospitality, and healthcare. To address these issues, we propose SPILL: A lightweight perception pipeline for Size, Pose, and Internal Liquid Level estimation of unknown transparent glassware using a single view. SPILL combines object detection with semantic keypoint detection, and operates without requiring object-specific 3D models or depth completion. We demonstrate its effectiveness in autonomous robotic pouring tasks. Additionally, to enhance the robustness and generalization of keypoint detection to diverse real-world scenarios, we introduce Glasses-in-the-Wild, a new dataset that captures a wide variety of glass types in realistic environments. Evaluated on a robot manipulator, SPILL achieves a 93.6% success rate across 500 autonomous pours with 20 unseen glasses in three diverse real-world scenes. We further demonstrate robustness through multiple live public events in real-world, human-centered environments. In one recorded session, the robot autonomously served 62 drinks with a 98.3% success rate. These results demonstrate that task-relevant keypoint detection enables scalable, real-world transparent object interaction, paving the way for practical applications in service and assistive robotics - without spilling a drop.

Learning self-supervised task progression metrics: a case of cloth folding

Andreas Verleysen, Matthijs Biondina, Francis wyffels
In Applied Intelligence, 2023
Abstract
An important challenge for smart manufacturing systems is finding relevant metrics that capture task quality and progression for process monitoring to ensure process reliability and safety. Data-driven process metrics construct features and labels from abundant raw process data, which incurs costs and inaccuracies due to the labelling process. In this work, we circumvent expensive process data labelling by distilling the task intent from video demonstrations. We present a method to express the task intent in the form of a scalar value by aligning a self-supervised learned embedding to a small set of high-quality task demonstrations. We evaluate our method on the challenging case of monitoring the progress of people folding clothing. We demonstrate that our approach effectively learns to represent task progression without manually labelling sub-steps or progress in the videos. Using case-based experiments, we find that our method learns task-relevant features and useful invariances, making it robust to noise, distractors and variations in the task and shirts. The experimental results show that the proposed method can monitor processes in domains where state representation is inherently challenging.

Learning keypoints from synthetic data for robotic cloth folding

Thomas Lips, Victor-Louis De Gusseme, Francis wyffels
In ICRA 2022 Workshop on Representing and Manipulating Deformable Objects
Abstract
Robotic cloth manipulation is challenging due to its deformability, which makes determining its full state infeasible. However, for cloth folding, it suffices to know the position of a few semantic keypoints. Convolutional neural networks (CNN) can be used to detect these keypoints, but require large amounts of annotated data, which is expensive to collect. To overcome this, we propose to learn these keypoint detectors purely from synthetic data, enabling low-cost data collection. In this paper, we procedurally generate images of towels and use them to train a CNN. We evaluate the performance of this detector for folding towels on a unimanual robot setup and find that the grasp and fold success rates are 77% and 53%, respectively. We conclude that learning keypoint detectors from synthetic data for cloth folding and related tasks is a promising research direction, discuss some failures and relate them to future work. A video of the system, as well as the codebase, more details on the CNN architecture and the training setup can be found at https://github.com/tlpss/workshop-icra-2022-cloth-keypoints.git.

Related open-source
