A significant bottleneck in humanoid policy learning is the acquisition of large-scale, diverse datasets, as collecting reliable real-world data remains both difficult and cost-prohibitive. To address this limitation, we introduce HumanoidExo, a novel wearable system that transfers human motion into whole-body humanoid robot data. HumanoidExo offers a high-efficiency solution that minimizes the embodiment gap between the human demonstrator and the robot, thereby tackling the scarcity of whole-body humanoid data. By enabling the collection of larger and more diverse datasets, our approach significantly enhances the performance of humanoid robots in dynamic, real-world scenarios. We evaluated our method on three challenging real-world tasks: tabletop manipulation, manipulation integrated with stand-squat motions, and whole-body manipulation. Our results empirically demonstrate that HumanoidExo data is a crucial complement to real-robot data: it enables the humanoid policy to generalize to novel environments, learn complex whole-body control from only five real-robot demonstrations, and even acquire new skills (e.g., walking) solely from HumanoidExo data.
We integrated a Livox Mid-360 LiDAR to acquire the exoskeleton's motion odometry. For visual perception, we added two wrist cameras that capture additional operational viewpoints and enrich environmental observation. These cameras are installed on the Dexmo force-feedback gloves and mounted at the same angles as the robot's wrist cameras.
HumanoidExo is specifically designed to capture all seven joint angles of the human arm. The rotational axes of its exoskeleton arm are precisely aligned with the corresponding human joint axes, making the exoskeleton kinematically isomorphic to the human arm. Because the HumanoidExo system adopts joint-space control with angle remapping, we designed the exoskeleton's key parameters to match the arm length of the Unitree G1 robot.
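Because of this isomorphic design, the remapping from exoskeleton encoders to robot joint targets can be reduced to a simple per-joint transform. Below is a minimal sketch under assumed conventions (per-joint sign flips, zero-pose offsets, and placeholder G1 joint limits); all constants and names are hypothetical, not the system's actual calibration.

```python
import numpy as np

# Hypothetical per-joint calibration for one 7-DoF arm. Because the
# exoskeleton is isomorphic to the human arm and sized to the G1, the
# remapping reduces to an affine per-joint transform plus limit clamping.
# All constants below are placeholders, not the system's real calibration.
EXO_OFFSET = np.zeros(7)        # encoder readings at the zero pose [rad]
EXO_SIGN = np.ones(7)           # per-joint axis direction flips (+1 / -1)
G1_LOWER = -np.pi * np.ones(7)  # placeholder G1 arm joint limits [rad]
G1_UPPER = np.pi * np.ones(7)

def remap_exo_to_g1(exo_angles: np.ndarray) -> np.ndarray:
    """Map raw exoskeleton joint angles to G1 arm joint-position targets."""
    q = EXO_SIGN * (exo_angles - EXO_OFFSET)
    return np.clip(q, G1_LOWER, G1_UPPER)
```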
Our approach, HumanoidExo-VLA (HE-VLA for short), consists of two key components: a pre-trained Vision-Language-Action (VLA) model that learns foundational whole-body motion control, and a reinforcement-learning-based controller that ensures robust whole-body balance. To tackle the challenge of learning complex humanoid manipulation skills, we leverage DexVLA, a pre-trained vision-language-action model, for the tasks described in our experiments.
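To make the interface concrete, here is a hedged sketch of how a pre-trained VLA such as DexVLA might be queried at inference time. The observation keys, the `predict` method, and the prompt string are illustrative assumptions, not DexVLA's actual API.

```python
import numpy as np

def query_vla(policy, head_image, wrist_images, proprio, instruction):
    """Query a pre-trained VLA for a chunk of whole-body actions.

    `policy.predict` and the observation keys are hypothetical; they only
    illustrate the inputs and outputs described in the text.
    """
    obs = {
        "images": [head_image, *wrist_images],  # head camera + two wrist cameras
        "state": np.asarray(proprio),           # proprioceptive state
        "prompt": instruction,                  # e.g. "place the toy in the tray"
    }
    return np.asarray(policy.predict(obs))      # shape: (horizon, action_dim)
```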
Relying solely on imitation learning to directly output joint positions for whole-body control introduces significant stability risks: minor deviations from the learned trajectories can cause falls, risking catastrophic damage to the robot and its environment. To overcome this limitation, we leverage reinforcement learning to train a robust whole-body loco-manipulation controller, which maintains dynamic balance while tracking commands for base velocity, yaw rate, and target torso height.
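The controller's command interface is thus just these three quantities. The sketch below shows one way such commands might be packaged and fed to the RL policy; the field and function names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical command interface matching the three tracked quantities.
@dataclass
class LocoManipCommand:
    base_velocity: float  # forward base speed [m/s]
    yaw_rate: float       # turning rate [rad/s]
    torso_height: float   # target torso height [m]

def rl_policy_input(proprio: np.ndarray, cmd: LocoManipCommand) -> np.ndarray:
    """Concatenate proprioception with the command the RL controller tracks.

    The RL policy maps this input to leg joint actions that keep the robot
    balanced while the upper body executes the VLA's actions.
    """
    return np.concatenate(
        [proprio, [cmd.base_velocity, cmd.yaw_rate, cmd.torso_height]]
    )
```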
This is a tabletop manipulation task: the robot must pick up a toy, whose position is randomized on its left or right side, and place it into a tray at the center. In the following videos, the HE-VLA model was trained with 5 teleoperated demonstrations and 195 HumanoidExo demonstrations.
This experiment builds upon the previous tabletop manipulation setup. We collected 195 new HumanoidExo demonstrations of a compound task: walking to the table, stopping, and then executing the 'Place Toy' action. The policy was then trained on a mixed dataset containing these 195 new HumanoidExo demonstrations and the same 5 teleoperated demonstrations from the previous experiment. Crucially, the five teleoperated demonstrations cover only the stationary manipulation portion of the task and include no walking, so any walking ability exhibited by the final policy must be learned exclusively from the HumanoidExo data. A sketch of this data mix follows.
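Here is a minimal sketch of the co-training data mix just described: a handful of teleoperated episodes pooled with a much larger HumanoidExo set. The loader and directory layout are hypothetical; only the 5/195 split comes from the text.

```python
import random
from pathlib import Path

def load_episodes(root: str):
    """Placeholder loader: one entry per episode directory (layout assumed)."""
    return sorted(Path(root).glob("episode_*"))

# 5 teleoperated episodes (stationary 'Place Toy' only) mixed with
# 195 HumanoidExo episodes (walk to table, stop, place toy).
teleop_episodes = load_episodes("data/teleop_place_toy")
exo_episodes = load_episodes("data/exo_walk_and_place")

def sample_batch(batch_size=32):
    """Sample uniformly from the pooled dataset for co-training."""
    pool = teleop_episodes + exo_episodes
    return random.sample(pool, min(batch_size, len(pool)))
```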
We chose the 'Place Laundry' task, in which the robot must squat, grasp clothes from a basket, and place them into a washing machine on its right. The robot repeats this process until the basket is empty, then stands up to signal task completion. The task presents several challenges. The clothes are deformable objects that are difficult for dexterous hands to grasp and to place entirely inside the machine, which requires the model to exhibit recovery behaviors. Furthermore, the model must rely on robust visual observations while maintaining whole-body balance to avoid falling during upper-body task execution. Following the methodology of the previous section, in the following videos the HE-VLA model was trained with 5 teleoperated demonstrations and 195 HumanoidExo demonstrations.
Beyond the manipulation tasks, we also explored the role of HumanoidExo in learning robot navigation skills. We selected an outdoor environment for data collection and defined the following task: the robot must walk along a blue track and stop once it exits the track. All training data was collected exclusively with HumanoidExo; no teleoperated data was used. In our experiment, a policy trained entirely on HumanoidExo data successfully achieved simple autonomous navigation.
In this work, we addressed the critical data bottleneck that hinders the development of capable, general-purpose humanoid robots. While existing methods such as simulation, human videos, and direct teleoperation have advanced the field, they suffer from significant limitations in scalability, cost, and embodiment mismatch. We introduced HumanoidExo, a lightweight, wearable exoskeleton system designed to provide a practical and effective solution for scalable, whole-body data collection. Our experiments confirm that this approach is highly effective. We have shown that data from HumanoidExo enables policies to generalize to new environments, achieve remarkable data efficiency by learning complex skills from as few as five real-robot demonstrations, and even acquire entirely new skills, such as walking, without any robot-collected demonstrations of those skills. These results validate our system as a powerful paradigm for generating large-scale, high-quality humanoid datasets.
@article{zhong2025humanoidexo,
  title={HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton},
  author={Zhong, Rui and Sun, Yizhe and Wen, Junjie and Li, Jinming and Cheng, Chuang and Dai, Wei and Zeng, Zhiwen and Lu, Huimin and Zhu, Yichen and Xu, Yi},
  journal={arXiv preprint arXiv:2510.03022},
  year={2025}
}