Embodied AI Reading Notes (@EmbodiedAIRead)

2025-10-29 | ❤️ 139 | 🔁 12


Ctrl-World: A Controllable Generative World Model for Robot Manipulation

Project: https://ctrl-world.github.io/ Paper: https://arxiv.org/abs/2510.10125 Code: https://github.com/Robert-gyj/Ctrl-World

This work introduces a controllable multi-view world model for robot manipulation. It enables consistent, policy-in-the-loop, long-horizon interactions (20 seconds) within the model's imagination, which can be used to evaluate and improve the instruction following of modern generalist robot policies.

  • The Ctrl-World world model is initialized from a pretrained video diffusion backbone with spatial-temporal transformers, and adds three key adaptations to become a policy-compatible interactive simulator: (1) joint multi-view prediction, including wrist cameras; (2) a pose-conditioned memory retrieval mechanism; and (3) frame-level action conditioning.

  • Policy Evaluation use case: the authors show that imagination-based evaluation with Ctrl-World faithfully reflects policies' real-world instruction-following ability, and that the model can sustain coherent rollouts for over 20 seconds in novel scenes beyond its DROID training dataset.

  • Policy Improvement use case: the authors demonstrate that by collecting successful synthetic rollouts for novel instructions inside the world model, they can perform supervised fine-tuning to improve policy performance.
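
To make the policy-in-the-loop idea concrete, here is a minimal toy sketch of the loop the bullets describe: a policy proposes one action per predicted frame, the world model predicts the next multi-view frame, past frames are retrieved by pose similarity, and successful imagined rollouts are kept as fine-tuning data. This is not the authors' code. Every name here (`Frame`, `ToyWorldModel`, `toy_policy`, `retrieve_memory`, the camera names, the step size and tolerance) is a hypothetical stand-in, and the "world model" simply integrates the pose instead of running video diffusion.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    pose: tuple   # end-effector position (x, y, z); assumed representation
    views: dict   # camera name -> image placeholder (hypothetical names)


def retrieve_memory(history, query_pose, k=2):
    """Return the k past frames whose poses are closest to query_pose."""
    return sorted(history, key=lambda f: math.dist(f.pose, query_pose))[:k]


class ToyWorldModel:
    """Stand-in for a learned video model: it only integrates the pose."""

    def step(self, history, action):
        last = history[-1]
        next_pose = tuple(p + a for p, a in zip(last.pose, action))
        # A real model would condition generation on these retrieved frames.
        memory = retrieve_memory(history, next_pose)
        views = {name: f"{name}@{len(history)}" for name in last.views}
        return Frame(next_pose, views), memory


def toy_policy(frame, goal, max_step=0.05):
    """Hypothetical policy: take a bounded step toward the goal pose."""
    return tuple(max(-max_step, min(max_step, g - p))
                 for p, g in zip(frame.pose, goal))


def imagined_rollout(model, policy, start, goal, steps=40, tol=0.01):
    """Run the policy inside the model; return (trajectory, success flag)."""
    history = [start]
    for _ in range(steps):
        action = policy(history[-1], goal)      # one action per frame
        frame, _ = model.step(history, action)  # predict next multi-view frame
        history.append(frame)
        if math.dist(frame.pose, goal) < tol:
            return history, True
    return history, False


start = Frame((0.0, 0.0, 0.0), {"wrist": None, "left": None, "right": None})
goals = [(0.2, 0.1, 0.0), (5.0, 5.0, 5.0)]  # second goal is out of reach
results = [imagined_rollout(ToyWorldModel(), toy_policy, start, g) for g in goals]

# Evaluation: success rate measured entirely in imagination.
success_rate = sum(ok for _, ok in results) / len(results)
# Improvement: keep only successful rollouts as supervised fine-tuning data.
sft_data = [traj for traj, ok in results if ok]
print(success_rate, len(sft_data))
```

In the real system the success check would come from inspecting the generated video rather than from a distance threshold, and the retained trajectories would be used to fine-tune the policy.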


Tags

3D GenAI Robotics