Yunzhu Li (@YunzhuLiYZ)
2025-06-20 | ❤️ 84 | 🔁 15
We’ve been exploring 3D world models with the goal of finding the right recipe that is both: (1) structured—for sample efficiency and generalization (my personal emphasis), and (2) scalable—as we increase real-world data collection.
With Particle-Grid Neural Dynamics (accepted to RSS2025), we’re getting closer.
In this work, we build a purely neural 3D digital twin—capturing appearance, geometry, and dynamics—directly from real-world interactions. Inspired by the Material Point Method (MPM), we use a hybrid particle + grid representation and outperform both: (1) learning-based (GNN), and (2) physics-based (MPM) approaches.
It handles a wide range of deformables—stuffed animals, ropes, cloth, and more.
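To make the hybrid particle + grid idea concrete, here is a minimal sketch of one MPM-style transfer-and-update step. Everything here is an illustrative assumption (tensor shapes, nearest-cell splatting, the grid resolution, and the stand-in `grid_update` network), not the paper's actual architecture:

```python
# Minimal sketch of a hybrid particle-grid update step, loosely in the spirit of
# Particle-Grid Neural Dynamics. Shapes, grid resolution, and the `grid_update`
# network are hypothetical stand-ins, not the authors' implementation.
import torch

def particles_to_grid(pos, feat, grid_res=32):
    """Scatter per-particle features onto a dense 3D grid (nearest-cell splat)."""
    # pos: (N, 3) positions in [0, 1), feat: (N, C) particle features
    idx = (pos.clamp(0, 1 - 1e-6) * grid_res).long()            # (N, 3) cell indices
    flat = idx[:, 0] * grid_res**2 + idx[:, 1] * grid_res + idx[:, 2]
    grid = torch.zeros(grid_res**3, feat.shape[1])
    count = torch.zeros(grid_res**3, 1)
    grid.index_add_(0, flat, feat)                               # accumulate features per cell
    count.index_add_(0, flat, torch.ones(len(pos), 1))
    return grid / count.clamp(min=1), flat                       # average, keep indices for gather

def grid_to_particles(grid, flat):
    """Gather updated grid quantities back to particles (nearest-cell lookup)."""
    return grid[flat]

# Hypothetical learned dynamics operating on grid features (stand-in for the neural model).
grid_update = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)

def step(pos, feat, dt=0.02):
    grid, flat = particles_to_grid(pos, feat)    # particle -> grid transfer
    delta = grid_update(grid)                    # per-cell velocity-like update, (G^3, 3)
    vel = grid_to_particles(delta, flat)         # grid -> particle transfer, (N, 3)
    return pos + dt * vel                        # advect particles
```

The point of the sketch is only the data flow: particles carry state, the grid provides a regular structure for the learned update, and the result is transferred back to particles, mirroring the particle/grid split in MPM.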
Next step? Scaling to large-scale action-conditioned robot interaction data.
Why build 3D world models? Because what we need to train large-scale imitation learning policies (VLAs) is: 👉 action-conditioned robot interaction data. And what do 3D world models also need? 👉 the exact same thing.
It would be a waste if all that data were used only for policy learning, when it’s also rich with dynamics knowledge. Building these world models enables powerful evaluation and data generation engines—key to scaling policy learning and VLA models.
Check out @kaiwynd’s thread for more details—and don’t forget to play with the interactive demo on @huggingface!
🔗 Related
See similar notes in domain-robotics, domain-simulation
Quoted tweet
Kaifeng Zhang (@kaiwynd)
Can we learn a 3D world model that predicts object dynamics directly from videos?
Introducing Particle-Grid Neural Dynamics: a learning-based simulator for deformable objects that trains from real-world videos.
Website: https://t.co/MpDZLHOQb3
ArXiv: https://t.co/7ydVHq4T3I
Code: https://t.co/j4VAL0ZmHX
Demo: https://t.co/26c5qD0Umh
To appear at RSS2025
🎬 Video