Chelsea Finn (@chelseabfinn)
2026-01-24 | ❤️ 506 | 🔁 48 | 💬 8
Video models serve as a good pretrained backbone for robot policies.
Paper: https://arxiv.org/abs/2601.16163 Code: https://github.com/nvlabs/cosmos-policy
Quoted tweet
Moo Jin Kim (@moo_jin_kim)
We release Cosmos Policy: a state-of-the-art robot policy built on a video diffusion model backbone.
- policy + world model + value function, all in 1 model
- no architectural changes to the base video model
- SOTA in LIBERO (98.5%), RoboCasa (67.1%), & ALOHA tasks (93.6%)
🧵👇 https://t.co/cz9L3ziJ6x