Ilir Aliu (@IlirAliu_)
2026-01-30 | ❤️ 212 | 🔁 30
First fully open Action Reasoning Model (ARM); can "think" in 3D & turn your instructions into real-world actions:
[🔖 Bookmark for later]
A model that reasons in space, time, and motion.
It breaks down your command into three steps:
- Grounds the scene with depth-aware perception tokens
- Plans the motion through visual reasoning traces
- Executes low-level commands for real hardware
Think of it as chain-of-thought for physical action.
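To make the three-stage idea concrete, here's a minimal toy sketch in Python. Every name in it (PerceptionToken, ground, plan, act, the nearest-patch planner) is hypothetical scaffolding for illustration, not MolmoAct's actual API; the real interfaces live in the public code linked below.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PerceptionToken:
    """Depth-aware token grounding one image patch in 3D space."""
    patch_id: int
    depth_m: float  # estimated metric depth for this patch

@dataclass
class Waypoint:
    """One step of the visual reasoning trace, in image coordinates."""
    u: int
    v: int

def ground(depth_map: List[List[float]]) -> List[PerceptionToken]:
    """Stage 1: ground the scene. Here, one toy token per depth-map row."""
    return [
        PerceptionToken(patch_id=i, depth_m=sum(row) / len(row))
        for i, row in enumerate(depth_map)
    ]

def plan(tokens: List[PerceptionToken], instruction: str) -> List[Waypoint]:
    """Stage 2: plan a visual reasoning trace. This toy planner just heads
    toward the nearest patch, standing in for the model's learned 2D path."""
    target = min(tokens, key=lambda t: t.depth_m)
    return [Waypoint(u=10 * step, v=target.patch_id) for step in range(5)]

def act(trace: List[Waypoint]) -> List[str]:
    """Stage 3: decode the trace into low-level commands for real hardware."""
    return [f"MOVE_TO u={w.u} v={w.v}" for w in trace]

if __name__ == "__main__":
    toy_depth = [[1.2, 1.1], [0.4, 0.5], [2.0, 2.1]]  # fake 3x2 depth map
    trace = plan(ground(toy_depth), "Pick up the trash")
    for command in act(trace):
        print(command)
```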
Give it an instruction like "Pick up the trash" and MolmoAct will:
- Understand the environment through depth perception
- Visually plan the sequence of moves
- Carry them out… while letting you see the plan overlaid on camera frames before anything moves
It's steerable in real time: draw a path, change the prompt, and the trajectory updates instantly.
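A rough sketch of what such a steering loop could look like, again in toy Python: control_loop, plan, and execute are made-up names for illustration, not MolmoAct's actual steering interface. The idea is that between control ticks the operator can swap the prompt or draw a path, and the trajectory is regenerated before anything executes.

```python
from typing import List, Tuple, Union

Waypoint = Tuple[int, int]  # (u, v) image coordinates

def plan(instruction: str) -> List[Waypoint]:
    """Stand-in planner: waypoint spacing varies with the instruction,
    purely so re-planning is visible in the output."""
    step = max(1, len(instruction) % 5)
    return [(step * i, step * i) for i in range(4)]

def execute(trace: List[Waypoint]) -> None:
    for u, v in trace:
        print(f"MOVE_TO u={u} v={v}")

def control_loop(instruction: str,
                 edits: List[Union[str, List[Waypoint], None]]) -> None:
    """Re-plan whenever the operator intervenes between ticks."""
    trace = plan(instruction)
    for tick, edit in enumerate(edits):
        if isinstance(edit, str):      # prompt edited mid-run
            instruction, trace = edit, plan(edit)
        elif isinstance(edit, list):   # path drawn on the camera frame
            trace = edit               # follow the sketch directly
        print(f"-- tick {tick} ({instruction!r}) --")
        execute(trace)

if __name__ == "__main__":
    control_loop("Pick up the trash",
                 [None, "Put it in the bin", [(5, 5), (9, 2)]])
```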
AAAANNNDDD: It's completely open: checkpoints, code, and evaluation scripts are ALL PUBLIC!
Resources:
- Models: https://t.co/qfG1i8KqhN
- Data: https://t.co/Equ1G23oZB
- Blog: https://t.co/3r17WRthxI
MolmoAct runs across different robot types (from gripper arms to humanoids) and adapts quickly to new tasks.
It outperforms models from major labs like NVIDIA, Google, and Microsoft on generalization benchmarks and in real-world success rates.
For anyone building robotics systems or studying AI-driven action models, this is worth exploring… and worth sharing! ♻️