naveen manwani (@NaveenManwani17)
2025-04-15 | ❤️ 20 | 🔁 2
🚨CVPR 2025 Paper Alert 🚨
➡️Paper Title: ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
🌟Few pointers from the paper
🎯Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to embodied decision-making in open-world environments presents challenges.
🎯One critical issue is bridging the gap between discrete entities in low-level observations and the abstract concepts required for effective planning.
🎯A common solution is building hierarchical agents, where VLMs serve as high-level reasoners that break down tasks into executable sub-tasks, typically specified using language.
🎯However, language cannot easily convey detailed spatial information. The authors propose visual-temporal context prompting, a novel communication protocol between VLMs and policy models.
🎯This protocol leverages object segmentation from past observations to guide policy-environment interactions.
🎯Using this approach, they trained “ROCKET-1”, a low-level policy that predicts actions based on concatenated visual observations and segmentation masks, supported by real-time object tracking from SAM-2.
🎯Their method unlocks the potential of VLMs, enabling them to tackle complex tasks that demand spatial reasoning.
🎯Experiments in Minecraft showed that their approach enables agents to complete previously unattainable tasks, with a 76% absolute improvement in open-world interaction performance.
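
💡To make the "concatenated visual observations and segmentation masks" idea concrete, here is a minimal sketch of how such a policy input could be assembled. It is not taken from the ROCKET-1 code: the channel layout (RGB plus one binary mask channel), the function names `concat_observation_and_mask` and `build_policy_input`, and the toy 2x2 frames are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[List[float]]]  # H x W x 3 RGB values (assumed layout)
Mask = List[List[int]]           # H x W binary segmentation mask (assumed layout)


def concat_observation_and_mask(frame: Frame, mask: Mask) -> List[List[List[float]]]:
    """Append the mask value as a fourth channel to every pixel of the frame."""
    same_height = len(frame) == len(mask)
    same_width = all(len(fr) == len(mr) for fr, mr in zip(frame, mask))
    if not (same_height and same_width):
        raise ValueError("frame and mask must share the same height and width")
    return [
        [list(pixel) + [float(m)] for pixel, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


def build_policy_input(frames: List[Frame], masks: List[Mask]) -> List[List[List[List[float]]]]:
    """Build a T x H x W x 4 sequence from per-timestep frames and masks."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    return [concat_observation_and_mask(f, m) for f, m in zip(frames, masks)]


# Toy example: two timesteps of a 2x2 image. In a real pipeline the masks
# would come from an object tracker rather than being written by hand.
frame = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]],
]
mask = [
    [0, 1],
    [1, 0],
]
policy_input = build_policy_input([frame, frame], [mask, mask])

# Shape of the result as (T, H, W, C).
print(len(policy_input), len(policy_input[0]),
      len(policy_input[0][0]), len(policy_input[0][0][0]))  # prints "2 2 2 4"
```

The point of the sketch is only that the mask travels alongside the pixels at every timestep, so the policy sees where the target object is instead of being told in words.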
🏢Organization: @PKU1898 , BIGAI, @UCLA
🧙Paper Authors: Shaofei Cai, @RealZihaoWang , Kewei Lian, Zhancun Mu, @jeasinema , @liu_anji , Yitao Liang
📝 Read the Full Paper here: https://arxiv.org/abs/2410.17856
🗂️ Project Page: https://craftjarvis.github.io/ROCKET-1/
🧑💻 Code: https://github.com/CraftJarvis/ROCKET-1
🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊
🔗 Original links
- https://arxiv.org/abs/2410.17856
- https://craftjarvis.github.io/ROCKET-1/
- https://github.com/CraftJarvis/ROCKET-1
Tags
domain-robotics domain-ai-ml domain-vlm domain-dev-tools domain-visionos