naveen manwani (@NaveenManwani17)

2025-04-15 | ❤️ 20 | 🔁 2


🚨CVPR 2025 Paper Alert 🚨

➡️Paper Title: ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

🌟Few pointers from the paper

🎯Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to embodied decision-making in open-world environments presents challenges.

🎯One critical issue is bridging the gap between discrete entities in low-level observations and the abstract concepts required for effective planning.

🎯A common solution is building hierarchical agents, where VLMs serve as high-level reasoners that break down tasks into executable sub-tasks, typically specified using language.

🎯However, language often fails to convey detailed spatial information. To address this, the authors proposed visual-temporal context prompting, a novel communication protocol between VLMs and policy models.

🎯This protocol leverages object segmentation from past observations to guide policy-environment interactions.

🎯Using this approach, they trained “ROCKET-1”, a low-level policy that predicts actions based on concatenated visual observations and segmentation masks, supported by real-time object tracking from SAM-2.
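
🧪A minimal sketch of what "concatenated visual observations and segmentation masks" could look like as a policy input. This is an illustration only, not the authors' code: the function name `build_policy_input`, the array shapes, the channel-wise stacking, and the [0, 1] normalization are all assumptions, and the real masks would come from a tracker such as SAM-2 rather than being built by hand.

```python
import numpy as np


def build_policy_input(frames, masks):
    """Stack RGB frames with per-frame object masks along the channel axis.

    Hypothetical helper for illustration (not from the ROCKET-1 codebase).
    frames: array of shape (T, H, W, 3), uint8 RGB observations.
    masks:  array of shape (T, H, W), boolean or 0/1 object masks.
    Returns a float32 array of shape (T, H, W, 4) with values in [0, 1].
    """
    frames = np.asarray(frames, dtype=np.float32) / 255.0
    masks = np.asarray(masks, dtype=np.float32)

    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("frames must have shape (T, H, W, 3)")
    if masks.shape != frames.shape[:3]:
        raise ValueError("masks must have shape (T, H, W) matching frames")

    # Append the mask as a fourth channel so each pixel carries
    # both its color and whether it belongs to the highlighted object.
    return np.concatenate([frames, masks[..., None]], axis=-1)


if __name__ == "__main__":
    # Two dummy 4x4 frames; the object is marked in the first frame only.
    frames = np.full((2, 4, 4, 3), 255, dtype=np.uint8)
    masks = np.zeros((2, 4, 4), dtype=bool)
    masks[0, 1, 1] = True
    policy_input = build_policy_input(frames, masks)
    print(policy_input.shape)
```

Stacking the mask as an extra channel is one simple way to hand spatial hints to a policy network; the paper may combine the two inputs differently.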

🎯Their method unlocks the potential of VLMs, enabling them to tackle complex tasks that demand spatial reasoning.

🎯Experiments in Minecraft showed that their approach enables agents to accomplish previously unattainable tasks, with a 76% absolute improvement in open-world interaction performance.

🏢Organization: @PKU1898 , BIGAI, @UCLA

🧙Paper Authors: Shaofei Cai, @RealZihaoWang , Kewei Lian, Zhancun Mu, @jeasinema , @liu_anji , Yitao Liang

📝 Read the Full Paper here: https://arxiv.org/abs/2410.17856

🗂️ Project Page: https://craftjarvis.github.io/ROCKET-1/

🧑‍💻 Code: https://github.com/CraftJarvis/ROCKET-1

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

CVPR2025

🔗 Original link

Media

video



Tags

domain-robotics domain-ai-ml domain-vlm domain-dev-tools domain-visionos