naveen manwani (@NaveenManwani17)

2024-08-18 | ❤️ 50 | 🔁 11

🚨ECCV 2024 Paper Alert 🚨

➡️Paper Title: Unifying 3D Vision-Language Understanding via Promptable Queries

🌟Few pointers from the paper

🎯A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene. However, a considerable gap exists between existing methods and such a unified model, due to the independent application of representation and insufficient exploration of 3D multi-task training.

🎯In this paper, the authors introduce “PQ3D”, a unified model that uses Promptable Queries to tackle a wide range of 3D-VL tasks, from low-level instance segmentation to high-level reasoning and planning.

🎯This is achieved through three key innovations:

⚓ Unifying various 3D scene representations (i.e., voxels, point clouds, multi-view images) into a shared 3D coordinate space by segment-level grouping.

⚓ An attention-based query decoder for task-specific information retrieval guided by prompts.

⚓ Universal output heads for different tasks to support multi-task training.
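🧪 To make the first two ideas concrete, here is a rough sketch, not the authors' implementation. It assumes every representation can be mapped to the same segment ids, mean-pools features per segment, and then lets a few queries attend over the pooled segments. The function names `pool_by_segment` and `cross_attend`, the toy shapes, and the random features are all hypothetical.

```python
import numpy as np

def pool_by_segment(features, segment_ids, num_segments):
    """Mean-pool per-point (or per-voxel) features into one vector per segment."""
    dim = features.shape[1]
    sums = np.zeros((num_segments, dim))
    np.add.at(sums, segment_ids, features)          # scatter-add rows into their segment
    counts = np.bincount(segment_ids, minlength=num_segments)
    return sums / np.maximum(counts, 1)[:, None]    # empty segments stay all-zero

def cross_attend(queries, segment_feats):
    """Single-head scaled dot-product attention from queries to segment features."""
    scores = queries @ segment_feats.T / np.sqrt(queries.shape[1])
    scores -= scores.max(axis=1, keepdims=True)     # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ segment_feats

# Toy scene: 6 points, 3 segments, 4-dim features (all values hypothetical).
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(6, 4))
voxel_feats = rng.normal(size=(6, 4))
segment_ids = np.array([0, 0, 1, 1, 2, 2])

seg_point = pool_by_segment(point_feats, segment_ids, 3)
seg_voxel = pool_by_segment(voxel_feats, segment_ids, 3)
queries = rng.normal(size=(2, 4))                   # stand-in for prompt-guided queries
out = cross_attend(queries, np.concatenate([seg_point, seg_voxel]))
print(seg_point.shape, out.shape)
```

Because both representations are pooled onto the same segment index, dropping one of them (say, keeping only `seg_voxel`) still leaves a valid input for the attention step in this toy setup.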

🎯Tested across ten diverse 3D-VL datasets, PQ3D demonstrates impressive performance on these tasks, setting new records on most benchmarks. Particularly, PQ3D improves the state-of-the-art on ScanNet200 by 4.9% (AP25), ScanRefer by 5.4% (acc@0.5), Multi3DRefer by 11.7% (F1@0.5), and Scan2Cap by 13.4% (CIDEr@0.5).
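🧪 For readers new to the "@0.5" suffix: in 3D grounding benchmarks it usually means a prediction counts as correct only when its box overlaps the ground truth with IoU of at least 0.5. Below is a small sketch of that idea, assuming axis-aligned boxes written as (xmin, ymin, zmin, xmax, ymax, zmax). The helper names and the toy boxes are hypothetical, and the official evaluation scripts of each benchmark may differ in details.

```python
import numpy as np

def box_iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Per-axis overlap length, clipped at zero when the boxes do not intersect.
    overlap = np.clip(np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3]), 0, None)
    inter = overlap.prod()
    vol_a = (a[3:] - a[:3]).prod()
    vol_b = (b[3:] - b[:3]).prod()
    return inter / (vol_a + vol_b - inter)

def acc_at_iou(preds, gts, threshold=0.5):
    """Fraction of predictions whose IoU with the paired ground truth meets the threshold."""
    hits = [box_iou_3d(p, g) >= threshold for p, g in zip(preds, gts)]
    return float(np.mean(hits))

# Two toy predictions: one perfect match, one shifted by half a box width.
preds = [(0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1)]
gts = [(0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)]
print(acc_at_iou(preds, gts))  # 0.5: the shifted box has IoU 1/3, below the threshold
```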

🎯Moreover, PQ3D supports flexible inference with individual or combined forms of available 3D representations, e.g., solely voxel input.

🏢Organization: @Tsinghua_Uni , Beijing Institute for General Artificial Intelligence (BIGAI)

🧙Paper Authors: Ziyu Zhu, Zhuofan Zhang, @jeasinema , Xuesong Niu, @_yixinchen , @BaoxiongJ , Zhidong Deng, @siyuanhuang95 , @Sealiqing

1️⃣Read the Full Paper here: https://arxiv.org/abs/2405.11442

2️⃣Project Page: https://pq3d.github.io/

3️⃣Code: https://github.com/PQ3D/PQ3D

4️⃣Demo: https://huggingface.co/spaces/li-qing/PQ3D-Demo

🎥 Be sure to watch the attached Technical Video - Sound on 🔊🔊

🎵 Music by Pavel Bekirov from @pixabay

Find this valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

ECCV2024
