Ayush Jain (@ayushjain1144)

2025-04-23 | โค๏ธ 136 | ๐Ÿ” 28


1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMsโ€”due to the lack of large-scale 3D data and pre-trained 3D encoders.

We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://univlg.github.io/ https://x.com/ayushjain1144/status/1915087087486828854/video/1

๐Ÿ”— ์›๋ณธ ๋งํฌ

๋ฏธ๋””์–ด

video


์š”์•ฝ

๋Œ€๊ทœ๋ชจ 3D ๋ฐ์ดํ„ฐยท์‚ฌ์ „ํ•™์Šต 3D ์ธ์ฝ”๋” ๋ถ€์กฑ์œผ๋กœ 2D VLM์— ์˜์กดํ•˜๋˜ ํ•œ๊ณ„๋ฅผ ๊ฒจ๋ƒฅํ•ด, 2D์™€ 3D๋ฅผ ํ†ตํ•ฉํ•œ VLM(UniVLG)์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. 2D ๋ฐ์ดํ„ฐ์˜ ์Šค์ผ€์ผ์„ ํ™œ์šฉํ•ด 3D ์žฅ๋ฉด ์ดํ•ด ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ž…๋‹ˆ๋‹ค.

Auto-generated - needs manual review

Tags

domain-vision-3d domain-robotics domain-vlm domain-dev-tools domain-visionos