MrNeRF (@janusch_patas)

2025-03-16 | ❤️ 404 | 🔁 64


VGGT: Visual Geometry Grounded Transformer

TL;DR: Is DUSt3R facing a formidable new rival?

Contributions: (1) We introduce VGGT, a large feed-forward transformer that can, given one, a few, or even hundreds of images of a scene, predict all its key 3D attributes - including camera intrinsics and extrinsics, point maps, depth maps, and 3D point tracks - in seconds.

(2) We demonstrate that VGGT's predictions are directly usable, being highly competitive and usually better than those of state-of-the-art methods that use slow post-processing optimization techniques.

(3) We also show that when further combined with BA post-processing, VGGT achieves state-of-the-art results across the board, even when compared to methods that specialize in a subset of 3D tasks, often improving quality substantially.

Media

video



Tags

domain-vision-3d domain-llm domain-visionos