Bingyi Kang (@bingyikang)
2024-12-19 | ❤️ 455 | 🔁 78
Want to use Depth Anything, but need metric depth rather than relative depth?
Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation with up to 4K resolution.
🔑 Key Message: Depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This approach proves very effective when a low-cost, widely available LiDAR (e.g., the iPhone's) is used as the prompt. We believe the prompt can generalize to other forms as long as scale information is provided.
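For intuition, here is a minimal sketch of the interface this paradigm implies, assuming a callable model that takes an RGB image plus a metric depth prompt. The function and checkpoint names below are illustrative, not the released API; see the Code link below for the actual loading and inference calls.

```python
import torch

# Sketch only: a depth foundation model conditioned on a low-resolution metric
# depth prompt (e.g., iPhone LiDAR) returns dense metric depth at full resolution.

def prompt_metric_depth(model: torch.nn.Module,
                        image: torch.Tensor,        # (1, 3, H, W) RGB in [0, 1]
                        prompt_depth: torch.Tensor  # (1, 1, h, w) metric depth in meters
                        ) -> torch.Tensor:          # (1, 1, H, W) dense metric depth in meters
    """Run a depth foundation model conditioned on a metric depth prompt."""
    with torch.no_grad():
        # The prompt supplies absolute scale; the foundation model supplies
        # geometric detail, so the output is both dense and metric.
        return model(image, prompt_depth)

# Hypothetical usage (checkpoint name and loading code may differ; see the GitHub repo):
# model = PromptDA.from_pretrained("depth-anything/prompt-depth-anything-vitl")
# depth = prompt_metric_depth(model, rgb, lidar_depth)
```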
Prompt Depth Anything offers 1️⃣ a series of models for the iPhone LiDAR, 2️⃣ 4D reconstruction from monocular videos (captured with an iPhone), 3️⃣ improved generalization for robot manipulation, e.g., training on cans but generalizing to glasses, and 4️⃣ more detailed depth annotations for the ScanNet++ dataset.
The first author is our excellent intern @HaotongLin.
Paper (Hugging Face): https://huggingface.co/papers/2412.14015
Project Page: https://promptda.github.io/
Code: https://github.com/DepthAnything/PromptDA
Tags
domain-reconstruction domain-llm domain-ai-ml domain-robotics