Xingang Pan (@XingangP)

2025-12-12 | โค๏ธ 141 | ๐Ÿ” 25


Can video generative models exhibit visuospatial intelligence? ๐Ÿค”

Introducing Video4Spatial โ€” a video-only framework that tackles spatial tasks.

With just video context, our model can: ๐Ÿ” Ground objects by planning geometry-consistent paths ๐Ÿ“ธ Follow camera-pose instructions for scene navigation ๐ŸŒ Generalize to long contexts & unseen outdoor scenes

A step toward video models as visual-spatial reasoners.

Project: https://xizaoqu.github.io/video4spatial/ arXiv: https://arxiv.org/pdf/2512.03040

๐Ÿ”— ์›๋ณธ ๋งํฌ

๋ฏธ๋””์–ด

image


Tags

3D Robotics AI-ML