Ellis Brown (@_ellisbrown)
2025-11-07 | ❤️ 237 | 🔁 47
MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time.
the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone.
introducing SIMS-V 🤖
[1/n] https://x.com/_ellisbrown/status/1986904352506667479/video/1