Shiqi Chen (@shiqi_chen17)

2025-05-02 | ❤️ 295 | 🔁 41


🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"!

We dive into the core reasons why spatial reasoning is difficult for Vision-Language Models, viewed from the perspective of the attention mechanism. 🌍🔍

Paper: https://arxiv.org/pdf/2503.01773
Code: https://github.com/shiqichen17/AdaptVis
Website: https://shiqichen17.github.io/AdaptVis/

๐Ÿ”— ์›๋ณธ ๋งํฌ

Media

[image]



Tags

domain-ai-ml domain-vlm domain-dev-tools domain-visionos