Shubham Tulsiani (@shubhtuls)

2026-02-05 | ❤️ 529 | 🔁 66 | 💬 3


[1/N] Rotary Position Embeddings (RoPE) are ubiquitous across transformers that process tokens from 1D, 2D, or 3D grids, e.g., language, images, or videos. Our RayRoPE formulation extends these to multi-view transformers. Paper and code: https://rayrope.github.io/ https://x.com/shubhtuls/status/2019470333594468357/video/1
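
For readers unfamiliar with the baseline being extended, here is a minimal sketch of standard 1D RoPE. It is not the RayRoPE method, whose details are not given in this post. The function names `rope_rotate` and `dot`, the `base` value of 10000, and the sample vectors are illustrative assumptions. The sketch shows the property that makes RoPE attractive: the dot product of a rotated query and key depends only on their relative offset.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply standard 1D RoPE: rotate each consecutive pair of features
    by an angle proportional to the token position.
    Hypothetical helper for illustration, not the RayRoPE formulation."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim), so later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Illustrative query/key vectors (assumed values).
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Same relative offset (2) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_b = dot(rope_rotate(q, 12), rope_rotate(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because each pair is rotated by a position-proportional angle, the two scores agree up to floating-point error. Attention therefore sees relative rather than absolute position.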


📄 Original content

RayRoPE: Projective Ray Positional Encoding for Multi-view Attention



Media

🎬 Video


Tags

AI-ML 3D-Vision