Xiaolong Wang (@xiaolonw)
2023-12-16 | ❤️ 29 | 🔁 6
Can LLMs understand and reason about explicit object coordinates?
Introducing PixelLLM, which can take object coordinates as inputs and produce them as outputs. This allows the LLM to directly perform all detection/segmentation tasks with dense descriptions.
Quoted tweet
Jiarui Xu (@Jerry_XU_Jiarui): GPT4-V can describe the location via text, but can't accurately output the coordinate of each word.
Introducing: Pixel Aligned Language Models. It generates image captions along with the aligned pixel coordinates of the image. https://arxiv.org/abs/2312.09237
(1/n https://x.com/Jerry_XU_Jiarui/status/1735881901926498310/video/1