6 hours ago
Beyond ReconVLA: Annotation-Free Visual Grounding via Language-Attention Masked Reconstruction
A recent paper called ReconVLA attempted to solve this. I spent a significant stretch of time reading it carefully, stress-testing its assumptions, and thinking about what it would mean to implement and extend it. What I found impressed me in some ways and genuinely troubled me in others.
Source: HackerNoon