Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9, 2025 • 84
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 10 days ago • 9
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published 6 days ago • 31