Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification? Paper • 2601.06993 • Published 12 days ago • 2
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 102
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3, 2025 • 58