Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Paper • 2511.11910 • Published Nov 14 • 35
μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Paper • 2507.00316 • Published Jun 30 • 15
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14 • 54