JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published 17 days ago • 18
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 16
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents Paper • 2508.19493 • Published Aug 27, 2025 • 11
UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models Paper • 2505.14679 • Published May 20, 2025 • 5
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Paper • 2505.02064 • Published May 4, 2025 • 4