DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing Paper • 2508.05899 • Published Aug 7, 2025 • 1
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15, 2026 • 32
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published 22 days ago • 12
MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published 10 days ago • 8
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 45