Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published 7 days ago • 45
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time Paper • 2512.08924 • Published Dec 9, 2025 • 21
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published 27 days ago • 145
Flash-KMeans: Fast and Memory-Efficient Exact K-Means Paper • 2603.09229 • Published 21 days ago • 81
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation Paper • 2312.02145 • Published Dec 4, 2023 • 8
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching Paper • 2602.12280 • Published Feb 12 • 34
Article We’re open-sourcing our text-to-image model and the process behind it Nov 12, 2025 • 96
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 282
Article CinePile 2.0 - making stronger datasets with adversarial refinement Oct 23, 2024 • 19
Article TimeScope: How Long Can Your Video Large Multimodal Model Go? Jul 23, 2025 • 48
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model May 14, 2024 • 285
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 200