kotoba-tech/kotoba-whisper-v2.0 Automatic Speech Recognition • 0.8B • Updated Oct 23, 2024 • 84.8k • 84
facebook/dinov3-convnext-small-pretrain-lvd1689m Image Feature Extraction • 49.5M • Updated Aug 19, 2025 • 16.9k • 22
EasyControl Ghibli 🦀 Featured • 1.45k • New Ghibli EasyControl model is now released!!
Chat with Kimi-VL-A3B-Thinking-2506 🤔 Featured • 180 • Chat with images, videos, or PDFs to generate text
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 202
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users Paper • 2503.02268 • Published Mar 4, 2025 • 11
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated 23 days ago • 248k • 1.55k