Got to 1199.8 tokens/sec with Devstral Small 2 on my desktop GPU workstation, running the vLLM nightly. Works out of the box with Mistral Vibe. Next up: testing the big one.
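A setup like the one above (vLLM serving a Devstral model) can be exercised with a minimal client sketch. This assumes vLLM's OpenAI-compatible server on its default port; the model id `mistralai/Devstral-Small-2507` and the port are assumptions, so adjust them to whatever `vllm serve` reports.

```python
# Hedged sketch: building a request for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint. Model name and port are assumptions.
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Assemble an HTTP request for a single chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


req = build_chat_request("mistralai/Devstral-Small-2507",
                         "Write a haiku about GPUs.")
# urllib.request.urlopen(req) would send it once the server is running.
print(req.full_url)
```

Using the standard library keeps the sketch dependency-free; the `openai` client package works against the same endpoint if preferred.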
reacted to MonsterMMORPG's post with 🔥 about 9 hours ago
Whisper-WebUI Premium - Ultra Fast and High Accuracy Speech-to-Text Transcription App for All Languages - Windows, RunPod, Massed Compute 1-Click Installers - Supporting RTX 1000 to 5000 Series
MiniCPM-V 4.5: new MLLM for image, multi-image & video understanding, running even on your phone, released by OpenBMB. openbmb/MiniCPM-V-4_5
✨ SOTA vision-language capability
✨ 96× video token compression for high-FPS & long-video reasoning
✨ Switchable fast vs. deep thinking modes
✨ Strong OCR and document parsing, supporting 30+ languages
OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. The models support tasks such as OCR, reasoning, single-shot question answering, and captioning (including ablated variants) across a broad range of visual categories. They can also handle images with complex, sensitive, or nuanced content, and adapt to varying aspect ratios and resolutions.