CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 19
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 50
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 173
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 55
Holo1 Collection Vision-Language Action Model for use in the Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48
AGUVIS: Unified Pure Vision GUI Agents Collection https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 7
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 28 items • Updated Sep 1 • 59