Add model-index with benchmark evaluations
#2, opened by davidlms
Added structured evaluation results from a benchmark comparison (36 benchmarks):
General VQA:
- MMBench V1.1: 86.9, MMBench V1.1 (CN): 85.9, MMStar: 74.7, BLINK (Val): 65.5, MUIRBENCH: 75.7
Multimodal Reasoning:
- MMMU (Val): 71.1, MMMU_Pro: 60.6, VideoMMU: 70.1, MathVista: 82.7, AI2D: 89.2, DynaMath: 43.7, WeMath: 60.0, ZeroBench (sub): 22.5
Multimodal Agentic:
- MMBrowseComp: 7.1, Design2Code: 69.8, Flame-React-Eval: 78.8, OSWorld: 21.1, AndroidWorld: 42.7, WebVoyager: 71.8, Webquest-SingleQA: 75.1, Webquest-MultiQA: 53.4
Multimodal Long Context:
- MMLongBench-Doc: 53.0, MMLongBench-128K: 63.4, LVBench: 49.5
OCR & Chart:
- OCRBench: 84.7, OCR-Bench_v2 (EN): 63.5, OCR-Bench_v2 (CN): 59.5, ChartQAPro: 62.6, ChartMuseum: 49.8, CharXiv_Val-Reasoning: 59.6
Spatial & Grounding:
- OmniSpatial: 50.6, RefCOCO-avg (val): 85.6, TreeBench: 45.7, Ref-L4-test: 87.7
This enables the model to appear on leaderboards and makes it easier to compare it against other models.
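For context, a `model-index` entry lives in the YAML front matter of the model card. A minimal sketch of the structure, using two of the scores above, might look like this (the model name and the dataset `type` slugs are illustrative placeholders, not taken from the actual PR):

```yaml
model-index:
- name: your-model-name          # placeholder; use the actual model id
  results:
  - task:
      type: visual-question-answering
    dataset:
      name: MMBench V1.1         # benchmark name as listed above
      type: mmbench              # illustrative dataset slug
    metrics:
    - type: accuracy
      value: 86.9
  - task:
      type: visual-question-answering
    dataset:
      name: MMMU (Val)
      type: mmmu                 # illustrative dataset slug
    metrics:
    - type: accuracy
      value: 71.1
```

Each benchmark becomes one entry under `results`, so the full change repeats this pattern for every score in the list.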