Add model-index with benchmark evaluations

#2 · opened by davidlms

Added structured evaluation results from the benchmark comparison, covering the 34 benchmarks in the table below; a sketch of the resulting `model-index` metadata follows the table:

| Category | Benchmark | Score |
|---|---|---|
| General VQA | MMBench V1.1 | 86.9 |
| General VQA | MMBench V1.1 (CN) | 85.9 |
| General VQA | MMStar | 74.7 |
| General VQA | BLINK (Val) | 65.5 |
| General VQA | MUIRBENCH | 75.7 |
| Multimodal Reasoning | MMMU (Val) | 71.1 |
| Multimodal Reasoning | MMMU_Pro | 60.6 |
| Multimodal Reasoning | VideoMMMU | 70.1 |
| Multimodal Reasoning | MathVista | 82.7 |
| Multimodal Reasoning | AI2D | 89.2 |
| Multimodal Reasoning | DynaMath | 43.7 |
| Multimodal Reasoning | WeMath | 60.0 |
| Multimodal Reasoning | ZeroBench (sub) | 22.5 |
| Multimodal Agentic | MMBrowseComp | 7.1 |
| Multimodal Agentic | Design2Code | 69.8 |
| Multimodal Agentic | Flame-React-Eval | 78.8 |
| Multimodal Agentic | OSWorld | 21.1 |
| Multimodal Agentic | AndroidWorld | 42.7 |
| Multimodal Agentic | WebVoyager | 71.8 |
| Multimodal Agentic | Webquest-SingleQA | 75.1 |
| Multimodal Agentic | Webquest-MultiQA | 53.4 |
| Multimodal Long Context | MMLongBench-Doc | 53.0 |
| Multimodal Long Context | MMLongBench-128K | 63.4 |
| Multimodal Long Context | LVBench | 49.5 |
| OCR & Chart | OCRBench | 84.7 |
| OCR & Chart | OCR-Bench_v2 (EN) | 63.5 |
| OCR & Chart | OCR-Bench_v2 (CN) | 59.5 |
| OCR & Chart | ChartQAPro | 62.6 |
| OCR & Chart | ChartMuseum | 49.8 |
| OCR & Chart | CharXiv_Val-Reasoning | 59.6 |
| Spatial & Grounding | OmniSpatial | 50.6 |
| Spatial & Grounding | RefCOCO-avg (val) | 85.6 |
| Spatial & Grounding | TreeBench | 45.7 |
| Spatial & Grounding | Ref-L4-test | 87.7 |
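
These scores are encoded with the Hub's `model-index` card metadata. Below is a minimal sketch of a single entry, using MMBench V1.1 from the table above; the model name, task type, dataset slug, and metric type are illustrative assumptions, not values confirmed by this PR:

```yaml
model-index:
- name: <model-name>              # placeholder: the model's display name
  results:
  - task:
      type: image-text-to-text    # assumed task type for a multimodal model
    dataset:
      name: MMBench V1.1          # benchmark name from the table above
      type: mmbench-v1.1          # assumed dataset slug
    metrics:
    - type: accuracy              # assumed metric type; score reported as-is
      value: 86.9
  # ...one results entry per benchmark in the table above (34 in total)
```

Each benchmark gets its own entry under `results`, keyed by the dataset's `type` slug.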

This enables the model to appear in leaderboards and makes it easier to compare with other models.
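
Since `model-index` lives in the YAML front matter of README.md, that is also where the merge conflict noted below has to be resolved. A minimal placement sketch, with placeholder metadata fields:

```yaml
---
# README.md front matter (fields are placeholders; the card's real tags may differ)
pipeline_tag: image-text-to-text
model-index:
- name: <model-name>
  results:
    # ...entries as sketched above
---
```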

Cannot merge: this branch has merge conflicts in `README.md`.
