Add GPQA evaluation result (#9)
- Add GPQA evaluation result (29b5ff86b0a09af474084b9f915993f48d56c112)
- Fix task_id to diamond (matching benchmark eval.yaml) (2bc02b7640f5428474e8eb8efb78d88673618a98)
Co-authored-by: ben burtenshaw <burtenshaw@users.noreply.huggingface.co>
- .eval_results/gpqa.yaml +9 -0
.eval_results/gpqa.yaml
ADDED
@@ -0,0 +1,9 @@
+- dataset:
+    id: Idavidrein/gpqa
+    task_id: diamond
+  value: 38.89
+  date: '2026-01-27'
+  source:
+    url: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
+    name: Model Card
+    user: burtenshaw
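A reviewer could sanity-check an entry like the one added here before merging. The following is only a minimal sketch, not an official validator: `check_entry`, the `expected_task_id` parameter, the 0 to 100 range for `value`, and the nesting of `user` under `source` are all assumptions made for illustration, and the entry is written as a plain Python dict so that no YAML library is needed.

```python
from datetime import date

# Entry transcribed from the diff; the nesting is an assumption inferred
# from the key order (dataset -> id/task_id, source -> url/name/user).
ENTRY = {
    "dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"},
    "value": 38.89,
    "date": "2026-01-27",
    "source": {
        "url": "https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct",
        "name": "Model Card",
        "user": "burtenshaw",
    },
}


def check_entry(entry, expected_task_id="diamond"):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: the rules below are illustrative assumptions,
    not a published schema.
    """
    problems = []

    dataset = entry.get("dataset", {})
    if not dataset.get("id"):
        problems.append("dataset.id is missing")
    if dataset.get("task_id") != expected_task_id:
        problems.append(
            f"dataset.task_id is {dataset.get('task_id')!r}, "
            f"expected {expected_task_id!r}"
        )

    value = entry.get("value")
    # bool is a subclass of int, so reject it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0 <= value <= 100:
        problems.append("value should be a number in [0, 100]")

    try:
        date.fromisoformat(str(entry.get("date")))
    except ValueError:
        problems.append("date is not an ISO 8601 calendar date")

    return problems
```

Calling `check_entry(ENTRY)` on the entry above returns an empty list. An entry with a different `task_id` would yield one problem, which is the kind of mismatch the second commit in this change fixes.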