Introduction

AGI-Eval-OA-Judge is a judge model for evaluation tasks that have a single correct answer. Given a reference answer, it determines whether a model's response is correct. AGI-Eval-OA-Judge is obtained by full-parameter fine-tuning of the Qwen2.5-32B model.
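The card does not document the model's prompt or output format. As a rough illustration of the judging interface (question, reference answer and candidate response in; a correct/incorrect verdict out), here is a minimal sketch. The template wording, `JUDGE_TEMPLATE`, `build_judge_prompt`, and `parse_verdict` are all hypothetical and are not the official AGI-Eval-OA-Judge format.

```python
import re
from typing import Optional

# HYPOTHETICAL template: the model card does not publish the official
# AGI-Eval-OA-Judge prompt format.
JUDGE_TEMPLATE = (
    "Question:\n{question}\n\n"
    "Reference answer:\n{reference}\n\n"
    "Model response:\n{response}\n\n"
    "Is the model response correct given the reference answer? "
    "Reply with one word: correct or incorrect."
)


def build_judge_prompt(question: str, reference: str, response: str) -> str:
    """Fill the hypothetical template with one evaluation instance."""
    return JUDGE_TEMPLATE.format(
        question=question, reference=reference, response=response
    )


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Read the first word of the judge's reply (hypothetical output format).

    Returns True for "correct", False for "incorrect", None otherwise.
    """
    words = re.findall(r"[a-z]+", judge_output.lower())
    if not words:
        return None
    if words[0] == "correct":
        return True
    if words[0] == "incorrect":
        return False
    return None


prompt = build_judge_prompt("What is 2 + 3?", "5", "The answer is 5.")
print(prompt)
print(parse_verdict("Correct."))   # True
print(parse_verdict("incorrect"))  # False
print(parse_verdict("Not sure."))  # None
```

In practice the prompt would be sent to the model through whatever inference stack you use; consult the model authors for the actual template before relying on verdict parsing like this.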

The training data for the model consists of two parts:

Data annotated by top closed-source models: First, responses from various large models are collected for each instance in public datasets. Then, a top closed-source model determines whether each response is correct.

Targeted supplementation for training data gaps: A preliminary judge model is trained on the first part of the data. This preliminary judge is then used to generate predictions on additional samples, and human annotators label the bad cases (samples the preliminary judge gets wrong).

Finally, the model is fine-tuned on the combined dataset from both stages.
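The two-part data construction described above can be summarized as a small pipeline. This is a minimal sketch under stated assumptions: the strong judge, the preliminary judge, the human annotator, and the training step are all stand-in callables; `Example`, `build_training_set`, and the other names are hypothetical; and treating a "bad case" as a sample where the preliminary judge disagrees with the human label is an interpretation of the description, not a documented procedure. The final full-parameter fine-tuning is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    question: str
    reference: str
    response: str
    label: Optional[bool] = None  # True if the response is judged correct


Judge = Callable[[Example], bool]


def stage1_annotate(collected: List[Example], strong_judge: Judge) -> List[Example]:
    """Part 1: a strong (closed-source) judge labels every collected response."""
    return [
        Example(e.question, e.reference, e.response, strong_judge(e))
        for e in collected
    ]


def stage2_supplement(
    extra: List[Example], preliminary: Judge, human_label: Judge
) -> List[Example]:
    """Part 2: keep samples where the preliminary judge disagrees with the
    human label (assumed definition of a bad case), labelled by the human."""
    kept = []
    for e in extra:
        gold = human_label(e)
        if preliminary(e) != gold:
            kept.append(Example(e.question, e.reference, e.response, gold))
    return kept


def build_training_set(
    collected: List[Example],
    extra: List[Example],
    strong_judge: Judge,
    train_preliminary: Callable[[List[Example]], Judge],
    human_label: Judge,
) -> List[Example]:
    """Combine both parts into one training set (fine-tuning not shown)."""
    part1 = stage1_annotate(collected, strong_judge)
    preliminary = train_preliminary(part1)  # hypothetical training step
    part2 = stage2_supplement(extra, preliminary, human_label)
    return part1 + part2


def exact_match(e: Example) -> bool:
    """Toy stand-in judge: correct only if the response equals the reference."""
    return e.response == e.reference


# Toy usage: exact string match stands in for both model judges,
# and the "human" marks every extra response as correct.
collected = [Example("2 + 3?", "5", "5"), Example("2 + 3?", "5", "6")]
extra = [Example("10 / 4?", "2.5", "5/2"), Example("1 + 1?", "2", "2")]
data = build_training_set(
    collected, extra, exact_match, lambda part1: exact_match, lambda e: True
)
for e in data:
    print(e.response, e.label)
```

In this toy run only the "5/2" sample enters part 2: the stand-in preliminary judge rejects it while the human accepts it, which is the kind of gap the targeted supplementation is meant to close.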

License

This code repository and the model weights are licensed under the MIT License.

Please note that AGI-Eval-OA-Judge is derived from Qwen2.5-32B, which is originally licensed under the Apache 2.0 License.


Model weights: Safetensors format, 33B params, tensor type BF16.

