# SFT-Qwen2.5-Coder-3B_v1.1s
This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.6122
## Model description
More information needed
## Intended uses & limitations
More information needed
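Pending a fuller description, here is a minimal usage sketch. It assumes this repository hosts a PEFT adapter on top of the base model; the adapter id below is hypothetical, so substitute the actual Hub path.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-3B-Instruct"
adapter_id = "SFT-Qwen2.5-Coder-3B_v1.1s"  # hypothetical Hub id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```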
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
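The same settings expressed as a `transformers.TrainingArguments` object; this is a reconstruction sketch from the list above, not the card author's actual script. The `output_dir` is hypothetical, and the model, dataset, and `Trainer` wiring are omitted because they are not documented here.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="SFT-Qwen2.5-Coder-3B_v1.1s",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 2 * 4 = 8
    seed=42,
    optim="paged_adamw_8bit",        # requires bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=3,
)
```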
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.8929 | 0.1980 | 20 | 0.8335 |
| 0.7806 | 0.3960 | 40 | 0.7456 |
| 0.7399 | 0.5941 | 60 | 0.7087 |
| 0.8417 | 0.7921 | 80 | 0.6846 |
| 0.7405 | 0.9901 | 100 | 0.6639 |
| 0.6697 | 1.1881 | 120 | 0.6591 |
| 0.5717 | 1.3861 | 140 | 0.6512 |
| 0.654 | 1.5842 | 160 | 0.6377 |
| 0.553 | 1.7822 | 180 | 0.6323 |
| 0.6804 | 1.9802 | 200 | 0.6208 |
| 0.512 | 2.1782 | 220 | 0.6240 |
| 0.6068 | 2.3762 | 240 | 0.6217 |
| 0.4595 | 2.5743 | 260 | 0.6196 |
| 0.605 | 2.7723 | 280 | 0.6164 |
| 0.5567 | 2.9703 | 300 | 0.6122 |
### Framework versions
- PEFT 0.18.0
- Transformers 4.57.3
- Pytorch 2.9.0+cu126
- Datasets 4.4.1
- Tokenizers 0.22.1
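Since PEFT is listed above, the checkpoint presumably stores adapter weights rather than full model weights. Below is a hedged sketch of merging the adapter into the base model for standalone deployment, assuming a LoRA-style adapter; the repository id is hypothetical.

```python
from peft import AutoPeftModelForCausalLM

# Load base model + adapter in one step, then fold the adapter
# deltas into the base weights and drop the PEFT wrapper.
model = AutoPeftModelForCausalLM.from_pretrained("SFT-Qwen2.5-Coder-3B_v1.1s")
merged = model.merge_and_unload()
merged.save_pretrained("SFT-Qwen2.5-Coder-3B_v1.1s-merged")
```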