iitb_punct_orig_finetuned_eng_Ltn_to_mar_Deva

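This model is a fine-tuned version of ai4bharat/indictrans2-indic-indic-dist-320M; the training dataset is not specified. The last logged evaluation (step 116000, epoch 9.7816) reports a validation loss of 0.4214, a BLEU score of 9.1768, and an average generation length of 20.8676.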

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 10
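
To make the `lr_scheduler_type: linear` setting concrete, here is a minimal sketch of a linear learning-rate schedule in plain Python. The function name `linear_lr`, the zero-warmup default, and the total step count of 118,590 are assumptions for illustration: the card does not state a warmup value, and the step total is only an estimate derived from the logged step/epoch pairs (for example, 116000 steps at epoch 9.7816).

```python
def linear_lr(step, total_steps, base_lr=2e-05, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    base_lr matches the card's learning_rate; warmup_steps=0 is an assumption.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed total: about 11,859 steps per epoch over 10 epochs (estimate only).
TOTAL_STEPS = 118_590

for step in (0, 40_000, 80_000, 116_000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at 2e-05 and reaches zero at the final step, so later checkpoints are trained with progressively smaller updates.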

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
0.599	0.3373	4000	0.5677	6.4634	20.7483
0.5686	0.6746	8000	0.5257	7.1881	20.8685
0.495	1.0119	12000	0.5027	7.4899	20.8699
0.4929	1.3492	16000	0.4881	7.7787	20.8711
0.4806	1.6865	20000	0.4769	7.9328	20.8708
0.4517	2.0238	24000	0.4675	8.0617	20.8698
0.4536	2.3611	28000	0.4590	8.1686	20.8712
0.4355	2.6984	32000	0.4546	8.357	20.869
0.4036	3.0357	36000	0.4507	8.4307	20.8688
0.4021	3.3730	40000	0.4452	8.455	20.869
0.4097	3.7103	44000	0.4410	8.5307	20.8662
0.3623	4.0476	48000	0.4397	8.6425	20.8683
0.3823	4.3849	52000	0.4354	8.7187	20.8648
0.3822	4.7222	56000	0.4319	8.7131	20.8681
0.3434	5.0594	60000	0.4338	8.7598	20.869
0.3568	5.3967	64000	0.4296	8.8605	20.8626
0.3691	5.7340	68000	0.4272	8.8506	20.8722
0.3419	6.0713	72000	0.4295	8.9405	20.8697
0.3566	6.4086	76000	0.4262	9.0144	20.8692
0.3483	6.7459	80000	0.4258	9.0411	20.8695
0.3373	7.0832	84000	0.4259	9.0363	20.8659
0.3355	7.4205	88000	0.4252	9.0481	20.8665
0.3251	7.7578	92000	0.4227	9.0958	20.8655
0.3146	8.0951	96000	0.4234	9.0694	20.8682
0.3295	8.4324	100000	0.4226	9.1057	20.8662
0.3362	8.7697	104000	0.4219	9.1125	20.8652
0.3163	9.1070	108000	0.4229	9.1516	20.867
0.3061	9.4443	112000	0.4222	9.1548	20.8688
0.3074	9.7816	116000	0.4214	9.1768	20.8676
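
The log above can be read programmatically. The sketch below parses an excerpt of four rows (the first and the last three), estimates the number of optimizer steps per epoch from each step/epoch pair, and picks the row with the highest BLEU. The variable names are illustrative, and the examples-per-epoch figure assumes a single device with no gradient accumulation, which the card does not confirm.

```python
# Excerpt of the training log: train loss, epoch, step, val loss, BLEU, gen len.
LOG_EXCERPT = """
0.599  0.3373 4000   0.5677 6.4634 20.7483
0.3163 9.1070 108000 0.4229 9.1516 20.867
0.3061 9.4443 112000 0.4222 9.1548 20.8688
0.3074 9.7816 116000 0.4214 9.1768 20.8676
"""

rows = []
for line in LOG_EXCERPT.strip().splitlines():
    train_loss, epoch, step, val_loss, bleu, gen_len = line.split()
    rows.append({
        "train_loss": float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "bleu": float(bleu),
        "gen_len": float(gen_len),
    })

# Each logged row gives an independent estimate of steps per epoch.
steps_per_epoch = [round(r["step"] / r["epoch"]) for r in rows]

# Assumption: one device, no gradient accumulation, train_batch_size of 16.
approx_examples_per_epoch = steps_per_epoch[-1] * 16

# Row with the highest BLEU within this excerpt.
best = max(rows, key=lambda r: r["bleu"])

print(steps_per_epoch, approx_examples_per_epoch, best["step"])
```

Within this excerpt, every row yields the same rounded steps-per-epoch estimate, and the last row has both the highest BLEU and the lowest validation loss.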

Model size: 0.3B params (Safetensors, tensor type F32)