2c4e74418f0fb555a85ce5bd8d7f88eb

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [de-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8539
  • Data Size: 1.0
  • Epoch Runtime: 157.9288
  • Bleu: 9.0135
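
A minimal usage sketch with the 🤗 Transformers API is shown below. The repository id is taken from this card's model tree; the decoding settings (beam search, max token count) are illustrative assumptions, not values documented on this card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repository id as listed on this card; adjust if the model is hosted elsewhere.
model_id = "contemmcm/2c4e74418f0fb555a85ce5bd8d7f88eb"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German -> Dutch, the direction of the opus_books [de-nl] fine-tuning.
# Unlike T5, mT5 checkpoints typically use no task prefix; this is an assumption here.
inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)  # assumed decoding settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```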

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
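
The card header does name the dataset, Helsinki-NLP/opus_books with the de-nl configuration. A minimal sketch for loading it with 🤗 Datasets follows; the 90/10 train/validation split is an assumption, since the split actually used for this model is not recorded.

```python
from datasets import load_dataset

# Language-pair configuration named in this card's header.
raw = load_dataset("Helsinki-NLP/opus_books", "de-nl")

# opus_books ships a single "train" split, so a held-out set must be carved out.
# The 90/10 split below is an assumption, not the split used for this model.
splits = raw["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0])  # {'id': ..., 'translation': {'de': ..., 'nl': ...}}
```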

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
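
The sketch below mirrors these settings with Seq2SeqTrainingArguments from the standard 🤗 Trainer API; anything not in the list above (the output directory, generation during evaluation) is an assumed placeholder. With 4 GPUs, the per-device batch size of 8 yields the total batch size of 32 reported above.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameter list above; output_dir and predict_with_generate
# are assumed placeholders not documented on this card.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-opus-books-de-nl",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed; needed to compute BLEU during eval
)
```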

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|---------------|-------|------|-----------------|-----------|-------------------|--------|
| No log        | 0     | 0    | 23.7303         | 0         | 12.4678           | 0.0208 |
| No log        | 1     | 390  | 24.1355         | 0.0078    | 14.1786           | 0.0184 |
| No log        | 2     | 780  | 24.3028         | 0.0156    | 16.6559           | 0.0166 |
| No log        | 3     | 1170 | 17.3621         | 0.0312    | 21.3792           | 0.0179 |
| No log        | 4     | 1560 | 14.1326         | 0.0625    | 26.5135           | 0.0209 |
| 1.1012        | 5     | 1950 | 9.8915          | 0.125     | 36.6540           | 0.0183 |
| 1.7496        | 6     | 2340 | 4.3676          | 0.25      | 55.1910           | 0.1699 |
| 3.1138        | 7     | 2730 | 2.3169          | 0.5       | 93.5042           | 5.3114 |
| 2.5365        | 8     | 3120 | 2.0728          | 1.0       | 169.0967          | 6.5409 |
| 2.3024        | 9     | 3510 | 1.9927          | 1.0       | 162.8727          | 7.0852 |
| 2.1691        | 10    | 3900 | 1.9378          | 1.0       | 160.1259          | 7.6590 |
| 2.0854        | 11    | 4290 | 1.8989          | 1.0       | 162.4107          | 7.9075 |
| 1.9890        | 12    | 4680 | 1.8734          | 1.0       | 163.0841          | 8.1366 |
| 1.9149        | 13    | 5070 | 1.8588          | 1.0       | 160.3492          | 8.2804 |
| 1.8378        | 14    | 5460 | 1.8429          | 1.0       | 160.8582          | 8.5066 |
| 1.7263        | 15    | 5850 | 1.8366          | 1.0       | 160.3609          | 8.6285 |
| 1.6765        | 16    | 6240 | 1.8364          | 1.0       | 162.3152          | 8.7134 |
| 1.6365        | 17    | 6630 | 1.8284          | 1.0       | 161.5530          | 8.7564 |
| 1.5684        | 18    | 7020 | 1.8330          | 1.0       | 160.1011          | 8.7800 |
| 1.5244        | 19    | 7410 | 1.8346          | 1.0       | 159.7138          | 8.9650 |
| 1.4834        | 20    | 7800 | 1.8405          | 1.0       | 158.3638          | 8.9365 |
| 1.4328        | 21    | 8190 | 1.8539          | 1.0       | 157.9288          | 9.0135 |
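
Training was configured for 50 epochs but the log ends at epoch 21. The Data Size column suggests the training-set fraction was doubled each epoch until the full dataset was in use from epoch 8 onward. The Bleu column can be reproduced with a compute_metrics hook along these lines; the 🤗 Evaluate sacrebleu wrapper is an assumption, since the exact metric implementation is not documented here.

```python
import evaluate

bleu = evaluate.load("sacrebleu")  # assumed metric backend; not documented on this card

def make_compute_metrics(tokenizer):
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        # Labels are padded with -100 by the data collator; drop that before decoding.
        labels = [[tok for tok in label if tok != -100] for label in labels]
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = bleu.compute(predictions=decoded_preds,
                              references=[[ref] for ref in decoded_labels])
        return {"bleu": result["score"]}
    return compute_metrics
```

Passing make_compute_metrics(tokenizer) as the compute_metrics argument of Seq2SeqTrainer, together with predict_with_generate=True, yields a BLEU score at each evaluation step.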

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1