de_wiki_mlm_42

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0027
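
Assuming the loss is mean cross-entropy over the masked tokens (the standard MLM objective), this corresponds to a perplexity of exp(3.0027) ≈ 20.1 on those tokens.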

Model description

More information needed

Intended uses & limitations

More information needed
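
The name de_wiki_mlm_42 suggests a masked language model trained on German Wikipedia. Assuming the checkpoint exposes a standard fill-mask head, usage would look like the sketch below; the repository id is a placeholder:

```python
from transformers import pipeline

# Placeholder repository id; substitute the actual checkpoint location.
fill_mask = pipeline("fill-mask", model="your-username/de_wiki_mlm_42")

# Use the tokenizer's own mask token rather than hard-coding one.
masked = f"Berlin ist die Hauptstadt von {fill_mask.tokenizer.mask_token}."
for prediction in fill_mask(masked):
    print(prediction["token_str"], round(prediction["score"], 3))
```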

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
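
Assuming the run used the Hugging Face Trainer (which emits auto-generated cards of exactly this form), these values map onto a TrainingArguments object roughly as follows; output_dir is a placeholder, and the logging and evaluation intervals are inferred from the results table below rather than documented:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="de_wiki_mlm_42",     # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    eval_strategy="steps",           # assumed; eval every 2,000 steps per the table
    eval_steps=2_000,
    logging_steps=4_000,             # assumed from the repeated loss values below
)
```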

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.0796  | 2000   | 8.1248          |
| 8.1604        | 2.1592  | 4000   | 7.4613          |
| 8.1604        | 3.2389  | 6000   | 7.3457          |
| 7.3684        | 4.3185  | 8000   | 7.2823          |
| 7.3684        | 5.3981  | 10000  | 7.1806          |
| 7.2121        | 6.4777  | 12000  | 7.1159          |
| 7.2121        | 7.5574  | 14000  | 7.0322          |
| 7.0695        | 8.6370  | 16000  | 7.0128          |
| 7.0695        | 9.7166  | 18000  | 6.9382          |
| 6.9563        | 10.7962 | 20000  | 6.9027          |
| 6.9563        | 11.8758 | 22000  | 6.8521          |
| 6.8584        | 12.9555 | 24000  | 6.7639          |
| 6.8584        | 14.0351 | 26000  | 6.6782          |
| 6.6954        | 15.1147 | 28000  | 6.5272          |
| 6.6954        | 16.1943 | 30000  | 6.3891          |
| 6.4335        | 17.2740 | 32000  | 6.1050          |
| 6.4335        | 18.3536 | 34000  | 5.6402          |
| 5.7799        | 19.4332 | 36000  | 5.1583          |
| 5.7799        | 20.5128 | 38000  | 4.8938          |
| 5.0133        | 21.5924 | 40000  | 4.6340          |
| 5.0133        | 22.6721 | 42000  | 4.4619          |
| 4.5804        | 23.7517 | 44000  | 4.2638          |
| 4.5804        | 24.8313 | 46000  | 4.1289          |
| 4.2594        | 25.9109 | 48000  | 4.0013          |
| 4.2594        | 26.9906 | 50000  | 3.8840          |
| 4.0135        | 28.0702 | 52000  | 3.8026          |
| 4.0135        | 29.1498 | 54000  | 3.7119          |
| 3.8273        | 30.2294 | 56000  | 3.6407          |
| 3.8273        | 31.3090 | 58000  | 3.5547          |
| 3.6814        | 32.3887 | 60000  | 3.5167          |
| 3.6814        | 33.4683 | 62000  | 3.4455          |
| 3.56          | 34.5479 | 64000  | 3.4070          |
| 3.56          | 35.6275 | 66000  | 3.3657          |
| 3.4651        | 36.7072 | 68000  | 3.3225          |
| 3.4651        | 37.7868 | 70000  | 3.3028          |
| 3.3776        | 38.8664 | 72000  | 3.2467          |
| 3.3776        | 39.9460 | 74000  | 3.2158          |
| 3.3098        | 41.0256 | 76000  | 3.2050          |
| 3.3098        | 42.1053 | 78000  | 3.1554          |
| 3.2499        | 43.1849 | 80000  | 3.1305          |
| 3.2499        | 44.2645 | 82000  | 3.1254          |
| 3.2031        | 45.3441 | 84000  | 3.0903          |
| 3.2031        | 46.4238 | 86000  | 3.0811          |
| 3.1596        | 47.5034 | 88000  | 3.0615          |
| 3.1596        | 48.5830 | 90000  | 3.0489          |
| 3.1274        | 49.6626 | 92000  | 3.0343          |
| 3.1274        | 50.7422 | 94000  | 3.0256          |
| 3.1001        | 51.8219 | 96000  | 3.0236          |
| 3.1001        | 52.9015 | 98000  | 3.0081          |
| 3.0805        | 53.9811 | 100000 | 3.0027          |
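
The "No log" entry and the pairwise-repeated training-loss values suggest that training loss was logged every 4,000 steps while evaluation ran every 2,000 steps, so each logged value appears beside two evaluation rows.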

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1