de_childes_13

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.1847
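
For intuition, assuming the reported loss is the mean cross-entropy per token in nats (the Transformers default for language-model training), it corresponds to a perplexity of roughly exp(4.1847) ≈ 65.7:

```python
import math

# Perplexity from the reported evaluation loss, assuming the loss is the
# mean cross-entropy per token in nats (the Transformers LM default).
eval_loss = 4.1847
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 65.7
```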

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
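
A minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` is a placeholder, and the model and dataset are not specified in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="de_childes_13",     # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,  # effective train batch size: 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                      # native AMP mixed-precision training
)
```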

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.5021  | 2000   | 7.0542          |
| 6.9608        | 3.0041  | 4000   | 5.8207          |
| 6.9608        | 4.5062  | 6000   | 5.4596          |
| 5.2094        | 6.0083  | 8000   | 5.1686          |
| 5.2094        | 7.5103  | 10000  | 4.9647          |
| 4.7331        | 9.0124  | 12000  | 4.8010          |
| 4.7331        | 10.5145 | 14000  | 4.6684          |
| 4.425         | 12.0165 | 16000  | 4.5528          |
| 4.425         | 13.5186 | 18000  | 4.4599          |
| 4.1941        | 15.0207 | 20000  | 4.3706          |
| 4.1941        | 16.5227 | 22000  | 4.2942          |
| 4.0094        | 18.0248 | 24000  | 4.2263          |
| 4.0094        | 19.5268 | 26000  | 4.1749          |
| 3.8583        | 21.0289 | 28000  | 4.1299          |
| 3.8583        | 22.5310 | 30000  | 4.0866          |
| 3.7325        | 24.0330 | 32000  | 4.0591          |
| 3.7325        | 25.5351 | 34000  | 4.0317          |
| 3.6244        | 27.0372 | 36000  | 4.0061          |
| 3.6244        | 28.5392 | 38000  | 3.9925          |
| 3.5304        | 30.0413 | 40000  | 3.9792          |
| 3.5304        | 31.5434 | 42000  | 3.9642          |
| 3.4371        | 33.0454 | 44000  | 3.9606          |
| 3.4371        | 34.5475 | 46000  | 3.9557          |
| 3.3462        | 36.0496 | 48000  | 3.9565          |
| 3.3462        | 37.5516 | 50000  | 3.9676          |
| 3.2678        | 39.0537 | 52000  | 3.9713          |
| 3.2678        | 40.5558 | 54000  | 3.9783          |
| 3.1984        | 42.0578 | 56000  | 3.9919          |
| 3.1984        | 43.5599 | 58000  | 3.9993          |
| 3.1371        | 45.0620 | 60000  | 4.0153          |
| 3.1371        | 46.5640 | 62000  | 4.0182          |
| 3.0817        | 48.0661 | 64000  | 4.0311          |
| 3.0817        | 49.5682 | 66000  | 4.0445          |
| 3.0311        | 51.0702 | 68000  | 4.0579          |
| 3.0311        | 52.5723 | 70000  | 4.0714          |
| 2.9858        | 54.0744 | 72000  | 4.0819          |
| 2.9858        | 55.5764 | 74000  | 4.0889          |
| 2.9448        | 57.0785 | 76000  | 4.1067          |
| 2.9448        | 58.5805 | 78000  | 4.1096          |
| 2.9072        | 60.0826 | 80000  | 4.1248          |
| 2.9072        | 61.5847 | 82000  | 4.1356          |
| 2.8728        | 63.0867 | 84000  | 4.1422          |
| 2.8728        | 64.5888 | 86000  | 4.1524          |
| 2.8421        | 66.0909 | 88000  | 4.1611          |
| 2.8421        | 67.5929 | 90000  | 4.1661          |
| 2.8146        | 69.0950 | 92000  | 4.1721          |
| 2.8146        | 70.5971 | 94000  | 4.1775          |
| 2.7902        | 72.0991 | 96000  | 4.1821          |
| 2.7902        | 73.6012 | 98000  | 4.1832          |
| 2.7708        | 75.1033 | 100000 | 4.1847          |
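
The card does not state the model architecture, but given the size and training setup, a causal language model is a plausible assumption. A minimal usage sketch under that assumption; the repo id is a placeholder for the full Hub id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint is a causal LM; replace with the full Hub id,
# e.g. "<user>/de_childes_13".
repo_id = "de_childes_13"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# German prompt, since the name suggests training on German CHILDES data.
inputs = tokenizer("Der kleine Hund", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```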

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
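
To confirm that a local environment matches these versions, a quick check (a minimal sketch; assumes all four libraries are installed):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions to compare against the list above.
for lib in (transformers, torch, datasets, tokenizers):
    print(lib.__name__, lib.__version__)
```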
Model size

  • Parameters: 12.7M
  • Tensor type: F32
  • Format: Safetensors