childes_30

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3366
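
Taken at face value as a mean per-token cross-entropy (the usual language-modeling objective; the card does not state the metric explicitly), this loss converts to perplexity by exponentiation:

```python
import math

# Assumption: the eval loss is mean per-token cross-entropy in nats,
# in which case perplexity is simply its exponential.
eval_loss = 5.3366
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 207.8
```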

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
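
As a minimal sketch, these settings map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder, and the Adam betas/epsilon listed above are the library defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="childes_30",        # placeholder, not a recorded setting
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,  # 16 x 2 = total train batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                      # Native AMP mixed precision
)
```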

Training results

| Training Loss | Epoch    | Step   | Validation Loss |
|:-------------:|:--------:|:------:|:---------------:|
| No log        | 2.0964   | 2000   | 7.1029          |
| 6.9987        | 4.1929   | 4000   | 5.8842          |
| 6.9987        | 6.2893   | 6000   | 5.5487          |
| 5.2204        | 8.3857   | 8000   | 5.2793          |
| 5.2204        | 10.4822  | 10000  | 5.1049          |
| 4.7358        | 12.5786  | 12000  | 4.9836          |
| 4.7358        | 14.6751  | 14000  | 4.8829          |
| 4.4216        | 16.7715  | 16000  | 4.8029          |
| 4.4216        | 18.8679  | 18000  | 4.7423          |
| 4.1842        | 20.9644  | 20000  | 4.6904          |
| 4.1842        | 23.0608  | 22000  | 4.6458          |
| 3.9858        | 25.1572  | 24000  | 4.6234          |
| 3.9858        | 27.2537  | 26000  | 4.6056          |
| 3.8189        | 29.3501  | 28000  | 4.5909          |
| 3.8189        | 31.4465  | 30000  | 4.5868          |
| 3.6763        | 33.5430  | 32000  | 4.5830          |
| 3.6763        | 35.6394  | 34000  | 4.5782          |
| 3.5493        | 37.7358  | 36000  | 4.5854          |
| 3.5493        | 39.8323  | 38000  | 4.5964          |
| 3.4327        | 41.9287  | 40000  | 4.6104          |
| 3.4327        | 44.0252  | 42000  | 4.6369          |
| 3.3112        | 46.1216  | 44000  | 4.6697          |
| 3.3112        | 48.2180  | 46000  | 4.6953          |
| 3.1908        | 50.3145  | 48000  | 4.7280          |
| 3.1908        | 52.4109  | 50000  | 4.7629          |
| 3.0857        | 54.5073  | 52000  | 4.7928          |
| 3.0857        | 56.6038  | 54000  | 4.8196          |
| 2.9936        | 58.7002  | 56000  | 4.8564          |
| 2.9936        | 60.7966  | 58000  | 4.8890          |
| 2.9113        | 62.8931  | 60000  | 4.9200          |
| 2.9113        | 64.9895  | 62000  | 4.9539          |
| 2.8353        | 67.0860  | 64000  | 4.9934          |
| 2.8353        | 69.1824  | 66000  | 5.0297          |
| 2.7673        | 71.2788  | 68000  | 5.0610          |
| 2.7673        | 73.3753  | 70000  | 5.0805          |
| 2.7091        | 75.4717  | 72000  | 5.1054          |
| 2.7091        | 77.5681  | 74000  | 5.1283          |
| 2.6563        | 79.6646  | 76000  | 5.1594          |
| 2.6563        | 81.7610  | 78000  | 5.1836          |
| 2.6077        | 83.8574  | 80000  | 5.2009          |
| 2.6077        | 85.9539  | 82000  | 5.2230          |
| 2.5635        | 88.0503  | 84000  | 5.2444          |
| 2.5635        | 90.1468  | 86000  | 5.2631          |
| 2.5229        | 92.2432  | 88000  | 5.2798          |
| 2.5229        | 94.3396  | 90000  | 5.2951          |
| 2.4886        | 96.4361  | 92000  | 5.3101          |
| 2.4886        | 98.5325  | 94000  | 5.3189          |
| 2.4584        | 100.6289 | 96000  | 5.3300          |
| 2.4584        | 102.7254 | 98000  | 5.3327          |
| 2.4337        | 104.8218 | 100000 | 5.3366          |
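
Validation loss bottoms out at 4.5782 around step 34,000 and climbs steadily afterwards while training loss keeps falling, the usual overfitting signature. A hedged sketch of standard `TrainingArguments` flags (not recorded settings of this run) that would retain the best checkpoint rather than the final one:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="childes_30",            # placeholder, as above
    eval_strategy="steps",
    eval_steps=2_000,                   # matches the table's evaluation cadence
    save_strategy="steps",
    save_steps=2_000,                   # checkpointing must align with evaluation
    load_best_model_at_end=True,        # restore the lowest-eval-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```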

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 12.7M parameters (F32, Safetensors)
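
For completeness, a hedged loading sketch: the repo id below is hypothetical (the card does not state the actual hub path or architecture), so substitute the real one. `AutoModel` sidesteps the unknown head type:

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "your-username/childes_30"  # hypothetical hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

# Should report roughly 12.7M, matching the listed model size.
print(sum(p.numel() for p in model.parameters()))
```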