childes_42

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3390
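
If the reported loss is the mean token-level cross-entropy in nats (the Transformers default for language-model evaluation, assumed here), it corresponds to a perplexity of roughly exp(5.3390) ≈ 208:

```python
import math

eval_loss = 5.3390  # final evaluation loss reported above
# Perplexity is exp(loss) only if the loss is mean cross-entropy in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.1f}")  # ≈ 208.3
```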

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
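
A minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` value is a placeholder, and the model and dataset themselves are not specified in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="childes_42",           # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,     # 16 x 2 = 32 effective train batch size
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                         # Native AMP mixed-precision training
)
```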

Training results

| Training Loss | Epoch    | Step   | Validation Loss |
|:-------------:|:--------:|:------:|:---------------:|
| No log        | 2.0964   | 2000   | 7.0908          |
| 6.9765        | 4.1929   | 4000   | 5.8746          |
| 6.9765        | 6.2893   | 6000   | 5.5441          |
| 5.2182        | 8.3857   | 8000   | 5.2788          |
| 5.2182        | 10.4822  | 10000  | 5.0992          |
| 4.7379        | 12.5786  | 12000  | 4.9710          |
| 4.7379        | 14.6751  | 14000  | 4.8759          |
| 4.4249        | 16.7715  | 16000  | 4.8005          |
| 4.4249        | 18.8679  | 18000  | 4.7436          |
| 4.1842        | 20.9644  | 20000  | 4.6922          |
| 4.1842        | 23.0608  | 22000  | 4.6481          |
| 3.9843        | 25.1572  | 24000  | 4.6155          |
| 3.9843        | 27.2537  | 26000  | 4.5982          |
| 3.8181        | 29.3501  | 28000  | 4.5845          |
| 3.8181        | 31.4465  | 30000  | 4.5811          |
| 3.6751        | 33.5430  | 32000  | 4.5796          |
| 3.6751        | 35.6394  | 34000  | 4.5828          |
| 3.5484        | 37.7358  | 36000  | 4.5869          |
| 3.5484        | 39.8323  | 38000  | 4.5976          |
| 3.4328        | 41.9287  | 40000  | 4.6090          |
| 3.4328        | 44.0252  | 42000  | 4.6298          |
| 3.31          | 46.1216  | 44000  | 4.6598          |
| 3.31          | 48.2180  | 46000  | 4.6983          |
| 3.1908        | 50.3145  | 48000  | 4.7263          |
| 3.1908        | 52.4109  | 50000  | 4.7624          |
| 3.0864        | 54.5073  | 52000  | 4.7913          |
| 3.0864        | 56.6038  | 54000  | 4.8263          |
| 2.993         | 58.7002  | 56000  | 4.8538          |
| 2.993         | 60.7966  | 58000  | 4.8770          |
| 2.9108        | 62.8931  | 60000  | 4.9097          |
| 2.9108        | 64.9895  | 62000  | 4.9486          |
| 2.8352        | 67.0860  | 64000  | 4.9929          |
| 2.8352        | 69.1824  | 66000  | 5.0339          |
| 2.7677        | 71.2788  | 68000  | 5.0516          |
| 2.7677        | 73.3753  | 70000  | 5.0869          |
| 2.708         | 75.4717  | 72000  | 5.1078          |
| 2.708         | 77.5681  | 74000  | 5.1317          |
| 2.6552        | 79.6646  | 76000  | 5.1598          |
| 2.6552        | 81.7610  | 78000  | 5.1774          |
| 2.6082        | 83.8574  | 80000  | 5.1928          |
| 2.6082        | 85.9539  | 82000  | 5.2273          |
| 2.5633        | 88.0503  | 84000  | 5.2497          |
| 2.5633        | 90.1468  | 86000  | 5.2644          |
| 2.5227        | 92.2432  | 88000  | 5.2840          |
| 2.5227        | 94.3396  | 90000  | 5.2921          |
| 2.4873        | 96.4361  | 92000  | 5.3118          |
| 2.4873        | 98.5325  | 94000  | 5.3205          |
| 2.458         | 100.6289 | 96000  | 5.3308          |
| 2.458         | 102.7254 | 98000  | 5.3365          |
| 2.4331        | 104.8218 | 100000 | 5.3390          |
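
Validation loss bottoms out at 4.5796 around step 32,000 and climbs steadily afterward while training loss keeps falling, so an intermediate checkpoint may generalize better than the final one. A small sketch that picks the best step from the logged values (a subset of the table above, shown for illustration):

```python
# Validation losses from the table above (subset around the minimum, plus the final step).
val_loss_by_step = {
    28_000: 4.5845,
    30_000: 4.5811,
    32_000: 4.5796,   # lowest validation loss of the run
    34_000: 4.5828,
    36_000: 4.5869,
    100_000: 5.3390,  # final checkpoint
}

best_step = min(val_loss_by_step, key=val_loss_by_step.get)
print(f"best checkpoint: step {best_step}, val loss {val_loss_by_step[best_step]:.4f}")
```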

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1