wikipedia_13

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9275
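
For a rough sense of scale: if this loss is the mean cross-entropy in nats (the usual convention for Transformers language-modeling heads, though the card does not say), it corresponds to a perplexity of about 18.7. A back-of-the-envelope conversion under that assumption:

```python
import math

# Perplexity = exp(loss) when the loss is mean negative
# log-likelihood in nats; this assumes that convention holds here.
eval_loss = 2.9275
print(math.exp(eval_loss))  # ≈ 18.68
```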

Model description

More information needed

Intended uses & limitations

More information needed
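
The card does not document a task or architecture. Purely as a sketch, assuming the checkpoint is a causal language model and that it is published under a repo id like `user/wikipedia_13` (hypothetical; substitute the real path), it could be loaded with the standard `transformers` auto classes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the actual location of this checkpoint.
repo_id = "user/wikipedia_13"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The history of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the model was instead trained with a masked-LM objective, `AutoModelForMaskedLM` with a `fill-mask` pipeline would be the appropriate entry point.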

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
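
As an illustration of how these settings map onto `transformers.TrainingArguments` — a sketch, not the actual training script, which is not included in this card:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above. The Adam betas (0.9, 0.999)
# and epsilon 1e-08 are the TrainingArguments defaults, so they are not
# set explicitly. Model, dataset, and collator are undocumented here.
training_args = TrainingArguments(
    output_dir="wikipedia_13",       # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,   # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                       # "Native AMP" mixed precision
)
```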

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.4657  | 2000   | 8.0872          |
| 8.1326        | 2.9315  | 4000   | 7.3975          |
| 8.1326        | 4.3972  | 6000   | 7.2718          |
| 7.2846        | 5.8630  | 8000   | 7.1835          |
| 7.2846        | 7.3287  | 10000  | 7.0992          |
| 7.1078        | 8.7944  | 12000  | 7.0331          |
| 7.1078        | 10.2602 | 14000  | 6.9494          |
| 6.942         | 11.7259 | 16000  | 6.8899          |
| 6.942         | 13.1916 | 18000  | 6.7822          |
| 6.7676        | 14.6574 | 20000  | 6.7185          |
| 6.7676        | 16.1231 | 22000  | 6.6536          |
| 6.5959        | 17.5889 | 24000  | 6.5431          |
| 6.5959        | 19.0546 | 26000  | 6.3925          |
| 6.3624        | 20.5203 | 28000  | 6.2119          |
| 6.3624        | 21.9861 | 30000  | 5.9526          |
| 5.9309        | 23.4518 | 32000  | 5.4162          |
| 5.9309        | 24.9176 | 34000  | 5.0255          |
| 5.0575        | 26.3833 | 36000  | 4.7680          |
| 5.0575        | 27.8490 | 38000  | 4.5020          |
| 4.5282        | 29.3148 | 40000  | 4.3214          |
| 4.5282        | 30.7805 | 42000  | 4.1312          |
| 4.1335        | 32.2462 | 44000  | 3.9708          |
| 4.1335        | 33.7120 | 46000  | 3.8616          |
| 3.8339        | 35.1777 | 48000  | 3.7640          |
| 3.8339        | 36.6435 | 50000  | 3.7074          |
| 3.6042        | 38.1092 | 52000  | 3.6360          |
| 3.6042        | 39.5749 | 54000  | 3.5203          |
| 3.4291        | 41.0407 | 56000  | 3.4424          |
| 3.4291        | 42.5064 | 58000  | 3.4276          |
| 3.286         | 43.9722 | 60000  | 3.3797          |
| 3.286         | 45.4379 | 62000  | 3.3277          |
| 3.1748        | 46.9036 | 64000  | 3.2922          |
| 3.1748        | 48.3694 | 66000  | 3.2361          |
| 3.0842        | 49.8351 | 68000  | 3.2043          |
| 3.0842        | 51.3008 | 70000  | 3.1870          |
| 3.0082        | 52.7666 | 72000  | 3.1487          |
| 3.0082        | 54.2323 | 74000  | 3.1257          |
| 2.9483        | 55.6981 | 76000  | 3.1001          |
| 2.9483        | 57.1638 | 78000  | 3.0694          |
| 2.8885        | 58.6295 | 80000  | 3.0605          |
| 2.8885        | 60.0953 | 82000  | 3.0568          |
| 2.8416        | 61.5610 | 84000  | 3.0083          |
| 2.8416        | 63.0267 | 86000  | 3.0188          |
| 2.8064        | 64.4925 | 88000  | 3.0213          |
| 2.8064        | 65.9582 | 90000  | 2.9645          |
| 2.7717        | 67.4240 | 92000  | 2.9901          |
| 2.7717        | 68.8897 | 94000  | 2.9684          |
| 2.7441        | 70.3554 | 96000  | 2.9565          |
| 2.7441        | 71.8212 | 98000  | 2.9547          |
| 2.7289        | 73.2869 | 100000 | 2.9275          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1