pretrain_2

This model was trained from scratch; the training dataset is not specified in this card (the auto-generated text listed it as "None"). It achieves the following result on the evaluation set:

  • Loss: 0.5716
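
The card does not document the architecture or the checkpoint's Hub location. As a minimal loading sketch, assuming a standard encoder-style Transformers checkpoint and using a hypothetical repo id:

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical repo id; replace with the checkpoint's actual Hub path or local directory.
repo_id = "pretrain_2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)  # 99.3M parameters, F32 safetensors

inputs = tokenizer("Example input text", return_tensors="pt")
outputs = model(**inputs)
# For encoder-style models AutoModel returns hidden states; the exact output
# attributes depend on the (undocumented) architecture.
print(outputs.last_hidden_state.shape)
```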

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 96
  • total_eval_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 200
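
For reference, a sketch of the corresponding Transformers `TrainingArguments`; the `output_dir` is an assumption, and since the card lists the optimizer only as "Adam", the exact optimizer class is not documented (Trainer's default is AdamW):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="pretrain_2",          # assumed; not documented in the card
    learning_rate=5e-5,
    per_device_train_batch_size=48,   # x 2 GPUs -> total train batch size 96
    per_device_eval_batch_size=48,    # x 2 GPUs -> total eval batch size 96
    seed=42,
    num_train_epochs=200,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

The multi-GPU setup (distributed_type: multi-GPU, num_devices: 2) would typically be launched with `torchrun --nproc_per_node=2`, which is what yields the total batch sizes of 96 above.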

Training results

| Training Loss | Epoch | Step    | Validation Loss |
|:-------------:|:-----:|:-------:|:---------------:|
| 0.7609        | 1.0   | 24286   | 0.6893          |
| 0.7239        | 2.0   | 48572   | 0.6476          |
| 0.7056        | 3.0   | 72858   | 0.6279          |
| 0.6961        | 4.0   | 97144   | 0.6242          |
| 0.6838        | 5.0   | 121430  | 0.6123          |
| 0.6742        | 6.0   | 145716  | 0.6111          |
| 0.6762        | 7.0   | 170002  | 0.6064          |
| 0.6722        | 8.0   | 194288  | 0.6052          |
| 0.6603        | 9.0   | 218574  | 0.6043          |
| 0.6522        | 10.0  | 242860  | 0.6005          |
| 0.654         | 11.0  | 267146  | 0.6022          |
| 0.6422        | 12.0  | 291432  | 0.5964          |
| 0.6495        | 13.0  | 315718  | 0.5967          |
| 0.655         | 14.0  | 340004  | 0.5961          |
| 0.651         | 15.0  | 364290  | 0.5925          |
| 0.6458        | 16.0  | 388576  | 0.5922          |
| 0.6441        | 17.0  | 412862  | 0.5901          |
| 0.6477        | 18.0  | 437148  | 0.5871          |
| 0.6382        | 19.0  | 461434  | 0.5896          |
| 0.6426        | 20.0  | 485720  | 0.5878          |
| 0.6369        | 21.0  | 510006  | 0.5873          |
| 0.6298        | 22.0  | 534292  | 0.5844          |
| 0.6388        | 23.0  | 558578  | 0.5863          |
| 0.6389        | 24.0  | 582864  | 0.5826          |
| 0.6394        | 25.0  | 607150  | 0.5861          |
| 0.6295        | 26.0  | 631436  | 0.5848          |
| 0.6365        | 27.0  | 655722  | 0.5815          |
| 0.6347        | 28.0  | 680008  | 0.5836          |
| 0.6384        | 29.0  | 704294  | 0.5870          |
| 0.6381        | 30.0  | 728580  | 0.5816          |
| 0.6306        | 31.0  | 752866  | 0.5813          |
| 0.6385        | 32.0  | 777152  | 0.5838          |
| 0.6338        | 33.0  | 801438  | 0.5808          |
| 0.6331        | 34.0  | 825724  | 0.5806          |
| 0.6235        | 35.0  | 850010  | 0.5825          |
| 0.6329        | 36.0  | 874296  | 0.5825          |
| 0.6338        | 37.0  | 898582  | 0.5810          |
| 0.6257        | 38.0  | 922868  | 0.5803          |
| 0.6268        | 39.0  | 947154  | 0.5810          |
| 0.6371        | 40.0  | 971440  | 0.5759          |
| 0.6272        | 41.0  | 995726  | 0.5775          |
| 0.6276        | 42.0  | 1020012 | 0.5771          |
| 0.635         | 43.0  | 1044298 | 0.5757          |
| 0.6314        | 44.0  | 1068584 | 0.5753          |
| 0.6279        | 45.0  | 1092870 | 0.5760          |
| 0.6186        | 46.0  | 1117156 | 0.5756          |
| 0.6214        | 47.0  | 1141442 | 0.5763          |
| 0.6257        | 48.0  | 1165728 | 0.5776          |
| 0.6272        | 49.0  | 1190014 | 0.5746          |
| 0.6291        | 50.0  | 1214300 | 0.5734          |
| 0.6311        | 51.0  | 1238586 | 0.5715          |
| 0.6279        | 52.0  | 1262872 | 0.5776          |
| 0.6372        | 53.0  | 1287158 | 0.5725          |
| 0.6155        | 54.0  | 1311444 | 0.5782          |
| 0.6241        | 55.0  | 1335730 | 0.5748          |
| 0.6187        | 56.0  | 1360016 | 0.5716          |
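
The log covers 56 of the configured 200 epochs, so training appears to have stopped early (the card does not say why); the reported evaluation loss of 0.5716 corresponds to the final epoch, while the lowest validation loss in the table is 0.5715 at epoch 51. At 24,286 optimizer steps per epoch with a total batch size of 96, each epoch covers roughly 2.33M training examples. A small sketch, assuming matplotlib is available, to visualize the trajectory from a few rows of the table:

```python
import matplotlib.pyplot as plt

# (epoch, validation loss) pairs taken from the table above.
epochs   = [1, 5, 10, 20, 30, 40, 50, 56]
val_loss = [0.6893, 0.6123, 0.6005, 0.5878, 0.5816, 0.5759, 0.5734, 0.5716]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("pretrain_2 validation loss")
plt.show()
```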

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.6.0.dev20241022+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model size: 99.3M parameters (F32, safetensors).
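
A quick runtime check, as a sketch, that a local environment matches the versions pinned above (the PyTorch entry is a nightly build, so the exact install channel is not documented here):

```python
import datasets, tokenizers, torch, transformers

# Versions used for this run, per the "Framework versions" list above.
expected = {
    transformers: "4.45.2",
    torch: "2.6.0.dev20241022+cu124",  # PyTorch nightly build, CUDA 12.4
    datasets: "3.0.1",
    tokenizers: "0.20.1",
}
for module, version in expected.items():
    # Warn rather than crash if the local version differs.
    if module.__version__ != version:
        print(f"{module.__name__}: expected {version}, found {module.__version__}")
```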