tfa_output_2025_m02_d02_t23h_29m_10s

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5286
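
A minimal loading and generation sketch, assuming the checkpoint is published under the repo id shown for this card (brando/tfa_output_2025_m02_d02_t23h_29m_10s); the prompt and sampling settings are illustrative only:

```python
# Minimal usage sketch; the repo id is taken from this card and is assumed to be public.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "brando/tfa_output_2025_m02_d02_t23h_29m_10s"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt and sampling settings (not from the card).
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```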

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: paged_adamw (OptimizerNames.PAGED_ADAMW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 100
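
For reference, a sketch of how these settings map onto TrainingArguments in Transformers 4.48.0. The output_dir is a placeholder, and the optim string "paged_adamw_32bit" is an assumption about which concrete variant OptimizerNames.PAGED_ADAMW refers to; only the listed hyperparameters come from this card.

```python
# Illustrative reconstruction of the training configuration above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output",              # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,        # 2 per device x 4 steps = total train batch size 8
    optim="paged_adamw_32bit",            # assumption: concrete name behind OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=100,
)
```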

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 3.1242
3.6266 0.6667 1 3.1242
1.7421 1.0 2 3.1070
3.5691 1.6667 3 3.0863
1.8145 2.0 4 3.0757
3.5803 2.6667 5 3.0580
1.7267 3.0 6 3.0413
3.5001 3.6667 7 3.0321
1.756 4.0 8 3.0203
3.5043 4.6667 9 3.0064
1.6752 5.0 10 2.9966
3.449 5.6667 11 2.9887
1.7513 6.0 12 2.9786
3.4134 6.6667 13 2.9696
1.7109 7.0 14 2.9604
3.4382 7.6667 15 2.9518
1.6749 8.0 16 2.9431
3.3344 8.6667 17 2.9329
1.7577 9.0 18 2.9260
3.3458 9.6667 19 2.9184
1.6737 10.0 20 2.9092
3.4422 10.6667 21 2.8993
1.5953 11.0 22 2.8892
3.3445 11.6667 23 2.8819
1.631 12.0 24 2.8721
3.3118 12.6667 25 2.8626
1.6067 13.0 26 2.8547
3.2708 13.6667 27 2.8468
1.619 14.0 28 2.8393
3.3062 14.6667 29 2.8312
1.6059 15.0 30 2.8240
3.2822 15.6667 31 2.8154
1.607 16.0 32 2.8071
3.2488 16.6667 33 2.7995
1.6078 17.0 34 2.7929
3.2022 17.6667 35 2.7875
1.5858 18.0 36 2.7801
3.1965 18.6667 37 2.7720
1.5955 19.0 38 2.7656
3.1891 19.6667 39 2.7606
1.5755 20.0 40 2.7527
3.0662 20.6667 41 2.7467
1.6257 21.0 42 2.7411
3.1364 21.6667 43 2.7350
1.5211 22.0 44 2.7288
3.157 22.6667 45 2.7235
1.4631 23.0 46 2.7158
3.1188 23.6667 47 2.7099
1.4971 24.0 48 2.7063
2.98 24.6667 49 2.7008
1.634 25.0 50 2.6942
3.016 25.6667 51 2.6879
1.5771 26.0 52 2.6843
3.0495 26.6667 53 2.6808
1.4922 27.0 54 2.6750
2.9655 27.6667 55 2.6711
1.6188 28.0 56 2.6655
3.0155 28.6667 57 2.6611
1.4867 29.0 58 2.6567
3.0117 29.6667 59 2.6515
1.5069 30.0 60 2.6470
3.0118 30.6667 61 2.6441
1.4577 31.0 62 2.6375
3.0372 31.6667 63 2.6350
1.411 32.0 64 2.6295
2.9611 32.6667 65 2.6267
1.4289 33.0 66 2.6246
2.9595 33.6667 67 2.6207
1.437 34.0 68 2.6166
2.9483 34.6667 69 2.6127
1.4469 35.0 70 2.6114
2.9291 35.6667 71 2.6067
1.411 36.0 72 2.6021
2.9534 36.6667 73 2.5988
1.4295 37.0 74 2.5958
2.9181 37.6667 75 2.5929
1.4138 38.0 76 2.5891
2.9133 38.6667 77 2.5855
1.4172 39.0 78 2.5818
2.8655 39.6667 79 2.5809
1.3988 40.0 80 2.5780
2.929 40.6667 81 2.5750
1.3445 41.0 82 2.5712
2.8141 41.6667 83 2.5696
1.503 42.0 84 2.5668
2.8483 42.6667 85 2.5636
1.4017 43.0 86 2.5622
2.8643 43.6667 87 2.5575
1.3592 44.0 88 2.5553
2.8332 44.6667 89 2.5537
1.3675 45.0 90 2.5503
2.742 45.6667 91 2.5478
1.5006 46.0 92 2.5453
2.7909 46.6667 93 2.5436
1.4314 47.0 94 2.5406
2.7937 47.6667 95 2.5382
1.3617 48.0 96 2.5359
2.8299 48.6667 97 2.5343
1.3295 49.0 98 2.5306
2.7586 49.6667 99 2.5297
1.4496 50.0 100 2.5286
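
The validation losses above are the standard causal-language-modeling cross-entropy reported by Trainer.evaluate(). A self-contained sketch of that computation is below; the toy evaluation text is a placeholder, since the evaluation data for this run is not documented, so the printed value will not match 2.5286.

```python
# Illustrative evaluation sketch; the eval text is a placeholder.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo_id = "brando/tfa_output_2025_m02_d02_t23h_29m_10s"  # assumed public checkpoint from this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
tokenizer.pad_token = tokenizer.eos_token                 # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(repo_id)

eval_ds = Dataset.from_dict({"text": ["a placeholder evaluation sentence"]})
eval_ds = eval_ds.map(lambda b: tokenizer(b["text"], truncation=True),
                      batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_eval", per_device_eval_batch_size=8),
    eval_dataset=eval_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
print(trainer.evaluate()["eval_loss"])  # cross-entropy loss on the placeholder data
```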

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
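
A pinned-environment sketch inferred from the versions above (not an actual requirements file from this run); the "+cu124" tag on PyTorch suggests a CUDA 12.4 wheel, which comes from the PyTorch CUDA 12.4 index rather than plain PyPI:

```
transformers==4.48.0
torch==2.5.1        # +cu124 build from the PyTorch CUDA 12.4 wheel index
datasets==3.2.0
tokenizers==0.21.0
```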

Model weights

  • Format: Safetensors
  • Size: 124M params
  • Tensor type: F32

Model tree for brando/tfa_output_2025_m02_d02_t23h_29m_10s

Finetuned
(1362)
this model