---
library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
  - trl
  - orpo
  - generated_from_trainer
datasets:
  - piqa
model-index:
  - name: HW2-orpo
    results: []
---

# HW2-orpo

This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the piqa dataset. It achieves the following results on the evaluation set:

- Loss: 3.8617
- Rewards/chosen: -0.3716
- Rewards/rejected: -0.3885
- Rewards/accuracies: 0.6390
- Rewards/margins: 0.0170
- Logps/rejected: -3.8851
- Logps/chosen: -3.7156
- Logits/rejected: -3.3968
- Logits/chosen: -3.5059
- Nll Loss: 3.7885
- Log Odds Ratio: -0.7324
- Log Odds Chosen: 0.1830
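The descriptive sections below are otherwise unfilled, so here is a minimal loading-and-generation sketch using the standard `transformers` API. The repo id `KoNqUeRoR3891/HW2-orpo` is an assumption inferred from this card's title and author; substitute the actual Hub id or a local checkpoint path.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a completion.
# "KoNqUeRoR3891/HW2-orpo" is an assumed repo id; replace it as appropriate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KoNqUeRoR3891/HW2-orpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A piqa-style physical-commonsense prompt.
prompt = "To keep a cast-iron skillet from rusting, you should"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```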

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
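These settings map directly onto TRL's `ORPOConfig` (a `TrainingArguments` subclass) and `ORPOTrainer`. The sketch below reconstructs such a run under stated assumptions: ORPO expects `prompt`/`chosen`/`rejected` columns, which piqa does not ship with, so the `to_preference_format` helper here is hypothetical and may differ from the mapping actually used for this model.

```python
# Sketch of an ORPO training run matching the hyperparameters above.
# Written against the TRL API of the era listed under "Framework versions";
# newer TRL releases rename ORPOTrainer's `tokenizer` kwarg to `processing_class`.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

def to_preference_format(example):
    # Hypothetical mapping: each piqa row carries a goal, two candidate
    # solutions, and a label for the correct one, which yields a preference pair.
    chosen = example["sol1"] if example["label"] == 0 else example["sol2"]
    rejected = example["sol2"] if example["label"] == 0 else example["sol1"]
    return {"prompt": example["goal"], "chosen": chosen, "rejected": rejected}

# piqa is a script-based dataset, so recent `datasets` releases require
# trust_remote_code=True to load it.
train_dataset = load_dataset("piqa", split="train", trust_remote_code=True)
train_dataset = train_dataset.map(to_preference_format)

args = ORPOConfig(
    output_dir="HW2-orpo",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # total effective train batch size: 8
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The optimizer line is left at its defaults, since Adam with betas=(0.9,0.999) and epsilon=1e-08 matches the `TrainingArguments` default configuration.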

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 3.5511 | 0.2758 | 500  | 3.4162 | -0.3146 | -0.3224 | 0.6303 | 0.0078 | -3.2238 | -3.1457 | -12.1919 | -12.3316 | 3.3464 | -0.6978 | 0.0837 |
| 3.3852 | 0.5517 | 1000 | 3.3345 | -0.3060 | -0.3152 | 0.6421 | 0.0092 | -3.1517 | -3.0602 | -3.3351  | -3.5024  | 3.2656 | -0.6894 | 0.0984 |
| 3.2734 | 0.8275 | 1500 | 3.2903 | -0.3011 | -0.3101 | 0.6309 | 0.0090 | -3.1013 | -3.0113 | -5.6602  | -5.7320  | 3.2211 | -0.6920 | 0.0975 |
| 3.104  | 1.1034 | 2000 | 3.2933 | -0.3021 | -0.3118 | 0.6371 | 0.0097 | -3.1182 | -3.0211 | -0.2253  | -0.3135  | 3.2237 | -0.6956 | 0.1062 |
| 2.8138 | 1.3792 | 2500 | 3.2816 | -0.3018 | -0.3125 | 0.6464 | 0.0107 | -3.1253 | -3.0179 | 1.3216   | 1.2346   | 3.2125 | -0.6916 | 0.1172 |
| 2.8178 | 1.6551 | 3000 | 3.2660 | -0.2998 | -0.3108 | 0.6383 | 0.0109 | -3.1080 | -2.9985 | -0.7475  | -0.8064  | 3.1968 | -0.6923 | 0.1204 |
| 2.8122 | 1.9309 | 3500 | 3.2586 | -0.2992 | -0.3104 | 0.6433 | 0.0112 | -3.1039 | -2.9922 | -2.8285  | -2.9509  | 3.1893 | -0.6925 | 0.1228 |
| 2.4931 | 2.2067 | 4000 | 3.3765 | -0.3130 | -0.3256 | 0.6427 | 0.0127 | -3.2563 | -3.1296 | 1.6707   | 1.5380   | 3.3063 | -0.7020 | 0.1392 |
| 2.3999 | 2.4826 | 4500 | 3.4109 | -0.3174 | -0.3298 | 0.6402 | 0.0125 | -3.2982 | -3.1736 | 1.4695   | 1.2634   | 3.3402 | -0.7069 | 0.1373 |
| 2.4254 | 2.7584 | 5000 | 3.3882 | -0.3150 | -0.3278 | 0.6439 | 0.0128 | -3.2781 | -3.1497 | 2.1282   | 1.9044   | 3.3180 | -0.7018 | 0.1416 |
| 2.373  | 3.0343 | 5500 | 3.5698 | -0.3370 | -0.3515 | 0.6408 | 0.0145 | -3.5149 | -3.3698 | 3.7150   | 3.6601   | 3.4983 | -0.7147 | 0.1595 |
| 2.0541 | 3.3101 | 6000 | 3.6256 | -0.3430 | -0.3570 | 0.6284 | 0.0140 | -3.5700 | -3.4302 | 1.1269   | 0.9714   | 3.5532 | -0.7240 | 0.1540 |
| 2.0641 | 3.5860 | 6500 | 3.6157 | -0.3425 | -0.3577 | 0.6445 | 0.0152 | -3.5771 | -3.4246 | -0.6703  | -0.8165  | 3.5439 | -0.7178 | 0.1665 |
| 2.0747 | 3.8618 | 7000 | 3.6335 | -0.3447 | -0.3598 | 0.6402 | 0.0151 | -3.5983 | -3.4474 | -0.1967  | -0.3291  | 3.5616 | -0.7193 | 0.1640 |
| 1.9377 | 4.1376 | 7500 | 3.8286 | -0.3671 | -0.3838 | 0.6445 | 0.0167 | -3.8381 | -3.6712 | -2.6871  | -2.8058  | 3.7557 | -0.7288 | 0.1800 |
| 1.8001 | 4.4135 | 8000 | 3.8629 | -0.3715 | -0.3882 | 0.6414 | 0.0168 | -3.8822 | -3.7146 | -3.4193  | -3.5370  | 3.7898 | -0.7315 | 0.1810 |
| 1.81   | 4.6893 | 8500 | 3.8574 | -0.3711 | -0.3879 | 0.6396 | 0.0168 | -3.8789 | -3.7110 | -4.2176  | -4.3406  | 3.7842 | -0.7321 | 0.1814 |
| 1.8108 | 4.9652 | 9000 | 3.8617 | -0.3716 | -0.3885 | 0.6390 | 0.0170 | -3.8851 | -3.7156 | -3.3968  | -3.5059  | 3.7885 | -0.7324 | 0.1830 |
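As a consistency check on these columns: ORPO's objective is the NLL term minus β times the mean log odds ratio, so each validation loss should equal `Nll Loss` − β × `Log Odds Ratio`. Assuming TRL's default β = 0.1 (the card lists no override), the final row works out:

$$
\mathcal{L} = \mathcal{L}_{\text{NLL}} - \beta \cdot \overline{\log \text{OR}}
\qquad\Rightarrow\qquad
3.7885 - 0.1 \times (-0.7324) \approx 3.8617.
$$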

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu118
- Datasets 2.21.0
- Tokenizers 0.19.1