---
library_name: transformers
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-8-reward
    results: []
---

# OpenELM-1_1B-DPO-full-max-8-reward

This model was trained with DPO (per the `trl` and `dpo` tags) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 1.8180
- Rewards/chosen: -14.9375
- Rewards/rejected: -17.25
- Rewards/accuracies: 0.6270
- Rewards/margins: 2.3125
- Logps/rejected: -2016.0
- Logps/chosen: -1816.0
- Logits/rejected: 1.4375
- Logits/chosen: -0.3125
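The reward metrics above follow the DPO convention: each implicit reward is a beta-scaled log-probability ratio between the policy and a frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal pure-Python sketch of that computation (the `beta` value and log-probabilities below are illustrative, not taken from this run):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference.
    reward_chosen = beta * (policy_chosen_lp - ref_chosen_lp)
    reward_rejected = beta * (policy_rejected_lp - ref_rejected_lp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written with log1p for numerical stability.
    loss = math.log1p(math.exp(-margin))
    return loss, reward_chosen, reward_rejected, margin

# Illustrative values: the policy prefers the chosen response.
loss, rc, rr, m = dpo_loss(-120.0, -150.0, -130.0, -140.0)
```

"Rewards/accuracies" is then simply the fraction of evaluation pairs where the margin is positive.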

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
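Two of these settings are worth unpacking: the effective batch size is the per-device batch multiplied by the device count and the accumulation steps, and the learning rate warms up linearly over the first 10% of steps before decaying along a cosine curve. A sketch of both (pure Python; the exact Transformers scheduler implementation may differ in rounding details, and the step counts are illustrative):

```python
import math

# Effective batch size from the listed settings:
# train_batch_size x num_devices x gradient_accumulation_steps
effective_batch = 8 * 4 * 2  # matches total_train_batch_size: 64

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak rate of 5e-05 is reached once warmup ends, and the schedule returns to zero at the final optimizer step.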

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5108 | 0.1047 | 100 | 0.6708 | -1.2344 | -1.4531 | 0.6094 | 0.2178 | -434.0 | -442.0 | -10.375 | -10.8125 |
| 0.4452 | 0.2094 | 200 | 0.7220 | -2.8281 | -3.2812 | 0.6172 | 0.4434 | -616.0 | -600.0 | -9.625 | -10.1875 |
| 0.4197 | 0.3141 | 300 | 0.7966 | -3.9844 | -4.5938 | 0.6055 | 0.6133 | -748.0 | -716.0 | -7.5 | -8.375 |
| 0.4624 | 0.4188 | 400 | 0.7233 | -3.5625 | -4.0625 | 0.6055 | 0.4941 | -692.0 | -676.0 | -11.3125 | -12.3125 |
| 0.373 | 0.5236 | 500 | 0.8017 | -4.5 | -5.1875 | 0.6270 | 0.6758 | -808.0 | -768.0 | -7.3125 | -8.6875 |
| 0.418 | 0.6283 | 600 | 0.8381 | -4.4688 | -5.2812 | 0.6133 | 0.7812 | -816.0 | -768.0 | -9.8125 | -11.0 |
| 0.4 | 0.7330 | 700 | 0.7900 | -4.3125 | -5.125 | 0.6465 | 0.8008 | -800.0 | -752.0 | -10.75 | -11.6875 |
| 0.384 | 0.8377 | 800 | 0.8198 | -5.25 | -6.0312 | 0.6367 | 0.7773 | -892.0 | -844.0 | -7.2188 | -8.6875 |
| 0.3594 | 0.9424 | 900 | 1.0285 | -7.25 | -8.4375 | 0.6289 | 1.1719 | -1128.0 | -1040.0 | -5.7188 | -7.0938 |
| 0.0833 | 1.0471 | 1000 | 1.1501 | -9.25 | -10.375 | 0.6055 | 1.1562 | -1328.0 | -1240.0 | -2.5781 | -4.0938 |
| 0.0953 | 1.1518 | 1100 | 1.2000 | -9.5 | -10.75 | 0.5938 | 1.2422 | -1360.0 | -1264.0 | -2.0 | -3.5938 |
| 0.1198 | 1.2565 | 1200 | 1.2179 | -8.5625 | -9.75 | 0.6074 | 1.1641 | -1264.0 | -1176.0 | -2.9375 | -4.9688 |
| 0.1291 | 1.3613 | 1300 | 1.1658 | -8.75 | -10.0625 | 0.6133 | 1.2969 | -1296.0 | -1192.0 | -2.6562 | -4.5938 |
| 0.0877 | 1.4660 | 1400 | 1.1249 | -9.0 | -10.375 | 0.625 | 1.3438 | -1320.0 | -1216.0 | -1.3359 | -3.2969 |
| 0.1044 | 1.5707 | 1500 | 1.1523 | -9.75 | -11.375 | 0.6484 | 1.6328 | -1424.0 | -1296.0 | 0.3184 | -1.5312 |
| 0.0798 | 1.6754 | 1600 | 1.3625 | -11.6875 | -13.375 | 0.6172 | 1.6562 | -1624.0 | -1488.0 | -0.3613 | -2.2344 |
| 0.0847 | 1.7801 | 1700 | 1.3074 | -12.25 | -13.6875 | 0.6211 | 1.4375 | -1656.0 | -1544.0 | 0.5781 | -1.0938 |
| 0.1018 | 1.8848 | 1800 | 1.1160 | -9.4375 | -10.8125 | 0.6387 | 1.4219 | -1376.0 | -1264.0 | -1.4219 | -3.3125 |
| 0.0649 | 1.9895 | 1900 | 1.3142 | -10.75 | -12.4375 | 0.6211 | 1.6797 | -1536.0 | -1392.0 | -1.0312 | -3.0469 |
| 0.0155 | 2.0942 | 2000 | 1.7397 | -14.4375 | -16.5 | 0.6309 | 2.0781 | -1944.0 | -1768.0 | 0.6602 | -1.1641 |
| 0.0091 | 2.1990 | 2100 | 1.7227 | -14.0 | -16.125 | 0.625 | 2.1562 | -1904.0 | -1720.0 | 0.5039 | -1.3672 |
| 0.0101 | 2.3037 | 2200 | 1.8446 | -15.0625 | -17.25 | 0.6406 | 2.2344 | -2016.0 | -1824.0 | 0.9844 | -0.8359 |
| 0.0145 | 2.4084 | 2300 | 1.7911 | -14.5 | -16.75 | 0.6289 | 2.2656 | -1968.0 | -1768.0 | 1.1719 | -0.6680 |
| 0.0114 | 2.5131 | 2400 | 1.7978 | -14.75 | -17.0 | 0.6270 | 2.2812 | -1992.0 | -1792.0 | 1.2344 | -0.5430 |
| 0.012 | 2.6178 | 2500 | 1.7940 | -14.6875 | -17.0 | 0.6309 | 2.2812 | -1984.0 | -1784.0 | 1.2656 | -0.5234 |
| 0.0075 | 2.7225 | 2600 | 1.8009 | -14.75 | -17.0 | 0.6309 | 2.2656 | -1992.0 | -1792.0 | 1.3594 | -0.4062 |
| 0.0142 | 2.8272 | 2700 | 1.8148 | -14.9375 | -17.25 | 0.6309 | 2.2969 | -2016.0 | -1816.0 | 1.4453 | -0.3027 |
| 0.0102 | 2.9319 | 2800 | 1.8180 | -14.9375 | -17.25 | 0.6270 | 2.3125 | -2016.0 | -1816.0 | 1.4375 | -0.3125 |
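As a quick consistency check on the final evaluation row, the reported Rewards/margins equals Rewards/chosen minus Rewards/rejected:

```python
# Final evaluation row (step 2800), values copied from the table above.
chosen, rejected = -14.9375, -17.25
margin = chosen - rejected  # DPO margin: chosen minus rejected reward
```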

### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0