library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-14-reward
    results: []

OpenELM-1_1B-DPO-full-max-14-reward

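This card was generated automatically by the trainer. No base model or dataset was recorded, so the generated text describes the model as trained from scratch on an unknown dataset; the model name and the trl and dpo tags indicate a Direct Preference Optimization (DPO) run. At the final logged step (2800), it achieves the following results on the evaluation set: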

  • Loss: 1.1668
  • Rewards/chosen: -3.5938
  • Rewards/rejected: -4.0
  • Rewards/accuracies: 0.4902
  • Rewards/margins: 0.4121
  • Logps/rejected: -688.0
  • Logps/chosen: -676.0
  • Logits/rejected: -16.375
  • Logits/chosen: -16.875
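As a reading aid for the reward metrics above, here is a minimal sketch of how DPO-style rewards, margins, and accuracies are commonly derived from policy and reference log-probabilities. The function name `dpo_metrics`, the value `beta=0.1`, and all log-probabilities are illustrative assumptions; none of them come from this model's training configuration.

```python
# Sketch of DPO-style evaluation metrics on a toy batch of preference pairs.
# Assumption: beta and every log-probability below are made-up values,
# not numbers from this model's run.

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Return mean rewards, mean margin, and accuracy for a batch of pairs."""
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    # Per-pair margin: how much more the chosen response is rewarded.
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)
    return {
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/margins": sum(margins) / n,
        # Accuracy: fraction of pairs where the chosen reward is strictly higher.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


metrics = dpo_metrics(
    policy_chosen=[-100.0, -200.0, -150.0, -120.0],
    policy_rejected=[-135.0, -200.0, -150.0, -125.0],
    ref_chosen=[-90.0, -190.0, -140.0, -110.0],
    ref_rejected=[-95.0, -195.0, -145.0, -115.0],
)
print(metrics)
```

In this toy batch one pair with a large margin pulls the mean margin to about 0.5 even though only one pair in four is ranked correctly. That is one possible reading of how a positive Rewards/margins value can coexist with a Rewards/accuracies value near or below 0.5, as in the results above.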

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
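The total batch sizes listed above follow from the per-device values, and the scheduler settings describe a linear warmup followed by cosine decay. The sketch below reproduces that arithmetic. The function `cosine_lr` is a simplified illustration rather than the library's exact implementation, and `total_steps=1000` is a hypothetical value because the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def cosine_lr(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero (simplified sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is a hypothetical value chosen for illustration.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

With these assumptions the learning rate reaches its peak at the end of warmup (10% of the steps) and decays to zero at the last step.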

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0562 | 0.1047 | 100 | 0.6971 | -1.2578 | -1.5703 | 0.5762 | 0.3145 | -446.0 | -444.0 | -9.3125 | -9.5625 |
| 0.0394 | 0.2094 | 200 | 0.7479 | -0.8516 | -1.0078 | 0.5195 | 0.1572 | -390.0 | -404.0 | -12.3125 | -12.75 |
| 0.0487 | 0.3141 | 300 | 0.9195 | -1.9922 | -2.3125 | 0.5176 | 0.3203 | -520.0 | -516.0 | -13.4375 | -13.6875 |
| 0.0454 | 0.4188 | 400 | 0.8309 | -1.4453 | -1.6016 | 0.4961 | 0.1543 | -448.0 | -462.0 | -15.625 | -15.75 |
| 0.0297 | 0.5236 | 500 | 0.8326 | -3.1094 | -3.375 | 0.5039 | 0.2734 | -628.0 | -628.0 | -15.5 | -15.6875 |
| 0.0434 | 0.6283 | 600 | 0.8373 | -1.6953 | -1.875 | 0.4941 | 0.1826 | -476.0 | -488.0 | -15.0 | -15.25 |
| 0.0496 | 0.7330 | 700 | 0.9407 | -3.7344 | -3.9688 | 0.5332 | 0.2236 | -684.0 | -692.0 | -9.5625 | -10.3125 |
| 0.0289 | 0.8377 | 800 | 1.0108 | -3.1406 | -3.25 | 0.4707 | 0.0991 | -612.0 | -632.0 | -13.0625 | -13.3125 |
| 0.0259 | 0.9424 | 900 | 1.0869 | -3.6094 | -3.7812 | 0.4648 | 0.1631 | -668.0 | -680.0 | -15.625 | -15.875 |
| 0.005 | 1.0471 | 1000 | 1.0944 | -3.4375 | -3.625 | 0.4570 | 0.1758 | -652.0 | -664.0 | -15.0625 | -15.25 |
| 0.0156 | 1.1518 | 1100 | 1.2452 | -4.4062 | -4.5938 | 0.4629 | 0.1973 | -748.0 | -760.0 | -16.5 | -16.625 |
| 0.0018 | 1.2565 | 1200 | 1.0496 | -3.7344 | -3.9219 | 0.4844 | 0.1885 | -680.0 | -692.0 | -15.5625 | -15.875 |
| 0.0046 | 1.3613 | 1300 | 1.0484 | -3.375 | -3.6094 | 0.4980 | 0.2402 | -648.0 | -656.0 | -14.9375 | -15.25 |
| 0.0041 | 1.4660 | 1400 | 0.9980 | -3.5156 | -3.8438 | 0.5137 | 0.3379 | -676.0 | -668.0 | -13.8125 | -14.3125 |
| 0.0077 | 1.5707 | 1500 | 1.0434 | -3.1719 | -3.5156 | 0.4902 | 0.3535 | -640.0 | -636.0 | -13.875 | -14.375 |
| 0.0016 | 1.6754 | 1600 | 1.0882 | -3.8594 | -4.2812 | 0.4922 | 0.4141 | -716.0 | -704.0 | -12.4375 | -12.9375 |
| 0.0042 | 1.7801 | 1700 | 1.0261 | -3.3438 | -3.7656 | 0.4941 | 0.4238 | -664.0 | -652.0 | -15.5 | -15.9375 |
| 0.0005 | 1.8848 | 1800 | 1.0536 | -3.2344 | -3.5938 | 0.4961 | 0.3555 | -648.0 | -644.0 | -16.625 | -17.0 |
| 0.0083 | 1.9895 | 1900 | 1.1039 | -3.4844 | -3.8125 | 0.4883 | 0.3242 | -672.0 | -668.0 | -16.25 | -16.625 |
| 0.0003 | 2.0942 | 2000 | 1.1159 | -3.5156 | -3.8438 | 0.4922 | 0.3301 | -672.0 | -672.0 | -16.125 | -16.625 |
| 0.0027 | 2.1990 | 2100 | 1.1535 | -3.5938 | -4.0 | 0.4980 | 0.4043 | -688.0 | -680.0 | -16.125 | -16.625 |
| 0.0003 | 2.3037 | 2200 | 1.1505 | -3.5781 | -3.9844 | 0.4902 | 0.4062 | -688.0 | -676.0 | -16.25 | -16.625 |
| 0.0006 | 2.4084 | 2300 | 1.1535 | -3.5469 | -3.9531 | 0.4902 | 0.4023 | -684.0 | -672.0 | -16.25 | -16.75 |
| 0.0002 | 2.5131 | 2400 | 1.1581 | -3.5781 | -3.9844 | 0.4922 | 0.4082 | -688.0 | -676.0 | -16.25 | -16.625 |
| 0.0001 | 2.6178 | 2500 | 1.1609 | -3.5625 | -3.9688 | 0.4961 | 0.4082 | -684.0 | -672.0 | -16.375 | -16.75 |
| 0.0008 | 2.7225 | 2600 | 1.1668 | -3.5938 | -4.0 | 0.4922 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.75 |
| 0.0002 | 2.8272 | 2700 | 1.1668 | -3.5938 | -4.0 | 0.4902 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.75 |
| 0.0003 | 2.9319 | 2800 | 1.1668 | -3.5938 | -4.0 | 0.4902 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.875 |
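To make the trend in the logged results easier to inspect, the following sketch picks the step with the lowest validation loss and measures the gap between training and validation loss at the final step. The step, training loss, and validation loss values are copied from the results above; the helper names `best_checkpoint` and `loss_gap` are illustrative.

```python
# (step, training loss, validation loss) copied from the logged results.
LOG = [
    (100, 0.0562, 0.6971),
    (200, 0.0394, 0.7479),
    (300, 0.0487, 0.9195),
    (400, 0.0454, 0.8309),
    (500, 0.0297, 0.8326),
    (600, 0.0434, 0.8373),
    (700, 0.0496, 0.9407),
    (800, 0.0289, 1.0108),
    (900, 0.0259, 1.0869),
    (1000, 0.005, 1.0944),
    (1100, 0.0156, 1.2452),
    (1200, 0.0018, 1.0496),
    (1300, 0.0046, 1.0484),
    (1400, 0.0041, 0.9980),
    (1500, 0.0077, 1.0434),
    (1600, 0.0016, 1.0882),
    (1700, 0.0042, 1.0261),
    (1800, 0.0005, 1.0536),
    (1900, 0.0083, 1.1039),
    (2000, 0.0003, 1.1159),
    (2100, 0.0027, 1.1535),
    (2200, 0.0003, 1.1505),
    (2300, 0.0006, 1.1535),
    (2400, 0.0002, 1.1581),
    (2500, 0.0001, 1.1609),
    (2600, 0.0008, 1.1668),
    (2700, 0.0002, 1.1668),
    (2800, 0.0003, 1.1668),
]


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def loss_gap(row):
    """Validation loss minus training loss for one logged row."""
    return row[2] - row[1]


best = best_checkpoint(LOG)
final = LOG[-1]
print("lowest validation loss:", best)
print("final-step gap:", loss_gap(final))
```

On these numbers the lowest validation loss is logged at step 100, while the training loss keeps falling toward zero and the validation loss rises to 1.1668 by step 2800. A widening gap of this kind is often a sign of overfitting, although the card gives no further detail to confirm that.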

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
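
When reproducing results, it can help to compare the local environment against the versions listed above. The sketch below does this with the standard library's `importlib.metadata`. The helper names `installed_version` and `find_mismatches` are illustrative, and the sketch assumes the packages are installed under their usual PyPI distribution names (PyTorch as `torch`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # Drop a local build suffix such as "+cu121" before comparing.
        base = found.split("+")[0] if found is not None else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(PINNED).items():
    print(f"{name}: card lists {wanted}, found {found}")
```

A mismatch does not necessarily prevent the model from loading; the check only flags where the environment differs from the one recorded in the card.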