library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-14-reward
    results: []

OpenELM-1_1B-DPO-full-max-14-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1518
  • Rewards/chosen: -4.0625
  • Rewards/rejected: -4.5312
  • Rewards/accuracies: 0.4980
  • Rewards/margins: 0.4824
  • Logps/rejected: -744.0
  • Logps/chosen: -724.0
  • Logits/rejected: -15.25
  • Logits/chosen: -15.5625
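The reward metrics above are related to each other. As a minimal sketch, assuming the usual DPO logging convention (reward = beta times the policy-minus-reference log-probability, margin = chosen reward minus rejected reward, accuracy = fraction of pairs where the chosen reward is higher), they could be computed as follows. The value of `beta` and every log-probability in the example are hypothetical; the card does not report them.

```python
# Sketch of how DPO reward metrics are conventionally derived.
# ASSUMPTION: beta and all log-probabilities below are made-up illustration
# values; none of them come from this model card.

def dpo_reward_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Return mean rewards, accuracy, and margin for a batch of preference pairs."""
    # Implicit reward: scaled log-probability gap between policy and reference.
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
    }

example = dpo_reward_metrics(
    policy_chosen=[-700.0, -730.0],
    policy_rejected=[-750.0, -740.0],
    ref_chosen=[-680.0, -690.0],
    ref_rejected=[-700.0, -710.0],
    beta=0.1,  # hypothetical
)
print(example)
```

Under this definition an accuracy of 0.5 corresponds to chance on pairwise preferences, so the reported 0.4980 is close to chance level even though the mean margin is positive.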

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
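The batch-size totals in the list follow from the per-device values, and the learning rate follows a cosine schedule with 10% warmup. The sketch below reproduces the arithmetic and a common form of that schedule (linear warmup, then half-cosine decay to zero). The total step count is not stated in the card: `assumed_total_steps` is an estimate derived from the results table (step 100 at epoch 0.1047, over 3 epochs), and the function `cosine_lr` is illustrative rather than the trainer's own code.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 8              # per device
eval_batch_size = 16              # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1
num_epochs = 3

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

def cosine_lr(step, total_steps, peak_lr=learning_rate, warmup=warmup_ratio):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# ASSUMPTION: estimated from the results table, not reported in the card.
assumed_total_steps = round(num_epochs * 100 / 0.1047)

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 100, 1000, 2000, assumed_total_steps):
    print(step, cosine_lr(step, assumed_total_steps))
```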

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.1047 | 100 | 0.8594 | -2.0469 | -2.3906 | 0.5488 | 0.3496 | -528.0 | -524.0 | -9.3125 | -9.5625 |
| 0.0695 | 0.2094 | 200 | 0.7369 | -1.2031 | -1.3984 | 0.4883 | 0.1953 | -428.0 | -438.0 | -12.9375 | -13.3125 |
| 0.0402 | 0.3141 | 300 | 1.4284 | -4.5938 | -5.0 | 0.5195 | 0.4297 | -792.0 | -776.0 | -9.125 | -9.75 |
| 0.0373 | 0.4188 | 400 | 0.8732 | -2.3906 | -2.5156 | 0.4902 | 0.1279 | -540.0 | -556.0 | -13.375 | -13.5625 |
| 0.0289 | 0.5236 | 500 | 0.9761 | -3.5938 | -3.8906 | 0.4902 | 0.2871 | -676.0 | -680.0 | -15.0 | -15.25 |
| 0.0549 | 0.6283 | 600 | 0.9004 | -2.3594 | -2.6094 | 0.4805 | 0.2539 | -548.0 | -556.0 | -15.0625 | -15.1875 |
| 0.0385 | 0.7330 | 700 | 0.9997 | -3.125 | -3.25 | 0.4746 | 0.1299 | -612.0 | -632.0 | -11.5 | -11.875 |
| 0.0303 | 0.8377 | 800 | 1.0037 | -2.7656 | -2.9375 | 0.4785 | 0.1748 | -584.0 | -596.0 | -13.75 | -14.0625 |
| 0.0147 | 0.9424 | 900 | 1.1243 | -3.6094 | -3.7656 | 0.4824 | 0.1553 | -664.0 | -680.0 | -13.8125 | -14.1875 |
| 0.0038 | 1.0471 | 1000 | 1.0635 | -3.5781 | -3.7969 | 0.4941 | 0.2158 | -668.0 | -676.0 | -13.8125 | -14.125 |
| 0.0192 | 1.1518 | 1100 | 1.2317 | -4.1875 | -4.5625 | 0.4941 | 0.3711 | -744.0 | -736.0 | -14.0 | -14.25 |
| 0.0035 | 1.2565 | 1200 | 1.1275 | -3.8125 | -4.0938 | 0.5195 | 0.2676 | -696.0 | -700.0 | -13.6875 | -14.0625 |
| 0.0014 | 1.3613 | 1300 | 1.1072 | -3.8281 | -4.1875 | 0.5039 | 0.3672 | -708.0 | -700.0 | -10.3125 | -11.0 |
| 0.0009 | 1.4660 | 1400 | 1.2158 | -4.1562 | -4.5938 | 0.5039 | 0.4570 | -748.0 | -732.0 | -15.6875 | -16.0 |
| 0.0047 | 1.5707 | 1500 | 0.9804 | -3.4062 | -3.7656 | 0.5 | 0.3672 | -664.0 | -660.0 | -14.625 | -15.0625 |
| 0.0009 | 1.6754 | 1600 | 1.0340 | -4.0312 | -4.4688 | 0.5137 | 0.4219 | -736.0 | -724.0 | -10.6875 | -11.4375 |
| 0.0053 | 1.7801 | 1700 | 0.9808 | -3.4531 | -3.8125 | 0.5215 | 0.3730 | -672.0 | -664.0 | -16.125 | -16.25 |
| 0.0006 | 1.8848 | 1800 | 0.9781 | -3.2812 | -3.5312 | 0.5098 | 0.2578 | -640.0 | -644.0 | -16.125 | -16.25 |
| 0.0086 | 1.9895 | 1900 | 1.1759 | -4.1562 | -4.6562 | 0.5020 | 0.5195 | -756.0 | -732.0 | -15.4375 | -15.6875 |
| 0.0001 | 2.0942 | 2000 | 1.1181 | -3.8594 | -4.3125 | 0.5 | 0.4473 | -720.0 | -704.0 | -15.4375 | -15.6875 |
| 0.0145 | 2.1990 | 2100 | 1.1573 | -4.0312 | -4.5312 | 0.4980 | 0.4941 | -740.0 | -720.0 | -15.625 | -15.875 |
| 0.0002 | 2.3037 | 2200 | 1.1923 | -4.2188 | -4.7188 | 0.4961 | 0.5234 | -760.0 | -740.0 | -15.0625 | -15.4375 |
| 0.0005 | 2.4084 | 2300 | 1.1497 | -4.0 | -4.5 | 0.4902 | 0.4824 | -736.0 | -720.0 | -15.3125 | -15.5625 |
| 0.0002 | 2.5131 | 2400 | 1.1575 | -4.0312 | -4.5312 | 0.4961 | 0.4902 | -740.0 | -720.0 | -15.375 | -15.6875 |
| 0.0001 | 2.6178 | 2500 | 1.1676 | -4.0938 | -4.5938 | 0.4922 | 0.5039 | -748.0 | -728.0 | -15.25 | -15.5625 |
| 0.0014 | 2.7225 | 2600 | 1.1490 | -4.0312 | -4.5 | 0.5020 | 0.4785 | -740.0 | -720.0 | -15.3125 | -15.625 |
| 0.0002 | 2.8272 | 2700 | 1.1505 | -4.0312 | -4.5312 | 0.4961 | 0.4824 | -740.0 | -720.0 | -15.25 | -15.5625 |
| 0.0002 | 2.9319 | 2800 | 1.1518 | -4.0625 | -4.5312 | 0.4980 | 0.4824 | -744.0 | -724.0 | -15.25 | -15.5625 |
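One way to read the results above is to compare training and validation loss per checkpoint. The sketch below copies the Step, Training Loss, and Validation Loss columns from the table and picks the checkpoint with the lowest validation loss. The variable and function names are illustrative, and whether the lowest-loss checkpoint is the best one to use is a judgment the card does not make.

```python
# (step, training_loss, validation_loss), copied from the results table.
results = [
    (100, 0.05, 0.8594), (200, 0.0695, 0.7369), (300, 0.0402, 1.4284),
    (400, 0.0373, 0.8732), (500, 0.0289, 0.9761), (600, 0.0549, 0.9004),
    (700, 0.0385, 0.9997), (800, 0.0303, 1.0037), (900, 0.0147, 1.1243),
    (1000, 0.0038, 1.0635), (1100, 0.0192, 1.2317), (1200, 0.0035, 1.1275),
    (1300, 0.0014, 1.1072), (1400, 0.0009, 1.2158), (1500, 0.0047, 0.9804),
    (1600, 0.0009, 1.0340), (1700, 0.0053, 0.9808), (1800, 0.0006, 0.9781),
    (1900, 0.0086, 1.1759), (2000, 0.0001, 1.1181), (2100, 0.0145, 1.1573),
    (2200, 0.0002, 1.1923), (2300, 0.0005, 1.1497), (2400, 0.0002, 1.1575),
    (2500, 0.0001, 1.1676), (2600, 0.0014, 1.1490), (2700, 0.0002, 1.1505),
    (2800, 0.0002, 1.1518),
]

def lowest_validation_loss(rows):
    """Return the (step, train_loss, val_loss) row with the smallest val_loss."""
    return min(rows, key=lambda row: row[2])

best = lowest_validation_loss(results)
final = results[-1]

print("lowest validation loss:", best)
print("final checkpoint:", final)
# Gap between the final and the lowest validation loss.
print("validation loss increase:", round(final[2] - best[2], 4))
```

In these numbers the training loss falls to about 0.0002 by the end of training while the validation loss ends above its step-200 minimum, a pattern that is often associated with overfitting.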

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
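To check a local environment against the versions listed above, a small sketch using the standard library's `importlib.metadata` might look like this. The package names are the usual PyPI distribution names (`torch` for PyTorch); the helper names `installed_version` and `find_mismatches` are hypothetical and not part of any library.

```python
from importlib import metadata

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each mismatching package to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build tags such as "2.3.0+cu121" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches().items():
        print(f"{package}: expected {wanted}, found {found}")
```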