---
library_name: transformers
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-4-reward
  results: []
---

# OpenELM-1_1B-DPO-full-max-4-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (matching the final row of the training table below; a note on reading the DPO metrics follows the list):

- Loss: 1.5952
- Rewards/chosen: -13.125
- Rewards/rejected: -14.4375
- Rewards/accuracies: 0.6035
- Rewards/margins: 1.3047
- Logps/rejected: -1728.0
- Logps/chosen: -1632.0
- Logits/rejected: 2.4062
- Logits/chosen: 0.5391
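A note on reading these metrics (standard TRL/DPO definitions, not spelled out in the auto-generated card): with the sigmoid DPO objective

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right),
$$

Rewards/chosen and Rewards/rejected are the two $\beta$-scaled policy-vs-reference log-ratios, and Rewards/margins is their difference: here $-13.125 - (-14.4375) = 1.3125 \approx 1.3047$, with the small gap due to the reduced precision of the logged values. Rewards/accuracies is the fraction of pairs where the chosen response receives the higher implicit reward.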

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
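The card does not include the training script. Below is a minimal sketch of how these hyperparameters map onto TRL's `DPOConfig`/`DPOTrainer`; the base model, dataset, and `beta` are illustrative assumptions (the card lists the dataset as unknown), launching across 4 GPUs is handled by `accelerate` outside this snippet, and argument names may vary slightly across TRL releases.

```python
# Hedged reconstruction, not the author's actual script.
# Base model, dataset, and beta are placeholders/assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "apple/OpenELM-1_1B"  # assumed base checkpoint; not stated in the card
dataset = load_dataset("org/preference-dataset")  # placeholder: the card says "unknown dataset"

model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)  # assumes a bundled tokenizer

args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-4-reward",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # "train_batch_size" above
    per_device_eval_batch_size=16,   # "eval_batch_size" above
    gradient_accumulation_steps=2,   # x 4 GPUs x 8 per device = total train batch 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: TRL's default DPO beta
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```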

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6374        | 0.0838 | 80   | 0.6876          | -0.6875        | -0.7773          | 0.5664             | 0.0908          | -366.0         | -386.0       | -9.8125         | -10.125       |
| 0.6205        | 0.1675 | 160  | 0.6953          | -1.2266        | -1.375           | 0.5840             | 0.1475          | -426.0         | -440.0       | -11.4375        | -11.75        |
| 0.6209        | 0.2513 | 240  | 0.7235          | -1.6016        | -1.7578          | 0.5762             | 0.1553          | -464.0         | -478.0       | -12.125         | -12.375       |
| 0.6605        | 0.3351 | 320  | 0.7567          | -2.7969        | -3.0469          | 0.5996             | 0.2461          | -592.0         | -600.0       | -12.375         | -12.6875      |
| 0.6471        | 0.4188 | 400  | 0.7148          | -2.7031        | -2.875           | 0.5801             | 0.1816          | -576.0         | -588.0       | -12.0625        | -12.5         |
| 0.6121        | 0.5026 | 480  | 0.7704          | -4.1562        | -4.5625          | 0.5879             | 0.3945          | -744.0         | -736.0       | -7.4062         | -8.1875       |
| 0.6059        | 0.5864 | 560  | 0.7471          | -4.125         | -4.4688          | 0.5957             | 0.3320          | -736.0         | -732.0       | -11.25          | -11.9375      |
| 0.5698        | 0.6702 | 640  | 0.7281          | -3.3125        | -3.7344          | 0.6367             | 0.4297          | -660.0         | -648.0       | -14.1875        | -15.1875      |
| 0.6151        | 0.7539 | 720  | 0.7413          | -2.8438        | -3.1562          | 0.5840             | 0.3105          | -604.0         | -604.0       | -12.625         | -13.0625      |
| 0.5859        | 0.8377 | 800  | 0.7781          | -4.7812        | -5.2812          | 0.6074             | 0.4844          | -816.0         | -796.0       | -8.5            | -9.8125       |
| 0.6049        | 0.9215 | 880  | 0.7388          | -2.8281        | -3.125           | 0.5977             | 0.2930          | -600.0         | -600.0       | -12.3125        | -12.6875      |
| 0.4217        | 1.0052 | 960  | 0.7678          | -5.1875        | -5.8438          | 0.6465             | 0.6562          | -872.0         | -836.0       | -6.3125         | -7.8125       |
| 0.1894        | 1.0890 | 1040 | 1.0973          | -6.9688        | -7.625           | 0.6074             | 0.6719          | -1056.0        | -1016.0      | -4.3438         | -6.0          |
| 0.1959        | 1.1728 | 1120 | 0.9770          | -6.75          | -7.5             | 0.6133             | 0.7422          | -1040.0        | -992.0       | -3.9062         | -5.625        |
| 0.17          | 1.2565 | 1200 | 1.0293          | -7.2188        | -7.9062          | 0.6094             | 0.6719          | -1080.0        | -1040.0      | -4.5625         | -6.0938       |
| 0.1857        | 1.3403 | 1280 | 0.9556          | -7.0625        | -7.8125          | 0.5996             | 0.7578          | -1072.0        | -1024.0      | -5.75           | -7.3125       |
| 0.1872        | 1.4241 | 1360 | 0.9190          | -7.25          | -8.0625          | 0.5938             | 0.8359          | -1096.0        | -1040.0      | -4.7812         | -6.4375       |
| 0.1445        | 1.5079 | 1440 | 1.0569          | -9.0           | -9.9375          | 0.5996             | 0.9258          | -1280.0        | -1216.0      | -3.7656         | -5.3438       |
| 0.136         | 1.5916 | 1520 | 1.0663          | -9.5625        | -10.4375         | 0.6191             | 0.9219          | -1336.0        | -1272.0      | -2.3125         | -4.0312       |
| 0.1765        | 1.6754 | 1600 | 1.0288          | -8.0625        | -8.9375          | 0.6133             | 0.875           | -1184.0        | -1128.0      | -2.6562         | -4.5312       |
| 0.1661        | 1.7592 | 1680 | 1.0917          | -8.0625        | -8.9375          | 0.6035             | 0.8633          | -1184.0        | -1128.0      | -2.7656         | -4.6562       |
| 0.1451        | 1.8429 | 1760 | 1.0870          | -8.375         | -9.25            | 0.5957             | 0.8867          | -1216.0        | -1152.0      | -2.9688         | -4.6875       |
| 0.1712        | 1.9267 | 1840 | 1.0650          | -8.6875        | -9.6875          | 0.6172             | 0.9922          | -1256.0        | -1184.0      | -2.4375         | -4.3125       |
| 0.0278        | 2.0105 | 1920 | 1.0530          | -8.875         | -9.875           | 0.6152             | 0.9805          | -1272.0        | -1208.0      | -1.8906         | -3.8125       |
| 0.0225        | 2.0942 | 2000 | 1.4602          | -11.5          | -12.5625         | 0.6035             | 1.0312          | -1544.0        | -1472.0      | 0.4297          | -1.5547       |
| 0.0182        | 2.1780 | 2080 | 1.5544          | -12.6875       | -13.8125         | 0.5977             | 1.1172          | -1672.0        | -1592.0      | 1.7266          | -0.1621       |
| 0.0385        | 2.2618 | 2160 | 1.5476          | -13.0          | -14.1875         | 0.6016             | 1.1953          | -1712.0        | -1616.0      | 1.5859          | -0.2598       |
| 0.0162        | 2.3455 | 2240 | 1.5637          | -12.8125       | -13.9375         | 0.6016             | 1.1641          | -1688.0        | -1600.0      | 1.8984          | 0.0913        |
| 0.0239        | 2.4293 | 2320 | 1.4822          | -11.9375       | -13.0625         | 0.5938             | 1.1797          | -1600.0        | -1512.0      | 1.2422          | -0.6172       |
| 0.0264        | 2.5131 | 2400 | 1.6307          | -13.375        | -14.6875         | 0.6035             | 1.3047          | -1760.0        | -1656.0      | 2.4688          | 0.6406        |
| 0.024         | 2.5969 | 2480 | 1.5421          | -12.3125       | -13.5625         | 0.5996             | 1.2188          | -1640.0        | -1552.0      | 1.7422          | -0.1631       |
| 0.0217        | 2.6806 | 2560 | 1.6061          | -13.0          | -14.25           | 0.6035             | 1.2656          | -1720.0        | -1616.0      | 2.2656          | 0.3770        |
| 0.0208        | 2.7644 | 2640 | 1.5995          | -13.1875       | -14.4375         | 0.6055             | 1.2812          | -1736.0        | -1632.0      | 2.5156          | 0.6602        |
| 0.0206        | 2.8482 | 2720 | 1.5964          | -13.1875       | -14.4375         | 0.6055             | 1.2969          | -1736.0        | -1632.0      | 2.4688          | 0.6133        |
| 0.0163        | 2.9319 | 2800 | 1.5952          | -13.125        | -14.4375         | 0.6035             | 1.3047          | -1728.0        | -1632.0      | 2.4062          | 0.5391        |

### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0
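For completeness, a minimal inference sketch against the published checkpoint. The repo id is inferred from the card title and uploader and may differ; `trust_remote_code=True` and the bfloat16 dtype are assumptions (OpenELM checkpoints typically ship custom modeling code, and the inference dtype is not stated here).

```python
# Minimal loading sketch (repo id inferred, not guaranteed to match exactly).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/OpenELM-1_1B-DPO-full-max-4-reward"  # inferred from this card

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption; dtype is not stated in the card
    trust_remote_code=True,      # OpenELM uses custom modeling code (assumption)
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```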