metadata
library_name: transformers
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-reward-most-similar
    results: []

OpenELM-1_1B-DPO-full-max-reward-most-similar

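This model was trained with DPO using TRL, according to the card's tags; the card records neither the base model nor the training dataset. It achieves the following results on the evaluation set, which match the final logged checkpoint (step 2800):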

  • Loss: 1.4525
  • Rewards/chosen: -5.5938
  • Rewards/rejected: -6.125
  • Rewards/accuracies: 0.5234
  • Rewards/margins: 0.5078
  • Logps/rejected: -900.0
  • Logps/chosen: -880.0
  • Logits/rejected: -16.5
  • Logits/chosen: -16.75
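
The metric names above follow the usual DPO conventions. As a reading aid, the sketch below shows how such values are commonly derived under the standard sigmoid DPO objective. It is an illustration under assumptions, not this run's training code: the card does not record `beta`, the loss variant, or the reference model, so `beta=0.1`, the function name `dpo_metrics`, and the log-probabilities in the example are all hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-level DPO metrics from summed sequence log-probabilities.

    beta=0.1 is an assumed value; the model card does not record it.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Sigmoid DPO loss per example: -log(sigmoid(margin)) = softplus(-margin).
    losses = [softplus(-m) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
    }


# Made-up log-probabilities for two preference pairs (not from this model).
metrics = dpo_metrics(
    policy_chosen=[-880.0, -860.0],
    policy_rejected=[-900.0, -850.0],
    ref_chosen=[-830.0, -820.0],
    ref_rejected=[-840.0, -820.0],
)
print(metrics)
```

Under this formulation a margin of zero gives a per-example loss of ln 2 ≈ 0.693, so a mean loss above that value can coexist with a positive mean margin when a subset of pairs has strongly negative margins.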

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
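
The two total batch sizes in the list follow from the per-device sizes, the device count, and gradient accumulation. A minimal sketch of that arithmetic, using a hypothetical helper name (`effective_batch_size`) and assuming gradient accumulation applies to training only:

```python
def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step (or per evaluation step)."""
    return per_device * num_devices * grad_accum_steps


# Values taken from the hyperparameter list above.
train_total = effective_batch_size(per_device=8, num_devices=4, grad_accum_steps=2)
eval_total = effective_batch_size(per_device=16, num_devices=4)

print(train_total)  # 64
print(eval_total)   # 64
```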

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0972 | 0.1047 | 100 | 0.7389 | -1.3516 | -1.5781 | 0.5527 | 0.2207 | -446.0 | -454.0 | -13.625 | -13.75 |
| 0.0661 | 0.2094 | 200 | 0.9071 | -2.4219 | -2.7969 | 0.5527 | 0.3711 | -568.0 | -560.0 | -12.6875 | -12.9375 |
| 0.0937 | 0.3141 | 300 | 0.9646 | -3.0469 | -3.375 | 0.5156 | 0.3184 | -624.0 | -624.0 | -14.6875 | -15.0 |
| 0.183 | 0.4188 | 400 | 0.9573 | -2.9688 | -3.0625 | 0.5 | 0.0898 | -596.0 | -616.0 | -13.875 | -14.125 |
| 0.0827 | 0.5236 | 500 | 1.1730 | -3.7344 | -4.125 | 0.4902 | 0.3887 | -700.0 | -692.0 | -12.5625 | -12.875 |
| 0.07 | 0.6283 | 600 | 1.0648 | -3.0 | -3.2656 | 0.5176 | 0.2695 | -616.0 | -620.0 | -19.125 | -19.0 |
| 0.0789 | 0.7330 | 700 | 0.8502 | -2.8438 | -2.9531 | 0.5078 | 0.0996 | -584.0 | -604.0 | -12.0 | -12.5 |
| 0.061 | 0.8377 | 800 | 1.2018 | -3.9531 | -4.2188 | 0.5156 | 0.2578 | -708.0 | -712.0 | -17.375 | -17.625 |
| 0.0536 | 0.9424 | 900 | 1.0159 | -3.6875 | -3.8125 | 0.5312 | 0.1299 | -672.0 | -688.0 | -18.0 | -18.125 |
| 0.0051 | 1.0471 | 1000 | 1.0946 | -4.3438 | -4.6875 | 0.5156 | 0.3379 | -756.0 | -752.0 | -15.6875 | -16.125 |
| 0.0049 | 1.1518 | 1100 | 1.2330 | -4.7188 | -5.1875 | 0.5098 | 0.4648 | -808.0 | -788.0 | -17.75 | -17.875 |
| 0.0037 | 1.2565 | 1200 | 1.2518 | -4.75 | -5.25 | 0.5156 | 0.4707 | -812.0 | -796.0 | -18.5 | -18.375 |
| 0.0122 | 1.3613 | 1300 | 1.0438 | -3.9688 | -4.3125 | 0.5312 | 0.3477 | -720.0 | -716.0 | -18.25 | -18.375 |
| 0.0082 | 1.4660 | 1400 | 1.2435 | -4.75 | -5.0625 | 0.5078 | 0.3203 | -796.0 | -792.0 | -16.875 | -17.125 |
| 0.0039 | 1.5707 | 1500 | 1.2731 | -4.875 | -5.2812 | 0.5039 | 0.4023 | -816.0 | -808.0 | -15.3125 | -15.75 |
| 0.0021 | 1.6754 | 1600 | 1.3171 | -4.875 | -5.2812 | 0.5098 | 0.4160 | -820.0 | -808.0 | -15.8125 | -16.125 |
| 0.0146 | 1.7801 | 1700 | 1.2652 | -4.625 | -5.0312 | 0.5020 | 0.4141 | -792.0 | -780.0 | -16.0 | -16.125 |
| 0.0034 | 1.8848 | 1800 | 1.2840 | -4.6875 | -4.9688 | 0.5234 | 0.3027 | -788.0 | -788.0 | -16.0 | -16.25 |
| 0.0031 | 1.9895 | 1900 | 1.2655 | -4.5312 | -4.8438 | 0.5117 | 0.3008 | -772.0 | -772.0 | -16.125 | -16.25 |
| 0.0007 | 2.0942 | 2000 | 1.3138 | -4.875 | -5.25 | 0.5078 | 0.3691 | -812.0 | -804.0 | -16.5 | -16.625 |
| 0.0209 | 2.1990 | 2100 | 1.3850 | -5.25 | -5.6562 | 0.5117 | 0.4258 | -856.0 | -844.0 | -16.75 | -16.875 |
| 0.0007 | 2.3037 | 2200 | 1.4692 | -5.5625 | -6.0625 | 0.5234 | 0.4980 | -896.0 | -876.0 | -16.625 | -16.875 |
| 0.0008 | 2.4084 | 2300 | 1.5070 | -5.8125 | -6.3438 | 0.5176 | 0.5312 | -924.0 | -900.0 | -16.5 | -16.75 |
| 0.0003 | 2.5131 | 2400 | 1.4649 | -5.625 | -6.125 | 0.5234 | 0.5039 | -900.0 | -880.0 | -16.625 | -16.75 |
| 0.0003 | 2.6178 | 2500 | 1.4368 | -5.5312 | -6.0312 | 0.5176 | 0.4980 | -892.0 | -872.0 | -16.5 | -16.75 |
| 0.0007 | 2.7225 | 2600 | 1.4452 | -5.5625 | -6.0938 | 0.5215 | 0.5039 | -896.0 | -876.0 | -16.5 | -16.75 |
| 0.0009 | 2.8272 | 2700 | 1.4519 | -5.5938 | -6.125 | 0.5234 | 0.5078 | -900.0 | -880.0 | -16.5 | -16.75 |
| 0.0005 | 2.9319 | 2800 | 1.4525 | -5.5938 | -6.125 | 0.5234 | 0.5078 | -900.0 | -880.0 | -16.5 | -16.75 |
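
The logged Epoch and Step columns allow a rough estimate of the run length, which in turn fixes the shape of the cosine schedule with 10% warmup listed in the hyperparameters. The sketch below is an approximation under assumptions: the step counts are inferred from the logged values rather than recorded in the card, the helper names are hypothetical, and the schedule is a plain-Python rendering of linear warmup followed by half-cosine decay, not the trainer's own code.

```python
import math


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return round(step / epoch)


def cosine_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


per_epoch = steps_per_epoch(step=2800, epoch=2.9319)  # 955
total_steps = per_epoch * 3                           # 2865 over 3 epochs

# Learning rate at a few of the logged evaluation steps.
for step in (100, 1000, 2000, 2800):
    print(step, cosine_lr(step, total_steps))
```

With a total train batch size of 64, roughly 955 steps per epoch corresponds to about 61,000 training pairs per epoch, though the exact count depends on how the final partial batch was handled.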

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.0
  • Datasets 2.21.0
  • Tokenizers 0.19.1
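
To reproduce the environment, it can help to compare locally installed packages against the versions listed above. A minimal sketch using only the standard library; the helper name `version_mismatches` is hypothetical, and the distribution name `torch` for PyTorch is the usual PyPI name rather than something stated in the card.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the model card, keyed by distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, lookup=version):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(EXPECTED) or "All listed versions match.")
```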