OpenELM-1_1B-DPO-full-3-5

This model was trained from scratch on an unknown dataset. At the final logged evaluation (step 4700, epoch 4.9189), it achieves the following results on the evaluation set:

  • Loss: 1.1637
  • Rewards/chosen: -13.625
  • Rewards/rejected: -17.0
  • Rewards/accuracies: 0.7051
  • Rewards/margins: 3.375
  • Logps/rejected: -1984.0
  • Logps/chosen: -1680.0
  • Logits/rejected: 3.8594
  • Logits/chosen: 1.9453
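
The metric names above match what DPO-style preference trainers typically log, and the model name suggests DPO, although this card does not confirm which trainer or loss variant was used. Assuming the standard sigmoid DPO loss, the reward margin is the chosen reward minus the rejected reward, and the loss for a single preference pair is -log(sigmoid(margin)). The sketch below checks the margin against the reported numbers. The function names are illustrative, not taken from any library.

```python
import math

def reward_margin(chosen_reward, rejected_reward):
    """Margin between the implicit rewards of the chosen and rejected responses."""
    return chosen_reward - rejected_reward

def dpo_pair_loss(margin):
    """-log(sigmoid(margin)) for one preference pair, computed stably.

    Assumes the standard sigmoid DPO loss; the card does not state the loss type.
    """
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Final evaluation values reported above.
final_chosen, final_rejected = -13.625, -17.0
margin = reward_margin(final_chosen, final_rejected)
print(margin)  # 3.375, matching Rewards/margins
```

The reported Loss is an average over individual pairs, so it cannot be recovered by plugging the mean margin into `dpo_pair_loss`: pairs that the model ranks incorrectly (roughly 30% here, given Rewards/accuracies of 0.7051) contribute large per-pair losses that dominate the mean.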

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
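
The batch-size totals follow from the per-device values: 8 per device × 4 devices × 2 accumulation steps = 64 for training, and 16 × 4 = 64 for evaluation. The sketch below reproduces that arithmetic and shows a plain-Python version of a linear-warmup, cosine-decay schedule of the kind these hyperparameters describe. It is an illustration, not the library's implementation; the total step count is an estimate derived from the training log (step 4700 at epoch 4.9189), and the rounding of the warmup length is an assumption.

```python
import math

def effective_batch_size(per_device, num_devices, grad_accum=1):
    """Number of examples contributing to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum

def cosine_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero.

    Rounding the warmup length up is an assumption of this sketch.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

train_bs = effective_batch_size(8, 4, 2)  # total_train_batch_size
eval_bs = effective_batch_size(16, 4)     # total_eval_batch_size

# Estimated, not reported: steps per epoch inferred from the last log row.
steps_per_epoch = 4700 / 4.9189
total_steps = round(steps_per_epoch * 5)  # num_epochs = 5
warmup_steps = math.ceil(total_steps * 0.1)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

Under this estimate the learning rate peaks at 5e-05 roughly half an epoch into training and decays toward zero by the end of epoch 5.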

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6232 | 0.1047 | 100 | 0.6285 | -0.6055 | -0.8242 | 0.6660 | 0.2178 | -368.0 | -374.0 | -8.25 | -8.5 |
| 0.5729 | 0.2093 | 200 | 0.5957 | -1.6328 | -2.1094 | 0.6992 | 0.4766 | -498.0 | -478.0 | -7.9688 | -8.4375 |
| 0.6122 | 0.3140 | 300 | 0.5751 | -1.6016 | -2.1094 | 0.7129 | 0.5 | -496.0 | -474.0 | -5.5938 | -6.1562 |
| 0.5905 | 0.4186 | 400 | 0.5994 | -1.6328 | -2.1875 | 0.6680 | 0.5508 | -504.0 | -478.0 | -5.5625 | -6.3438 |
| 0.5781 | 0.5233 | 500 | 0.5764 | -1.7188 | -2.2656 | 0.6816 | 0.5586 | -512.0 | -486.0 | -6.0625 | -6.8438 |
| 0.5356 | 0.6279 | 600 | 0.5831 | -3.8906 | -4.5625 | 0.6699 | 0.6797 | -744.0 | -704.0 | -3.3281 | -4.25 |
| 0.5756 | 0.7326 | 700 | 0.5859 | -3.4219 | -4.0312 | 0.7012 | 0.6133 | -692.0 | -656.0 | -8.8125 | -9.375 |
| 0.5528 | 0.8373 | 800 | 0.5732 | -2.8906 | -3.5 | 0.6836 | 0.6016 | -636.0 | -604.0 | -7.4375 | -8.3125 |
| 0.5753 | 0.9419 | 900 | 0.5693 | -3.0469 | -3.7344 | 0.7168 | 0.6797 | -660.0 | -620.0 | -7.0 | -7.9062 |
| 0.2632 | 1.0466 | 1000 | 0.5881 | -4.1875 | -5.2188 | 0.7148 | 1.0312 | -808.0 | -732.0 | -2.875 | -4.25 |
| 0.2283 | 1.1512 | 1100 | 0.6142 | -4.5312 | -5.5312 | 0.7129 | 0.9961 | -840.0 | -768.0 | -5.375 | -7.0625 |
| 0.2202 | 1.2559 | 1200 | 0.5943 | -4.0938 | -5.1875 | 0.7090 | 1.0781 | -804.0 | -724.0 | -1.875 | -3.375 |
| 0.2472 | 1.3605 | 1300 | 0.5995 | -4.4375 | -5.4062 | 0.7168 | 0.9844 | -828.0 | -760.0 | -2.2188 | -3.6875 |
| 0.2406 | 1.4652 | 1400 | 0.5971 | -5.2188 | -6.2188 | 0.7188 | 1.0156 | -908.0 | -836.0 | -3.875 | -5.2812 |
| 0.2059 | 1.5699 | 1500 | 0.6052 | -5.3438 | -6.5312 | 0.7148 | 1.1953 | -940.0 | -848.0 | -4.2188 | -5.7812 |
| 0.2305 | 1.6745 | 1600 | 0.6068 | -4.875 | -5.9062 | 0.7188 | 1.0391 | -876.0 | -800.0 | -5.1562 | -6.6875 |
| 0.2327 | 1.7792 | 1700 | 0.6141 | -5.9375 | -7.1562 | 0.7168 | 1.2188 | -1000.0 | -908.0 | -4.5 | -6.0625 |
| 0.2221 | 1.8838 | 1800 | 0.6072 | -6.4688 | -7.6562 | 0.7266 | 1.1875 | -1048.0 | -960.0 | -1.9844 | -3.625 |
| 0.2153 | 1.9885 | 1900 | 0.5949 | -6.5 | -7.6875 | 0.7266 | 1.1953 | -1056.0 | -964.0 | -3.3125 | -4.875 |
| 0.0215 | 2.0931 | 2000 | 0.7470 | -8.6875 | -10.5 | 0.7246 | 1.8125 | -1336.0 | -1184.0 | -0.1074 | -1.9609 |
| 0.0303 | 2.1978 | 2100 | 0.7469 | -8.3125 | -10.25 | 0.7031 | 1.9453 | -1312.0 | -1144.0 | -0.1299 | -2.0781 |
| 0.0322 | 2.3025 | 2200 | 0.7584 | -8.5625 | -10.4375 | 0.7109 | 1.8828 | -1328.0 | -1168.0 | -0.5156 | -2.6094 |
| 0.0253 | 2.4071 | 2300 | 0.8087 | -9.8125 | -11.9375 | 0.7129 | 2.125 | -1480.0 | -1296.0 | 1.2656 | -0.7539 |
| 0.0302 | 2.5118 | 2400 | 0.8033 | -9.0 | -11.0625 | 0.7246 | 2.0312 | -1392.0 | -1216.0 | 2.2812 | 0.4395 |
| 0.0218 | 2.6164 | 2500 | 0.8603 | -11.0 | -13.3125 | 0.7188 | 2.3125 | -1616.0 | -1408.0 | 2.2969 | 0.5195 |
| 0.027 | 2.7211 | 2600 | 0.8162 | -9.75 | -12.0 | 0.7402 | 2.2188 | -1488.0 | -1288.0 | 1.0703 | -0.9609 |
| 0.0274 | 2.8257 | 2700 | 0.8296 | -9.75 | -12.0 | 0.7188 | 2.2188 | -1480.0 | -1288.0 | 1.125 | -0.9102 |
| 0.0369 | 2.9304 | 2800 | 0.8085 | -9.5625 | -11.875 | 0.7227 | 2.3125 | -1472.0 | -1272.0 | 0.6289 | -1.4531 |
| 0.0154 | 3.0351 | 2900 | 0.8779 | -9.875 | -12.375 | 0.7266 | 2.5 | -1520.0 | -1296.0 | 0.9609 | -1.3125 |
| 0.007 | 3.1397 | 3000 | 0.9780 | -11.5 | -14.375 | 0.7207 | 2.875 | -1728.0 | -1464.0 | 2.7969 | 0.6836 |
| 0.0059 | 3.2444 | 3100 | 0.9793 | -11.125 | -14.0 | 0.7090 | 2.875 | -1688.0 | -1424.0 | 2.2188 | 0.0258 |
| 0.0102 | 3.3490 | 3200 | 0.9823 | -11.0625 | -13.875 | 0.7148 | 2.8281 | -1672.0 | -1424.0 | 2.7656 | 0.7539 |
| 0.0082 | 3.4537 | 3300 | 1.0423 | -12.1875 | -15.1875 | 0.7051 | 3.0 | -1800.0 | -1528.0 | 3.3281 | 1.4453 |
| 0.0109 | 3.5583 | 3400 | 1.0225 | -11.375 | -14.375 | 0.7168 | 2.9688 | -1720.0 | -1456.0 | 2.875 | 0.8672 |
| 0.0098 | 3.6630 | 3500 | 1.0070 | -11.4375 | -14.25 | 0.7109 | 2.8438 | -1712.0 | -1456.0 | 3.1875 | 1.1719 |
| 0.007 | 3.7677 | 3600 | 1.0390 | -11.9375 | -14.9375 | 0.7148 | 3.0 | -1776.0 | -1512.0 | 2.8594 | 0.8086 |
| 0.0057 | 3.8723 | 3700 | 1.0702 | -12.75 | -15.8125 | 0.7031 | 3.0625 | -1864.0 | -1584.0 | 3.4531 | 1.5 |
| 0.0054 | 3.9770 | 3800 | 1.0485 | -12.4375 | -15.4375 | 0.7031 | 3.0 | -1832.0 | -1560.0 | 3.4062 | 1.4688 |
| 0.0037 | 4.0816 | 3900 | 1.0905 | -12.8125 | -15.9375 | 0.7031 | 3.1406 | -1880.0 | -1600.0 | 3.5469 | 1.6172 |
| 0.0031 | 4.1863 | 4000 | 1.1163 | -13.0625 | -16.25 | 0.7012 | 3.2188 | -1912.0 | -1616.0 | 3.6094 | 1.6562 |
| 0.0037 | 4.2909 | 4100 | 1.1256 | -13.125 | -16.375 | 0.7090 | 3.2656 | -1920.0 | -1624.0 | 3.6094 | 1.6562 |
| 0.0089 | 4.3956 | 4200 | 1.1395 | -13.3125 | -16.625 | 0.7070 | 3.3125 | -1952.0 | -1648.0 | 3.75 | 1.8125 |
| 0.0042 | 4.5003 | 4300 | 1.1512 | -13.4375 | -16.75 | 0.7051 | 3.3438 | -1968.0 | -1664.0 | 3.7969 | 1.8672 |
| 0.0094 | 4.6049 | 4400 | 1.1580 | -13.5 | -16.875 | 0.7070 | 3.3594 | -1976.0 | -1664.0 | 3.8125 | 1.8828 |
| 0.006 | 4.7096 | 4500 | 1.1593 | -13.5625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1672.0 | 3.8438 | 1.9219 |
| 0.0029 | 4.8142 | 4600 | 1.1617 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9375 |
| 0.0059 | 4.9189 | 4700 | 1.1637 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9453 |
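
Rows of the results table can be parsed and compared programmatically, which helps when choosing a checkpoint. The sketch below is a minimal illustration: the column keys are names chosen here, and only four rows copied from the table are included, so the selection is over that sample rather than the full log.

```python
# Keys chosen for this sketch; they follow the table's column order.
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracy", "rewards_margin",
    "logps_rejected", "logps_chosen", "logits_rejected", "logits_chosen",
]

def parse_row(line):
    """Parse one data row (whitespace- or pipe-delimited) into a dict."""
    values = [float(tok) for tok in line.replace("|", " ").split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    row["step"] = int(row["step"])
    return row

# Four rows copied from the table (steps 900, 1900, 2600, 4700).
SAMPLE_ROWS = """
0.5753 0.9419 900 0.5693 -3.0469 -3.7344 0.7168 0.6797 -660.0 -620.0 -7.0 -7.9062
0.2153 1.9885 1900 0.5949 -6.5 -7.6875 0.7266 1.1953 -1056.0 -964.0 -3.3125 -4.875
0.027 2.7211 2600 0.8162 -9.75 -12.0 0.7402 2.2188 -1488.0 -1288.0 1.0703 -0.9609
0.0059 4.9189 4700 1.1637 -13.625 -17.0 0.7051 3.375 -1984.0 -1680.0 3.8594 1.9453
"""

rows = [parse_row(line) for line in SAMPLE_ROWS.strip().splitlines()]
best_by_loss = min(rows, key=lambda r: r["val_loss"])
best_by_accuracy = max(rows, key=lambda r: r["rewards_accuracy"])
print(best_by_loss["step"], best_by_accuracy["step"])
```

Across the full table, validation loss is lowest at step 900 (0.5693) and reward accuracy peaks at step 2600 (0.7402). After epoch 2 the training loss falls below 0.04 while validation loss rises steadily to 1.1637, so the final checkpoint is not the best one by validation loss.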

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1
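
To reproduce the environment, it can help to compare the locally installed packages against the versions listed above. The following is a minimal sketch using only the standard library; the helper names are illustrative, and it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_mismatches(expected):
    """Map each package whose installed version differs to what was found."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Builds may carry a local suffix such as "+cu121"; ignore it here.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = found
    return mismatches

for package, found in version_mismatches(EXPECTED).items():
    print(f"{package}: expected {EXPECTED[package]}, found {found}")
```

An empty result means the environment matches the card; a value of `None` means the package is not installed.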

Model size

  • Parameters: 1.08B
  • Tensor type: BF16
  • Weights format: Safetensors