OpenELM-1_1B-DPO-full-random-pair

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 164.4180
  • Rewards/chosen: -560.0
  • Rewards/rejected: -482.0
  • Rewards/accuracies: 0.4277
  • Rewards/margins: -76.0
  • Logps/rejected: -48640.0
  • Logps/chosen: -56320.0
  • Logits/rejected: 3.1562
  • Logits/chosen: 2.7031
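
The reward metrics above are related to each other in a simple way. The following is a minimal sketch for a single preference pair, assuming the standard sigmoid DPO loss and rewards that are already scaled by beta; the card does not state which loss variant or beta was used, and the function name `dpo_pair_metrics` is hypothetical.

```python
import math

def dpo_pair_metrics(reward_chosen, reward_rejected):
    """Per-pair DPO quantities (assumption: standard sigmoid DPO loss).

    Both rewards are assumed to be beta-scaled log-ratios between the
    policy and the reference model.
    """
    # Margin: how much more the policy prefers the chosen response.
    margin = reward_chosen - reward_rejected
    # Loss is -log(sigmoid(margin)), computed stably as softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # A pair counts as "accurate" when the chosen reward is the larger one.
    correct = 1.0 if margin > 0 else 0.0
    return margin, loss, correct

margin, loss, correct = dpo_pair_metrics(-560.0, -482.0)
print(margin, correct)  # -78.0 0.0
```

The reported figures are averages over the evaluation set, so plugging the averaged rewards into this function will not reproduce the reported loss or margin exactly. A negative margin together with an accuracy below 0.5 means the model, on average, scores the rejected response above the chosen one.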

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
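
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup. The schedule function is an illustration of the common formulation, not the trainer's own code, and `total_steps` is a hypothetical parameter because the card does not list the total step count.

```python
import math

def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    # Examples consumed per optimizer step (or per evaluation step).
    return per_device * num_devices * grad_accum_steps

print(effective_batch_size(8, 4, 2))  # 64 (total_train_batch_size)
print(effective_batch_size(16, 4))    # 64 (total_eval_batch_size)

def cosine_lr(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Assumption: a single half-cosine cycle after warmup.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the learning rate rises linearly to 5e-05 over the first 10% of steps and then decays toward zero by the final step.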

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6914 | 0.1047 | 100 | 0.6966 | -0.2227 | -0.2197 | 0.4570 | -0.0034 | -310.0 | -340.0 | -9.875 | -10.25 |
| 0.6914 | 0.2094 | 200 | 4.2809 | -11.1875 | -10.375 | 0.4316 | -0.8555 | -1328.0 | -1440.0 | -8.6875 | -9.1875 |
| 0.6914 | 0.3141 | 300 | 95.7161 | -324.0 | -280.0 | 0.4258 | -44.0 | -28416.0 | -32768.0 | -2.5 | -2.5156 |
| 0.6914 | 0.4188 | 400 | 99.5534 | -338.0 | -292.0 | 0.4277 | -46.0 | -29440.0 | -34048.0 | -2.625 | -2.6406 |
| 0.6914 | 0.5236 | 500 | 103.5082 | -352.0 | -304.0 | 0.4277 | -48.0 | -30592.0 | -35328.0 | -1.9688 | -2.0469 |
| 0.6914 | 0.6283 | 600 | 107.6879 | -366.0 | -316.0 | 0.4316 | -50.0 | -31872.0 | -36864.0 | -1.1328 | -1.2656 |
| 0.6914 | 0.7330 | 700 | 111.8930 | -380.0 | -328.0 | 0.4297 | -52.0 | -33024.0 | -38400.0 | -0.5117 | -0.7031 |
| 0.6914 | 0.8377 | 800 | 116.2988 | -394.0 | -340.0 | 0.4355 | -54.0 | -34304.0 | -39680.0 | 1.3906 | 1.0781 |
| 0.6914 | 0.9424 | 900 | 120.7803 | -410.0 | -354.0 | 0.4316 | -55.75 | -35584.0 | -41216.0 | 1.9062 | 1.5391 |
| 0.6914 | 1.0471 | 1000 | 125.1435 | -424.0 | -366.0 | 0.4355 | -57.75 | -36864.0 | -42752.0 | 4.4688 | 3.9062 |
| 0.6914 | 1.1518 | 1100 | 129.3826 | -440.0 | -380.0 | 0.4316 | -59.75 | -38144.0 | -44288.0 | 4.0312 | 3.5156 |
| 0.6914 | 1.2565 | 1200 | 133.6557 | -454.0 | -392.0 | 0.4297 | -62.0 | -39424.0 | -45568.0 | 3.8438 | 3.2188 |
| 0.6914 | 1.3613 | 1300 | 137.5098 | -466.0 | -404.0 | 0.4355 | -63.5 | -40704.0 | -46848.0 | 2.1094 | 1.7891 |
| 0.6914 | 1.4660 | 1400 | 141.6271 | -482.0 | -416.0 | 0.4355 | -65.5 | -41728.0 | -48384.0 | 3.2344 | 2.7656 |
| 0.6914 | 1.5707 | 1500 | 145.0692 | -492.0 | -426.0 | 0.4336 | -67.0 | -42752.0 | -49664.0 | 3.4844 | 2.9844 |
| 0.6914 | 1.6754 | 1600 | 148.4839 | -504.0 | -436.0 | 0.4297 | -68.5 | -43776.0 | -50688.0 | 3.3281 | 2.8594 |
| 0.6914 | 1.7801 | 1700 | 151.1965 | -512.0 | -444.0 | 0.4316 | -69.5 | -44544.0 | -51712.0 | 3.7031 | 3.2188 |
| 0.6914 | 1.8848 | 1800 | 154.0215 | -524.0 | -452.0 | 0.4336 | -71.0 | -45568.0 | -52736.0 | 4.2188 | 3.6875 |
| 0.6914 | 1.9895 | 1900 | 156.4897 | -532.0 | -460.0 | 0.4316 | -72.5 | -46080.0 | -53504.0 | 3.3125 | 2.875 |
| 0.6914 | 2.0942 | 2000 | 158.3665 | -540.0 | -466.0 | 0.4336 | -73.0 | -46848.0 | -54016.0 | 3.1875 | 2.75 |
| 0.6914 | 2.1990 | 2100 | 160.3225 | -544.0 | -470.0 | 0.4297 | -74.0 | -47360.0 | -54784.0 | 3.3438 | 2.8906 |
| 0.6914 | 2.3037 | 2200 | 161.6044 | -548.0 | -474.0 | 0.4316 | -74.5 | -47616.0 | -55296.0 | 2.7344 | 2.3594 |
| 0.6914 | 2.4084 | 2300 | 162.5378 | -552.0 | -478.0 | 0.4316 | -75.0 | -48128.0 | -55552.0 | 2.8281 | 2.4062 |
| 0.6914 | 2.5131 | 2400 | 163.3184 | -556.0 | -480.0 | 0.4336 | -75.5 | -48128.0 | -55808.0 | 3.0 | 2.5469 |
| 0.6914 | 2.6178 | 2500 | 163.9196 | -556.0 | -482.0 | 0.4316 | -75.5 | -48384.0 | -56064.0 | 3.1875 | 2.75 |
| 0.6914 | 2.7225 | 2600 | 164.2697 | -556.0 | -482.0 | 0.4297 | -76.0 | -48640.0 | -56064.0 | 3.1719 | 2.7344 |
| 0.6914 | 2.8272 | 2700 | 164.3540 | -560.0 | -482.0 | 0.4297 | -76.0 | -48640.0 | -56064.0 | 3.1562 | 2.7188 |
| 0.6914 | 2.9319 | 2800 | 164.4180 | -560.0 | -482.0 | 0.4277 | -76.0 | -48640.0 | -56320.0 | 3.1562 | 2.7031 |
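
The Epoch and Step columns make it possible to estimate the run length, which the card does not report directly. The sketch below derives an approximate number of optimizer steps per epoch and, from that, a rough training-set size. It assumes every optimizer step consumes the full total_train_batch_size of 64; the helper names are hypothetical and the results are estimates, not reported figures.

```python
def steps_per_epoch(step, epoch):
    # Ratio of optimizer steps taken to the fractional epoch reached.
    return step / epoch

def approx_train_examples(step, epoch, total_train_batch_size=64):
    # Assumption: each optimizer step sees a full batch of examples.
    return round(steps_per_epoch(step, epoch)) * total_train_batch_size

# Values taken from the last logged row (step 2800, epoch 2.9319).
print(round(steps_per_epoch(2800, 2.9319)))  # 955
print(approx_train_examples(2800, 2.9319))   # 61120
```

Under these assumptions the three-epoch run would span roughly 2,865 optimizer steps over about 61k preference pairs.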

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model size: 1.08B params (Safetensors), tensor type BF16