OpenELM-1_1B-DPO-full-self-improve

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 13.7610
  • Rewards/chosen: -51.0
  • Rewards/rejected: -46.75
  • Rewards/accuracies: 0.4570
  • Rewards/margins: -4.3438
  • Logps/rejected: -4960.0
  • Logps/chosen: -5440.0
  • Logits/rejected: 1.8125
  • Logits/chosen: 0.8477
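The reward and log-probability metrics above are the quantities logged by preference-optimization trainers. A minimal sketch of how they are usually derived, assuming the standard sigmoid DPO loss as implemented in TRL's `DPOTrainer`. The card does not say which trainer, loss variant or `beta` was used, so `beta=0.1` and the helper names `softplus` and `dpo_metrics` are hypothetical.

```python
import math
from statistics import mean


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid-DPO loss plus reward/logp metrics like those reported in this card.

    Each argument is a list of summed sequence log-probabilities, one entry per
    preference pair. beta=0.1 is an ASSUMED placeholder: the card does not state
    the beta used for this run.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob)
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # -log(sigmoid(m)) == softplus(-m), averaged over pairs
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }
```

Under these definitions, a negative Rewards/margins together with Rewards/accuracies below 0.5 means the policy assigns, on average, a higher implicit reward to the rejected responses than to the chosen ones.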

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
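The total batch sizes and the learning-rate schedule follow from the values listed above. A minimal sketch in plain Python: the schedule mirrors the shape of `get_cosine_schedule_with_warmup` in Transformers (linear warmup, then half-cosine decay to zero), with warmup steps taken as `ceil(total_steps * warmup_ratio)`. That rounding is an assumption about how this run resolved `lr_scheduler_warmup_ratio`, and the total number of optimizer steps is not stated in the card, so it is a parameter here.

```python
import math

# Values from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8   # per device
EVAL_BATCH_SIZE = 16   # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS  # 64
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES  # 64


def cosine_lr_with_warmup(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

The Epoch and Step columns in the training results are consistent with about 955 optimizer steps per epoch, which would give roughly 2,865 steps over 3 epochs and 287 warmup steps; treat those figures as estimates, for example `cosine_lr_with_warmup(287, 2865)` returns the peak rate of 5e-05.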

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2459 | 0.1047 | 100 | 3.8620 | -12.0625 | -10.5 | 0.4531 | -1.5391 | -1344.0 | -1528.0 | -5.5625 | -5.9688 |
| 0.1787 | 0.2094 | 200 | 4.2236 | -13.0625 | -11.1875 | 0.4434 | -1.9141 | -1408.0 | -1624.0 | -1.0547 | -1.8828 |
| 0.1064 | 0.3141 | 300 | 5.5584 | -19.5 | -16.875 | 0.4336 | -2.5156 | -1984.0 | -2256.0 | 2.6406 | 1.8281 |
| 0.1114 | 0.4188 | 400 | 5.9626 | -21.625 | -19.5 | 0.4473 | -2.1094 | -2240.0 | -2480.0 | -2.3906 | -3.2969 |
| 0.0803 | 0.5236 | 500 | 6.1040 | -24.75 | -23.375 | 0.4922 | -1.4141 | -2624.0 | -2800.0 | 3.6562 | 2.4844 |
| 0.0999 | 0.6283 | 600 | 5.5224 | -22.5 | -20.375 | 0.4395 | -2.125 | -2336.0 | -2576.0 | 2.6719 | 1.2969 |
| 0.0767 | 0.7330 | 700 | 5.9968 | -24.25 | -22.5 | 0.4648 | -1.6953 | -2544.0 | -2736.0 | 0.5781 | -0.4414 |
| 0.0891 | 0.8377 | 800 | 4.9921 | -20.875 | -19.125 | 0.4570 | -1.7188 | -2208.0 | -2400.0 | -0.3652 | -1.375 |
| 0.0907 | 0.9424 | 900 | 3.9869 | -17.25 | -16.125 | 0.4785 | -1.1328 | -1896.0 | -2040.0 | -2.0781 | -3.0469 |
| 0.028 | 1.0471 | 1000 | 7.5994 | -27.75 | -26.0 | 0.4824 | -1.7422 | -2896.0 | -3104.0 | -1.6328 | -2.6094 |
| 0.0329 | 1.1518 | 1100 | 8.8766 | -34.5 | -33.0 | 0.4707 | -1.7344 | -3584.0 | -3776.0 | 0.8086 | -0.2539 |
| 0.0288 | 1.2565 | 1200 | 7.4045 | -30.25 | -27.875 | 0.4531 | -2.3438 | -3072.0 | -3344.0 | 0.7969 | -0.1514 |
| 0.0403 | 1.3613 | 1300 | 6.6099 | -27.75 | -25.75 | 0.4531 | -1.9844 | -2864.0 | -3088.0 | -2.9688 | -3.8125 |
| 0.0286 | 1.4660 | 1400 | 12.4327 | -43.75 | -39.75 | 0.4688 | -3.875 | -4288.0 | -4672.0 | 0.9492 | -0.0228 |
| 0.0237 | 1.5707 | 1500 | 9.6342 | -37.0 | -33.75 | 0.4414 | -3.25 | -3664.0 | -4016.0 | 1.4141 | 0.3789 |
| 0.0231 | 1.6754 | 1600 | 9.6624 | -38.25 | -34.75 | 0.4531 | -3.5156 | -3776.0 | -4160.0 | 1.1016 | 0.1680 |
| 0.0199 | 1.7801 | 1700 | 13.2106 | -48.5 | -43.75 | 0.4512 | -4.75 | -4672.0 | -5152.0 | 1.8438 | 0.9062 |
| 0.0202 | 1.8848 | 1800 | 10.3211 | -41.0 | -37.75 | 0.4492 | -3.2344 | -4080.0 | -4416.0 | 0.6641 | -0.2930 |
| 0.0305 | 1.9895 | 1900 | 9.0914 | -35.5 | -33.25 | 0.4609 | -2.0625 | -3616.0 | -3856.0 | -0.5703 | -1.5 |
| 0.0093 | 2.0942 | 2000 | 12.3840 | -45.75 | -42.0 | 0.4512 | -3.5938 | -4480.0 | -4864.0 | 0.7969 | -0.1797 |
| 0.006 | 2.1990 | 2100 | 13.6169 | -49.5 | -45.25 | 0.4531 | -4.2188 | -4832.0 | -5280.0 | 1.4062 | 0.4277 |
| 0.0119 | 2.3037 | 2200 | 12.2264 | -45.75 | -41.75 | 0.4531 | -3.9844 | -4480.0 | -4896.0 | 1.4453 | 0.4785 |
| 0.0105 | 2.4084 | 2300 | 12.7440 | -47.5 | -43.25 | 0.4531 | -4.125 | -4608.0 | -5056.0 | 1.4062 | 0.4570 |
| 0.0077 | 2.5131 | 2400 | 13.4844 | -50.25 | -45.75 | 0.4512 | -4.3125 | -4864.0 | -5344.0 | 1.7656 | 0.8125 |
| 0.0149 | 2.6178 | 2500 | 13.7760 | -51.0 | -46.75 | 0.4551 | -4.3438 | -4960.0 | -5408.0 | 1.6562 | 0.7031 |
| 0.0045 | 2.7225 | 2600 | 14.2584 | -52.75 | -48.25 | 0.4551 | -4.5 | -5120.0 | -5600.0 | 1.9766 | 1.0078 |
| 0.0105 | 2.8272 | 2700 | 13.8720 | -51.5 | -47.0 | 0.4551 | -4.375 | -4992.0 | -5472.0 | 1.8203 | 0.8516 |
| 0.0065 | 2.9319 | 2800 | 13.7610 | -51.0 | -46.75 | 0.4570 | -4.3438 | -4960.0 | -5440.0 | 1.8125 | 0.8477 |
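The Epoch and Step columns can be inverted to estimate the number of optimizer steps per epoch, and the Validation Loss column shows which logged checkpoint scored best. A minimal sketch using only the Step, Epoch, Training Loss and Validation Loss columns copied from the table above; it assumes Epoch is logged as step divided by steps-per-epoch and rounded to four decimals. The variable names are illustrative.

```python
# (step, epoch, training_loss, validation_loss), copied from the training-results table.
ROWS = [
    (100, 0.1047, 0.2459, 3.8620), (200, 0.2094, 0.1787, 4.2236),
    (300, 0.3141, 0.1064, 5.5584), (400, 0.4188, 0.1114, 5.9626),
    (500, 0.5236, 0.0803, 6.1040), (600, 0.6283, 0.0999, 5.5224),
    (700, 0.7330, 0.0767, 5.9968), (800, 0.8377, 0.0891, 4.9921),
    (900, 0.9424, 0.0907, 3.9869), (1000, 1.0471, 0.028, 7.5994),
    (1100, 1.1518, 0.0329, 8.8766), (1200, 1.2565, 0.0288, 7.4045),
    (1300, 1.3613, 0.0403, 6.6099), (1400, 1.4660, 0.0286, 12.4327),
    (1500, 1.5707, 0.0237, 9.6342), (1600, 1.6754, 0.0231, 9.6624),
    (1700, 1.7801, 0.0199, 13.2106), (1800, 1.8848, 0.0202, 10.3211),
    (1900, 1.9895, 0.0305, 9.0914), (2000, 2.0942, 0.0093, 12.3840),
    (2100, 2.1990, 0.006, 13.6169), (2200, 2.3037, 0.0119, 12.2264),
    (2300, 2.4084, 0.0105, 12.7440), (2400, 2.5131, 0.0077, 13.4844),
    (2500, 2.6178, 0.0149, 13.7760), (2600, 2.7225, 0.0045, 14.2584),
    (2700, 2.8272, 0.0105, 13.8720), (2800, 2.9319, 0.0065, 13.7610),
]
NUM_EPOCHS = 3  # num_epochs from the hyperparameter list

# Invert epoch = step / steps_per_epoch on the last row, where the
# 4-decimal rounding of Epoch matters least.
last_step, last_epoch = ROWS[-1][0], ROWS[-1][1]
steps_per_epoch = round(last_step / last_epoch)
estimated_total_steps = steps_per_epoch * NUM_EPOCHS

best = min(ROWS, key=lambda row: row[3])  # lowest validation loss
final = ROWS[-1]

print(f"~{steps_per_epoch} steps/epoch, ~{estimated_total_steps} total steps")
print(f"lowest validation loss {best[3]:.4f} at step {best[0]}")
print(f"final validation loss {final[3]:.4f} at step {final[0]}")
```

On these rows the lowest validation loss (3.8620) occurs at step 100, while the evaluation results reported at the top of this card match the final row at step 2800.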

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1
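To compare a local environment against the versions listed above, the standard-library `importlib.metadata` module is enough. A minimal sketch; the helper `version_mismatches` is hypothetical, and it assumes that "Pytorch" corresponds to the PyPI distribution `torch` and that local build tags such as `+cu121` should be ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (keys are PyPI distribution names;
# "Pytorch" is distributed as "torch").
EXPECTED_VERSIONS = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected=EXPECTED_VERSIONS, lookup=version):
    """Return {package: (expected, installed or None)} for each mismatch.

    Local build tags such as "+cu121" are stripped before comparing.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {installed}")
```

An exact match is not necessarily required to load the model, but it is the closest reproduction of the training environment.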

Model files

  • Format: Safetensors
  • Model size: 1.08B params
  • Tensor type: BF16
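The parameter count and tensor type give a quick lower bound on the memory needed just to hold the weights. A minimal sketch of that arithmetic; the helper name is hypothetical, 1.08B is the rounded count listed above, and the estimate ignores activations, KV cache, optimizer state and framework overhead.

```python
# Bytes per element for common tensor types.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="bf16"):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


print(f"bf16: {weight_memory_gb(1.08e9):.2f} GB")  # bf16: 2.16 GB
print(f"fp32: {weight_memory_gb(1.08e9, 'fp32'):.2f} GB")  # fp32: 4.32 GB
```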

The repository contains custom code, which the serverless Inference API does not yet support.