mistral-dpo

This model is a fine-tuned version of TheBloke/Mistral-7B-v0.1-GPTQ. The training dataset was not recorded (the card generator reported it as "None"). At the final training step (250), it achieves the following results on the evaluation set:

  • Loss: 0.5603
  • Rewards/chosen: -12.5467
  • Rewards/rejected: -28.4037
  • Rewards/accuracies: 0.8571
  • Rewards/margins: 15.8571
  • Logps/rejected: -411.7001
  • Logps/chosen: -215.4742
  • Logits/rejected: -0.7509
  • Logits/chosen: -0.7707
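
The reward metrics above are related in a simple way, and a short check may make them easier to read. The sketch below recomputes Rewards/margins as the gap between Rewards/chosen and Rewards/rejected, then evaluates a per-pair loss of the form -log(sigmoid(margin)). That loss form is an assumption: it is the standard sigmoid DPO objective, but this card does not state which loss variant was used. The function name `sigmoid_dpo_loss` and the example margin of -5.0 are hypothetical.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -12.5467
rewards_rejected = -28.4037
reported_margin = 15.8571

# The margin is the gap between the mean chosen and mean rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin})")


def sigmoid_dpo_loss(pair_margin: float) -> float:
    """Per-pair loss -log(sigmoid(m)), computed stably as softplus(-m).

    Assumption: the standard sigmoid DPO loss; the card does not confirm it.
    """
    x = -pair_margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# A well-separated pair contributes almost nothing to the loss...
print(f"loss at the mean margin: {sigmoid_dpo_loss(reported_margin):.2e}")
# ...while a misranked pair (hypothetical margin of -5.0) contributes a lot.
print(f"loss at a margin of -5.0: {sigmoid_dpo_loss(-5.0):.4f}")
```

Under this assumption, the mean loss is not a function of the mean margin. With Rewards/accuracies at 0.8571, roughly one pair in seven is misranked, and a few pairs with strongly negative margins could plausibly account for a loss of 0.5603 despite the large average margin.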

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 250
  • mixed_precision_training: Native AMP
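
To make the schedule concrete, the sketch below computes the learning rate implied by the hyperparameters above: a linear ramp over 2 warmup steps to 0.0002, then a linear decay to zero at step 250. The function `linear_lr` is a hypothetical helper written for illustration, not the training code. It is meant to follow the usual linear-with-warmup rule, and it assumes the scheduler ran for exactly 250 steps.

```python
def linear_lr(step: int,
              base_lr: float = 2e-4,
              warmup_steps: int = 2,
              total_steps: int = 250) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 1, 2, 126, 250):
    print(s, linear_lr(s))
```

With only 2 warmup steps, the peak rate of 0.0002 is reached almost immediately, so most of the 250 steps run on the decaying part of the schedule.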

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6785 | 0.02 | 10 | 0.6291 | -0.0030 | -0.1321 | 0.875 | 0.1291 | -128.9836 | -90.0372 | -2.3988 | -2.3489 |
| 0.5661 | 0.04 | 20 | 0.4421 | 0.0008 | -0.6608 | 0.875 | 0.6616 | -134.2708 | -89.9997 | -2.3613 | -2.3042 |
| 0.3257 | 0.06 | 30 | 0.3584 | -0.7139 | -2.3035 | 0.8393 | 1.5897 | -150.6985 | -97.1463 | -2.2995 | -2.2546 |
| 0.3563 | 0.08 | 40 | 0.5522 | -3.0636 | -6.7067 | 0.8214 | 3.6431 | -194.7305 | -120.6441 | -2.1396 | -2.0849 |
| 0.0013 | 0.1 | 50 | 1.3365 | -8.4317 | -16.1649 | 0.8036 | 7.7332 | -289.3120 | -174.3246 | -1.8243 | -1.7710 |
| 0.0277 | 0.12 | 60 | 2.4224 | -14.8512 | -25.9570 | 0.8214 | 11.1059 | -387.2331 | -238.5192 | -1.5464 | -1.4950 |
| 1.5742 | 0.14 | 70 | 3.1075 | -17.8751 | -29.6755 | 0.8214 | 11.8004 | -424.4176 | -268.7585 | -1.4071 | -1.3681 |
| 14.1036 | 0.16 | 80 | 3.6238 | -20.4205 | -32.7881 | 0.8214 | 12.3675 | -455.5435 | -294.2129 | -1.3237 | -1.2729 |
| 8.531 | 0.18 | 90 | 3.7123 | -21.7863 | -36.0729 | 0.8214 | 14.2866 | -488.3922 | -307.8707 | -1.2975 | -1.2388 |
| 4.6429 | 0.2 | 100 | 2.0394 | -16.6472 | -29.8508 | 0.8393 | 13.2036 | -426.1712 | -256.4797 | -1.3189 | -1.2784 |
| 0.0565 | 0.22 | 110 | 1.6331 | -14.8501 | -27.2015 | 0.8393 | 12.3514 | -399.6779 | -238.5090 | -1.2425 | -1.2118 |
| 0.0056 | 0.24 | 120 | 1.4774 | -15.0784 | -28.0012 | 0.8214 | 12.9228 | -407.6750 | -240.7916 | -1.0819 | -1.0579 |
| 0.0804 | 0.26 | 130 | 1.5398 | -16.7630 | -30.6346 | 0.8393 | 13.8716 | -434.0091 | -257.6378 | -1.0054 | -0.9846 |
| 0.0001 | 0.28 | 140 | 1.5159 | -17.9940 | -33.3459 | 0.8393 | 15.3520 | -461.1225 | -269.9474 | -0.8887 | -0.8844 |
| 0.0 | 0.3 | 150 | 1.5062 | -18.4614 | -34.3481 | 0.8393 | 15.8868 | -471.1445 | -274.6213 | -0.8496 | -0.8503 |
| 0.0 | 0.32 | 160 | 1.5035 | -18.6474 | -34.7165 | 0.8393 | 16.0692 | -474.8286 | -276.4815 | -0.8343 | -0.8367 |
| 4.2123 | 0.34 | 170 | 1.2949 | -17.3471 | -32.6721 | 0.8571 | 15.3250 | -454.3839 | -263.4789 | -0.8672 | -0.8661 |
| 2.13 | 0.36 | 180 | 0.9892 | -15.2178 | -30.1177 | 0.8571 | 14.8999 | -428.8398 | -242.1859 | -0.8992 | -0.9047 |
| 2.0146 | 0.38 | 190 | 0.8365 | -13.9461 | -28.5983 | 0.8571 | 14.6522 | -413.6459 | -229.4683 | -0.9104 | -0.9224 |
| 0.0706 | 0.4 | 200 | 0.7897 | -14.5198 | -29.8989 | 0.8571 | 15.3791 | -426.6525 | -235.2058 | -0.8064 | -0.8224 |
| 5.2517 | 0.42 | 210 | 0.6621 | -13.7049 | -29.2354 | 0.8571 | 15.5305 | -420.0170 | -227.0569 | -0.7981 | -0.8124 |
| 0.0499 | 0.44 | 220 | 0.5752 | -12.8733 | -28.5299 | 0.8571 | 15.6566 | -412.9616 | -218.7403 | -0.7801 | -0.7990 |
| 0.5779 | 0.46 | 230 | 0.5611 | -12.6633 | -28.3836 | 0.8571 | 15.7203 | -411.4988 | -216.6405 | -0.7789 | -0.7975 |
| 0.0322 | 0.48 | 240 | 0.5624 | -12.6348 | -28.4766 | 0.8571 | 15.8418 | -412.4289 | -216.3556 | -0.7696 | -0.7878 |
| 0.1347 | 0.5 | 250 | 0.5603 | -12.5467 | -28.4037 | 0.8571 | 15.8571 | -411.7001 | -215.4742 | -0.7509 | -0.7707 |
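
The validation loss in the results above does not fall steadily, so it helps to locate its extremes. The sketch below copies the Step and Validation Loss columns from the table and finds the evaluation steps with the lowest and highest loss. The variable names are illustrative only.

```python
# Step -> validation loss, copied from the training results table.
val_loss_by_step = {
    10: 0.6291, 20: 0.4421, 30: 0.3584, 40: 0.5522, 50: 1.3365,
    60: 2.4224, 70: 3.1075, 80: 3.6238, 90: 3.7123, 100: 2.0394,
    110: 1.6331, 120: 1.4774, 130: 1.5398, 140: 1.5159, 150: 1.5062,
    160: 1.5035, 170: 1.2949, 180: 0.9892, 190: 0.8365, 200: 0.7897,
    210: 0.6621, 220: 0.5752, 230: 0.5611, 240: 0.5624, 250: 0.5603,
}

# Evaluation steps with the lowest and highest validation loss.
best_step = min(val_loss_by_step, key=val_loss_by_step.get)
worst_step = max(val_loss_by_step, key=val_loss_by_step.get)

print(f"lowest validation loss:  {val_loss_by_step[best_step]} at step {best_step}")
print(f"highest validation loss: {val_loss_by_step[worst_step]} at step {worst_step}")
print(f"final validation loss:   {val_loss_by_step[250]} at step 250")
```

By this measure the final checkpoint (step 250) is not the best one: the lowest validation loss is recorded at step 30, before the spike that peaks at step 90. Whether an earlier checkpoint would be preferable depends on which metric matters, since Rewards/accuracies and Rewards/margins are higher at the end of training.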

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0
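
With the PEFT and Transformers versions listed above, the model could probably be loaded as an adapter on top of the base checkpoint. The sketch below assumes several things this card does not confirm: that the repository ID is Belred/mistral-dpo (taken from this page), that it holds a PEFT adapter, and that a GPTQ backend and a GPU are available to load the quantized base model. The helper name `load_mistral_dpo` is hypothetical.

```python
def load_mistral_dpo(adapter_id: str = "Belred/mistral-dpo",
                     base_id: str = "TheBloke/Mistral-7B-v0.1-GPTQ"):
    """Load the quantized base model and attach the fine-tuned adapter.

    Assumption: adapter_id points at a PEFT adapter trained on base_id.
    """
    # Imports are deferred so the heavy dependencies are only needed on call.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Loading a GPTQ checkpoint needs a GPTQ backend installed (assumption).
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

A call such as `model, tokenizer = load_mistral_dpo()` would download both repositories on first use.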

Model repository: Belred/mistral-dpo (listed on the Hub as an adapter of the base model).