zephyr-7b-dpo-qlora-no-sft

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5486
  • Rewards/chosen: -1.4557
  • Rewards/rejected: -2.2032
  • Rewards/accuracies: 0.7090
  • Rewards/margins: 0.7475
  • Logps/rejected: -484.1859
  • Logps/chosen: -430.8606
  • Logits/rejected: 0.8536
  • Logits/chosen: 0.8124
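
These reward and log-probability metrics are the quantities a DPO trainer (such as TRL's DPOTrainer) typically logs. The card does not show the training code, so the definitions below are an assumption based on the standard DPO formulation: the implicit reward of a completion is the scaled log-probability ratio between the trained policy and the reference model.

```latex
% Assumed metric definitions (standard DPO / TRL DPOTrainer conventions; not stated in the card).
r_\theta(x, y) = \beta \bigl[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr]
\text{rewards/chosen} = r_\theta(x, y_{\text{chosen}}), \qquad
\text{rewards/rejected} = r_\theta(x, y_{\text{rejected}})
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected}
\text{rewards/accuracies} = \Pr\bigl[ r_\theta(x, y_{\text{chosen}}) > r_\theta(x, y_{\text{rejected}}) \bigr]
```

Under this reading the numbers are internally consistent: -1.4557 - (-2.2032) = 0.7475, which is the reported margin, and an accuracy of about 0.71 means the implicit reward ranks the chosen response above the rejected one on roughly 71% of evaluation pairs. Logps/chosen and Logps/rejected would then be the policy's total log-probabilities of the chosen and rejected completions.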

Model description

Based on the model name and training setup, this appears to be a QLoRA (4-bit LoRA) adapter for mistralai/Mistral-7B-v0.1 trained directly with Direct Preference Optimization (DPO) on UltraFeedback preference pairs, without a preceding supervised fine-tuning (SFT) stage. No further description is provided.

Intended uses & limitations

More information needed
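
Although the card leaves intended uses unspecified, the artifact itself is a PEFT adapter, so the usual loading path applies. The sketch below is illustrative only: the adapter repo id dball/zephyr-7b-dpo-qlora-no-sft comes from this card, while the dtype, device placement, prompt, and generation settings are assumptions. Since the name indicates no SFT stage, the base tokenizer has no chat template and prompting is plain-text completion.

```python
# Minimal, hedged sketch of loading this QLoRA adapter for inference.
# Repo ids are taken from the card; everything else is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "dball/zephyr-7b-dpo-qlora-no-sft"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter
model.eval()

prompt = "Explain the difference between DPO and supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```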

Training and evaluation data

The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized preference dataset referenced above; the card provides no further details about preprocessing or splits.
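
For reference, the preference pairs can be pulled from the Hub roughly as follows. The split name train_prefs and the column layout are assumptions about how HuggingFaceH4/ultrafeedback_binarized is organized; whatever preprocessing or chat templating this particular run applied is not documented in the card.

```python
# Hedged sketch: inspect the preference data referenced by this card.
# The split name "train_prefs" and the column names are assumptions.
from datasets import load_dataset

prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = prefs[0]
print(example.keys())           # expected to include "prompt", "chosen", "rejected"
print(example["prompt"][:200])  # the prompt is a plain string; chosen/rejected are message lists
```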

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
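
Taken together, these settings correspond to a fairly standard DPO + QLoRA recipe. The sketch below is a hedged reconstruction rather than the actual training script: the explicit hyperparameters are copied from the list above, while the TRL API version (a ~0.7.x-era DPOTrainer is assumed, to match the listed Transformers and PEFT versions), the LoRA rank/alpha/target modules, the DPO beta, the maximum sequence lengths, and the dataset preprocessing are all guesses.

```python
# Hedged reconstruction of a DPO + QLoRA run using the hyperparameters listed above.
# LoRA settings, beta, max lengths, preprocessing, and the TRL API version are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer  # assumes a TRL ~0.7.x-style DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: the frozen base weights are loaded in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16
)

# LoRA adapter -- rank, alpha, dropout, and target modules are illustrative, not from the card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# These arguments mirror the list above (per-device batch 1 x grad accumulation 8 = total batch 8).
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora-no-sft",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
    evaluation_strategy="steps",
    eval_steps=100,   # matches the 100-step evaluation cadence in the results table below
    logging_steps=100,
)

# Crude preprocessing (assumed): DPOTrainer expects string "prompt"/"chosen"/"rejected" columns,
# whereas the raw dataset stores chosen/rejected as message lists.
def to_text(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_ds = raw["train_prefs"].map(to_text, remove_columns=raw["train_prefs"].column_names)
eval_ds = raw["test_prefs"].map(to_text, remove_columns=raw["test_prefs"].column_names)

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,               # assumed; the card does not state beta
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,        # assumed
    max_prompt_length=512,  # assumed
)
trainer.train()
```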

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6934 | 0.01 | 100 | 0.6930 | 0.0021 | 0.0018 | 0.5120 | 0.0003 | -263.6873 | -285.0847 | -2.5761 | -2.6081 |
| 0.6921 | 0.03 | 200 | 0.6923 | 0.0064 | 0.0047 | 0.5820 | 0.0017 | -263.3970 | -284.6488 | -2.5766 | -2.6089 |
| 0.6913 | 0.04 | 300 | 0.6910 | 0.0127 | 0.0083 | 0.6195 | 0.0044 | -263.0383 | -284.0253 | -2.5774 | -2.6105 |
| 0.6888 | 0.05 | 400 | 0.6894 | 0.0235 | 0.0157 | 0.6210 | 0.0077 | -262.2991 | -282.9474 | -2.5778 | -2.6114 |
| 0.6881 | 0.07 | 500 | 0.6866 | 0.0322 | 0.0186 | 0.6220 | 0.0136 | -262.0058 | -282.0685 | -2.5648 | -2.6011 |
| 0.6848 | 0.08 | 600 | 0.6829 | 0.0391 | 0.0173 | 0.6230 | 0.0218 | -262.1442 | -281.3836 | -2.5621 | -2.6006 |
| 0.6706 | 0.09 | 700 | 0.6776 | 0.0515 | 0.0169 | 0.6135 | 0.0346 | -262.1758 | -280.1425 | -2.5437 | -2.5861 |
| 0.6544 | 0.1 | 800 | 0.6650 | -0.0843 | -0.1603 | 0.6065 | 0.0760 | -279.8956 | -293.7216 | -2.5208 | -2.5676 |
| 0.668 | 0.12 | 900 | 0.6552 | -0.1689 | -0.2798 | 0.6170 | 0.1109 | -291.8528 | -302.1819 | -2.5180 | -2.5613 |
| 0.6285 | 0.13 | 1000 | 0.6457 | -0.5287 | -0.7121 | 0.6290 | 0.1834 | -335.0806 | -338.1635 | -2.4563 | -2.4939 |
| 0.6741 | 0.14 | 1100 | 0.6396 | -0.7030 | -0.9481 | 0.6305 | 0.2452 | -358.6847 | -355.5893 | -2.2815 | -2.3227 |
| 0.605 | 0.16 | 1200 | 0.6279 | -0.7077 | -0.9713 | 0.6375 | 0.2636 | -360.9963 | -356.0601 | -2.2198 | -2.2608 |
| 0.5844 | 0.17 | 1300 | 0.6228 | -0.8502 | -1.1414 | 0.6410 | 0.2912 | -378.0121 | -370.3147 | -2.0337 | -2.0743 |
| 0.6085 | 0.18 | 1400 | 0.6157 | -0.6163 | -0.8963 | 0.6565 | 0.2799 | -353.4970 | -346.9268 | -1.9276 | -1.9742 |
| 0.5887 | 0.2 | 1500 | 0.6093 | -1.0534 | -1.4085 | 0.6585 | 0.3551 | -404.7234 | -390.6338 | -1.5130 | -1.5476 |
| 0.5585 | 0.21 | 1600 | 0.6020 | -0.8558 | -1.2372 | 0.6645 | 0.3814 | -387.5893 | -370.8767 | -1.4216 | -1.4652 |
| 0.5417 | 0.22 | 1700 | 0.5937 | -0.7787 | -1.1648 | 0.6640 | 0.3860 | -380.3489 | -363.1672 | -1.3190 | -1.3614 |
| 0.5691 | 0.24 | 1800 | 0.5964 | -1.0690 | -1.5628 | 0.6705 | 0.4938 | -420.1472 | -392.1945 | -0.7433 | -0.7891 |
| 0.5869 | 0.25 | 1900 | 0.5931 | -1.4234 | -1.8618 | 0.6700 | 0.4384 | -450.0478 | -427.6318 | -0.5757 | -0.5963 |
| 0.6732 | 0.26 | 2000 | 0.5928 | -0.7320 | -1.1323 | 0.6765 | 0.4002 | -377.0961 | -358.4945 | -0.8928 | -0.9596 |
| 0.5453 | 0.27 | 2100 | 0.5812 | -1.2215 | -1.6723 | 0.6770 | 0.4508 | -431.1005 | -407.4461 | -0.3057 | -0.3325 |
| 0.5521 | 0.29 | 2200 | 0.5773 | -0.9855 | -1.4907 | 0.6775 | 0.5052 | -412.9417 | -383.8439 | -0.0835 | -0.1059 |
| 0.5352 | 0.3 | 2300 | 0.5821 | -1.0780 | -1.5279 | 0.6885 | 0.4500 | -416.6599 | -393.0880 | -0.2117 | -0.2432 |
| 0.4291 | 0.31 | 2400 | 0.5800 | -1.3780 | -1.9871 | 0.6785 | 0.6091 | -462.5805 | -423.0901 | 0.1802 | 0.1741 |
| 0.5324 | 0.33 | 2500 | 0.5709 | -1.0291 | -1.5875 | 0.6765 | 0.5584 | -422.6171 | -388.1980 | 0.0904 | 0.0751 |
| 0.5659 | 0.34 | 2600 | 0.5640 | -1.2533 | -1.8232 | 0.6985 | 0.5699 | -446.1898 | -410.6243 | 0.3281 | 0.3241 |
| 0.5041 | 0.35 | 2700 | 0.5737 | -1.7469 | -2.3921 | 0.6865 | 0.6452 | -503.0828 | -459.9810 | 0.5911 | 0.5924 |
| 0.5754 | 0.37 | 2800 | 0.5716 | -1.6382 | -2.2298 | 0.6885 | 0.5915 | -486.8488 | -449.1171 | 0.6424 | 0.6612 |
| 0.6073 | 0.38 | 2900 | 0.5731 | -1.5512 | -2.2130 | 0.6815 | 0.6618 | -485.1724 | -440.4115 | 0.7017 | 0.6979 |
| 0.6283 | 0.39 | 3000 | 0.5645 | -1.3105 | -1.9937 | 0.6860 | 0.6832 | -463.2372 | -416.3378 | 0.6221 | 0.5951 |
| 0.5199 | 0.41 | 3100 | 0.5585 | -1.1618 | -1.7386 | 0.6940 | 0.5768 | -437.7283 | -401.4741 | 0.4404 | 0.4092 |
| 0.5658 | 0.42 | 3200 | 0.5603 | -1.1916 | -1.7704 | 0.6960 | 0.5788 | -440.9099 | -404.4548 | 0.3075 | 0.2535 |
| 0.6214 | 0.43 | 3300 | 0.5605 | -1.3366 | -1.9673 | 0.6925 | 0.6307 | -460.5986 | -418.9480 | 0.6742 | 0.6564 |
| 0.581 | 0.44 | 3400 | 0.5563 | -1.1359 | -1.7683 | 0.6985 | 0.6324 | -440.7018 | -398.8812 | 0.5839 | 0.5449 |
| 0.5422 | 0.46 | 3500 | 0.5590 | -1.0364 | -1.6150 | 0.6915 | 0.5786 | -425.3734 | -388.9318 | 0.5735 | 0.5330 |
| 0.5626 | 0.47 | 3600 | 0.5602 | -1.1120 | -1.7501 | 0.6910 | 0.6381 | -438.8792 | -396.4902 | 0.7862 | 0.7520 |
| 0.627 | 0.48 | 3700 | 0.5579 | -1.2845 | -1.9488 | 0.6935 | 0.6644 | -458.7537 | -413.7391 | 0.8809 | 0.8576 |
| 0.5522 | 0.5 | 3800 | 0.5562 | -1.3810 | -2.0706 | 0.6975 | 0.6896 | -470.9312 | -423.3916 | 0.9118 | 0.8745 |
| 0.5734 | 0.51 | 3900 | 0.5557 | -1.3964 | -2.0908 | 0.6970 | 0.6943 | -472.9462 | -424.9361 | 0.7969 | 0.7417 |
| 0.612 | 0.52 | 4000 | 0.5548 | -1.6249 | -2.3232 | 0.7075 | 0.6982 | -496.1850 | -447.7854 | 0.8941 | 0.8718 |
| 0.5357 | 0.54 | 4100 | 0.5587 | -1.1962 | -1.8866 | 0.6995 | 0.6904 | -452.5338 | -404.9135 | 0.5836 | 0.5102 |
| 0.5648 | 0.55 | 4200 | 0.5570 | -1.3147 | -2.0461 | 0.6940 | 0.7314 | -468.4804 | -416.7626 | 0.7063 | 0.6440 |
| 0.5237 | 0.56 | 4300 | 0.5515 | -1.5027 | -2.2087 | 0.7030 | 0.7060 | -484.7385 | -435.5629 | 0.8569 | 0.8282 |
| 0.5979 | 0.58 | 4400 | 0.5594 | -1.6981 | -2.4801 | 0.7040 | 0.7820 | -511.8796 | -455.1061 | 0.9415 | 0.9060 |
| 0.4859 | 0.59 | 4500 | 0.5530 | -1.5910 | -2.3517 | 0.7080 | 0.7607 | -499.0415 | -444.3948 | 0.9399 | 0.9057 |
| 0.5484 | 0.6 | 4600 | 0.5525 | -1.5159 | -2.2439 | 0.7055 | 0.7280 | -488.2595 | -436.8822 | 0.8711 | 0.8268 |
| 0.6135 | 0.62 | 4700 | 0.5504 | -1.3255 | -2.0246 | 0.7065 | 0.6990 | -466.3248 | -417.8462 | 0.7736 | 0.7222 |
| 0.5714 | 0.63 | 4800 | 0.5501 | -1.4736 | -2.1670 | 0.7070 | 0.6934 | -480.5717 | -432.6558 | 0.8649 | 0.8370 |
| 0.517 | 0.64 | 4900 | 0.5531 | -1.6509 | -2.4069 | 0.7090 | 0.7560 | -504.5561 | -450.3797 | 0.9735 | 0.9524 |
| 0.4862 | 0.65 | 5000 | 0.5524 | -1.5409 | -2.2932 | 0.7080 | 0.7523 | -493.1930 | -439.3873 | 0.9138 | 0.8849 |
| 0.6176 | 0.67 | 5100 | 0.5519 | -1.4759 | -2.2276 | 0.7020 | 0.7516 | -486.6266 | -432.8859 | 0.8785 | 0.8443 |
| 0.5514 | 0.68 | 5200 | 0.5500 | -1.4083 | -2.1357 | 0.7025 | 0.7274 | -477.4418 | -426.1200 | 0.8299 | 0.7894 |
| 0.5166 | 0.69 | 5300 | 0.5508 | -1.4154 | -2.1510 | 0.7040 | 0.7356 | -478.9723 | -426.8324 | 0.8441 | 0.8065 |
| 0.4918 | 0.71 | 5400 | 0.5496 | -1.4093 | -2.1290 | 0.7090 | 0.7197 | -476.7667 | -426.2183 | 0.8313 | 0.7905 |
| 0.596 | 0.72 | 5500 | 0.5489 | -1.4890 | -2.2221 | 0.7075 | 0.7332 | -486.0821 | -434.1885 | 0.8632 | 0.8239 |
| 0.6034 | 0.73 | 5600 | 0.5489 | -1.4048 | -2.1338 | 0.7065 | 0.7290 | -477.2522 | -425.7730 | 0.8041 | 0.7561 |
| 0.4793 | 0.75 | 5700 | 0.5495 | -1.5017 | -2.2541 | 0.7080 | 0.7524 | -489.2809 | -435.4676 | 0.8918 | 0.8545 |
| 0.5164 | 0.76 | 5800 | 0.5497 | -1.5548 | -2.3215 | 0.7085 | 0.7667 | -496.0150 | -440.7685 | 0.9221 | 0.8885 |
| 0.6164 | 0.77 | 5900 | 0.5491 | -1.5335 | -2.2884 | 0.7080 | 0.7549 | -492.7101 | -438.6432 | 0.8987 | 0.8645 |
| 0.5347 | 0.79 | 6000 | 0.5487 | -1.5028 | -2.2487 | 0.7105 | 0.7459 | -488.7427 | -435.5721 | 0.8766 | 0.8397 |
| 0.56 | 0.8 | 6100 | 0.5491 | -1.4855 | -2.2337 | 0.7105 | 0.7482 | -487.2426 | -433.8429 | 0.8643 | 0.8248 |
| 0.587 | 0.81 | 6200 | 0.5491 | -1.4638 | -2.2111 | 0.7095 | 0.7473 | -484.9788 | -431.6711 | 0.8489 | 0.8072 |
| 0.4927 | 0.82 | 6300 | 0.5490 | -1.4591 | -2.2082 | 0.7090 | 0.7491 | -484.6881 | -431.2039 | 0.8531 | 0.8118 |
| 0.6102 | 0.84 | 6400 | 0.5486 | -1.4462 | -2.1928 | 0.7105 | 0.7466 | -483.1518 | -429.9117 | 0.8474 | 0.8055 |
| 0.4988 | 0.85 | 6500 | 0.5485 | -1.4482 | -2.1938 | 0.7095 | 0.7456 | -483.2466 | -430.1142 | 0.8464 | 0.8046 |
| 0.5544 | 0.86 | 6600 | 0.5486 | -1.4491 | -2.1949 | 0.7115 | 0.7458 | -483.3600 | -430.1988 | 0.8487 | 0.8068 |
| 0.5828 | 0.88 | 6700 | 0.5486 | -1.4518 | -2.1981 | 0.7100 | 0.7463 | -483.6802 | -430.4771 | 0.8512 | 0.8097 |
| 0.5711 | 0.89 | 6800 | 0.5485 | -1.4557 | -2.2030 | 0.7095 | 0.7473 | -484.1660 | -430.8610 | 0.8538 | 0.8124 |
| 0.5621 | 0.9 | 6900 | 0.5484 | -1.4557 | -2.2035 | 0.7125 | 0.7478 | -484.2229 | -430.8625 | 0.8535 | 0.8119 |
| 0.5093 | 0.92 | 7000 | 0.5485 | -1.4555 | -2.2030 | 0.7095 | 0.7475 | -484.1658 | -430.8411 | 0.8539 | 0.8128 |
| 0.4665 | 0.93 | 7100 | 0.5485 | -1.4561 | -2.2038 | 0.7100 | 0.7477 | -484.2509 | -430.9035 | 0.8539 | 0.8128 |
| 0.6276 | 0.94 | 7200 | 0.5486 | -1.4556 | -2.2033 | 0.7110 | 0.7476 | -484.1955 | -430.8554 | 0.8539 | 0.8130 |
| 0.457 | 0.96 | 7300 | 0.5486 | -1.4547 | -2.2022 | 0.7110 | 0.7475 | -484.0942 | -430.7640 | 0.8540 | 0.8129 |
| 0.5436 | 0.97 | 7400 | 0.5486 | -1.4557 | -2.2035 | 0.7130 | 0.7478 | -484.2209 | -430.8634 | 0.8541 | 0.8130 |
| 0.4801 | 0.98 | 7500 | 0.5486 | -1.4555 | -2.2033 | 0.7125 | 0.7478 | -484.1994 | -430.8404 | 0.8538 | 0.8125 |
| 0.5922 | 0.99 | 7600 | 0.5486 | -1.4555 | -2.2032 | 0.7100 | 0.7477 | -484.1860 | -430.8414 | 0.8537 | 0.8124 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0