---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e6_05_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e6_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.7261
- Rewards/chosen: -2.7031
- Rewards/rejected: -5.5561
- Rewards/accuracies: 0.5890
- Rewards/margins: 2.8530
- Logps/rejected: -39.6846
- Logps/chosen: -28.7920
- Logits/rejected: -2.5943
- Logits/chosen: -2.5947

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7251        | 0.1   | 50   | 0.8837          | 0.1755         | -0.1037          | 0.4901             | 0.2792          | -28.7799       | -23.0348     | -2.8359         | -2.8362       |
| 0.9163        | 0.2   | 100  | 1.7788          | -4.6432        | -6.2118          | 0.5231             | 1.5686          | -40.9959       | -32.6723     | -2.6192         | -2.6196       |
| 2.5499        | 0.29  | 150  | 1.9611          | -3.8807        | -4.8711          | 0.5033             | 0.9904          | -38.3145       | -31.1472     | -2.8718         | -2.8723       |
| 1.6289        | 0.39  | 200  | 2.1262          | -4.2615        | -4.3039          | 0.4462             | 0.0423          | -37.1802       | -31.9089     | -2.5439         | -2.5442       |
| 2.3907        | 0.49  | 250  | 2.1527          | -2.9174        | -2.6939          | 0.4527             | -0.2235         | -33.9602       | -29.2207     | -2.7643         | -2.7646       |
| 1.4887        | 0.59  | 300  | 2.2144          | -2.7649        | -3.3119          | 0.4725             | 0.5470          | -35.1962       | -28.9157     | -2.7607         | -2.7611       |
| 1.9594        | 0.68  | 350  | 2.1934          | -0.0315        | 0.0006           | 0.4593             | -0.0322         | -28.5711       | -23.4489     | -2.6191         | -2.6193       |
| 2.1399        | 0.78  | 400  | 1.9044          | -4.4917        | -5.1288          | 0.4989             | 0.6371          | -38.8300       | -32.3693     | -2.8491         | -2.8494       |
| 1.1937        | 0.88  | 450  | 1.9658          | -2.8086        | -3.5888          | 0.4989             | 0.7802          | -35.7500       | -29.0030     | -2.8330         | -2.8333       |
| 1.6222        | 0.98  | 500  | 1.8626          | -2.3058        | -3.5222          | 0.5363             | 1.2164          | -35.6167       | -27.9974     | -2.7302         | -2.7305       |
| 0.5066        | 1.07  | 550  | 1.8660          | -2.9490        | -5.0994          | 0.5758             | 2.1504          | -38.7712       | -29.2838     | -2.7083         | -2.7087       |
| 0.4413        | 1.17  | 600  | 1.7645          | -4.3370        | -6.8789          | 0.5868             | 2.5419          | -42.3302       | -32.0597     | -2.6355         | -2.6360       |
| 0.2726        | 1.27  | 650  | 1.7971          | -1.8488        | -4.1281          | 0.5780             | 2.2793          | -36.8285       | -27.0834     | -2.6083         | -2.6085       |
| 0.2803        | 1.37  | 700  | 1.7498          | -2.2886        | -4.8524          | 0.5802             | 2.5639          | -38.2772       | -27.9629     | -2.6089         | -2.6092       |
| 0.199         | 1.46  | 750  | 1.7383          | -2.5467        | -5.2810          | 0.5868             | 2.7343          | -39.1343       | -28.4792     | -2.5998         | -2.6002       |
| 0.2405        | 1.56  | 800  | 1.7280          | -2.4873        | -5.2804          | 0.5890             | 2.7931          | -39.1332       | -28.3604     | -2.5980         | -2.5984       |
| 0.2125        | 1.66  | 850  | 1.7269          | -2.6426        | -5.4648          | 0.5846             | 2.8223          | -39.5021       | -28.6710     | -2.5949         | -2.5953       |
| 0.3193        | 1.76  | 900  | 1.7253          | -2.6905        | -5.5366          | 0.5912             | 2.8461          | -39.6456       | -28.7668     | -2.5945         | -2.5949       |
| 0.3209        | 1.86  | 950  | 1.7242          | -2.6996        | -5.5548          | 0.5912             | 2.8552          | -39.6820       | -28.7851     | -2.5942         | -2.5946       |
| 0.278         | 1.95  | 1000 | 1.7261          | -2.7031        | -5.5561          | 0.5890             | 2.8530          | -39.6846       | -28.7920     | -2.5943         | -2.5947       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
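The hyperparameters above map directly onto a TRL `DPOTrainer` run. Below is a minimal reproduction sketch, not the author's actual training script: it assumes a TRL release contemporary with Transformers 4.38 (e.g. trl 0.7.x, where `DPOTrainer` still accepts `beta` as a keyword argument rather than via a `DPOConfig`), infers `beta=0.5` from the `05_beta` suffix in the model name, and uses a placeholder preference dataset, since the card does not identify the real one.

```python
# Reproduction sketch under the assumptions stated above; not the author's script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Placeholder: any dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e6_05_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence above
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # TRL clones the policy as the frozen reference model
    args=args,
    beta=0.5,             # assumption: inferred from "05_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

The default optimizer in this setup is AdamW with betas=(0.9, 0.999) and epsilon=1e-08, which is what the hyperparameter list above records.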
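Because the base model is Mistral-7B-Instruct-v0.2, the fine-tuned checkpoint can be loaded and prompted through the standard `transformers` chat-template API. The repository id below is a hypothetical placeholder for wherever this checkpoint is hosted.

```python
# Inference sketch; `repo_id` is a placeholder, substitute the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/mistralit2_1000_STEPS_1e6_05_beta_DPO"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# The Mistral-instruct chat template ([INST] ... [/INST]) is inherited
# from the base model.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```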