---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: v1_1000_STEPS_1e5_rate_01_beta_DPO
    results: []
---

# v1_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1843
- Rewards/chosen: -4.7599
- Rewards/rejected: -4.6623
- Rewards/accuracies: 0.4484
- Rewards/margins: -0.0976
- Logps/rejected: -63.5028
- Logps/chosen: -62.8524
- Logits/rejected: -5.1435
- Logits/chosen: -5.1435
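
A quick way to try the checkpoint is to load it with `transformers`. This is a minimal inference sketch, not part of the original card: the Hub repo id is inferred from the model name above, and the prompt formatting assumes the Mistral-Instruct chat template inherited from the base model.

```python
# Minimal inference sketch (assumptions flagged):
# - repo id is inferred from the model name; not confirmed by the card
# - prompt format relies on the base model's Mistral-Instruct chat template
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_1000_STEPS_1e5_rate_01_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```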

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
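
The `trl` and `dpo` tags above indicate this checkpoint was trained with Direct Preference Optimization (DPO; Rafailov et al., 2023). As background not stated in the original card: given a prompt $x$ with chosen completion $y_w$ and rejected completion $y_l$, the DPO objective is

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right].
$$

The Rewards/chosen and Rewards/rejected metrics reported by TRL are the $\beta$-scaled log-probability ratios $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ for the chosen and rejected completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs with a positive margin. The `01_beta` suffix in the model name suggests $\beta = 0.1$, though the card does not state it.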

### Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
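
The configuration above maps directly onto TRL's `DPOTrainer`. The following is a hedged reconstruction, not the author's verbatim script: the preference dataset is unknown (the `load_dataset` id is a hypothetical placeholder), the keyword set matches TRL ~0.7/0.8 (contemporary with Transformers 4.39), `beta=0.1` is inferred from the model name, and the sequence lengths are assumptions.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical placeholder; DPOTrainer expects "prompt", "chosen", "rejected".
dataset = load_dataset("username/preference-dataset")

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e5_rate_01_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size 2 * 2 = 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    evaluation_strategy="steps",
    eval_steps=50,  # matches the 50-step cadence in the results table below
    logging_steps=50,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL creates a frozen copy of `model` as the reference
    args=training_args,
    beta=0.1,               # assumption inferred from the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    max_length=1024,        # assumed; not stated in the card
    max_prompt_length=512,  # assumed; not stated in the card
)
trainer.train()
```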

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.0273        | 0.05  | 50   | 1.0978          | -4.1663        | -4.0603          | 0.3868             | -0.1060         | -57.4823       | -56.9158     | -3.3835         | -3.3835       |
| 1.4867        | 0.1   | 100  | 1.6188          | -8.0229        | -8.1216          | 0.4659             | 0.0987          | -98.0960       | -95.4825     | -3.5244         | -3.5244       |
| 1.3148        | 0.15  | 150  | 1.2591          | -5.2353        | -5.1881          | 0.4637             | -0.0472         | -68.7606       | -67.6062     | -3.9106         | -3.9106       |
| 1.5023        | 0.2   | 200  | 1.2648          | -5.2617        | -5.1757          | 0.4462             | -0.0860         | -68.6362       | -67.8701     | -3.6954         | -3.6954       |
| 1.1555        | 0.24  | 250  | 1.2170          | -4.8688        | -4.7930          | 0.4505             | -0.0758         | -64.8100       | -63.9413     | -3.9459         | -3.9459       |
| 1.4516        | 0.29  | 300  | 1.2267          | -5.0560        | -4.9621          | 0.4286             | -0.0940         | -66.5001       | -65.8132     | -4.3050         | -4.3050       |
| 1.2594        | 0.34  | 350  | 1.2134          | -4.8394        | -4.7583          | 0.4440             | -0.0811         | -64.4627       | -63.6475     | -4.6928         | -4.6928       |
| 2.0063        | 0.39  | 400  | 1.1957          | -4.7217        | -4.6265          | 0.4462             | -0.0952         | -63.1444       | -62.4697     | -5.2439         | -5.2439       |
| 1.8188        | 0.44  | 450  | 1.2070          | -4.9169        | -4.8208          | 0.4462             | -0.0962         | -65.0873       | -64.4224     | -5.5926         | -5.5926       |
| 1.6531        | 0.49  | 500  | 1.2595          | -5.3935        | -5.3580          | 0.4374             | -0.0356         | -70.4593       | -69.1884     | -2.4458         | -2.4457       |
| 1.6375        | 0.54  | 550  | 1.2036          | -4.7802        | -4.6765          | 0.4418             | -0.1036         | -63.6449       | -63.0546     | -5.4670         | -5.4670       |
| 1.0633        | 0.59  | 600  | 1.2013          | -4.7613        | -4.6562          | 0.4396             | -0.1051         | -63.4414       | -62.8664     | -5.4292         | -5.4292       |
| 1.7188        | 0.64  | 650  | 1.1996          | -4.7873        | -4.6740          | 0.4484             | -0.1133         | -63.6198       | -63.1264     | -5.5429         | -5.5429       |
| 1.5469        | 0.68  | 700  | 1.1910          | -4.7298        | -4.6299          | 0.4484             | -0.1000         | -63.1784       | -62.5515     | -5.2089         | -5.2089       |
| 1.0102        | 0.73  | 750  | 1.1953          | -4.7801        | -4.6716          | 0.4462             | -0.1085         | -63.5956       | -63.0540     | -5.6196         | -5.6196       |
| 0.8289        | 0.78  | 800  | 1.1935          | -4.7729        | -4.6677          | 0.4484             | -0.1051         | -63.5568       | -62.9817     | -5.5697         | -5.5697       |
| 1.8281        | 0.83  | 850  | 1.1860          | -4.7551        | -4.6562          | 0.4484             | -0.0989         | -63.4419       | -62.8043     | -5.1995         | -5.1995       |
| 1.193         | 0.88  | 900  | 1.1845          | -4.7609        | -4.6632          | 0.4484             | -0.0977         | -63.5115       | -62.8620     | -5.1522         | -5.1522       |
| 1.6672        | 0.93  | 950  | 1.1844          | -4.7599        | -4.6622          | 0.4484             | -0.0977         | -63.5018       | -62.8523     | -5.1417         | -5.1417       |
| 1.4906        | 0.98  | 1000 | 1.1843          | -4.7599        | -4.6623          | 0.4484             | -0.0976         | -63.5028       | -62.8524     | -5.1435         | -5.1435       |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2