---
license: apache-2.0
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistral-dpo
    results: []
---

# mistral-dpo

This model is a DPO fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.7175
- Rewards/chosen: 0.5987
- Rewards/rejected: 0.4947
- Rewards/accuracies: 0.5769
- Rewards/margins: 0.1040
- Logps/rejected: -155.3645
- Logps/chosen: -178.6683
- Logits/rejected: -2.3247
- Logits/chosen: -2.3598
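
A minimal loading sketch follows. It assumes this repository hosts a PEFT/LoRA adapter on top of the GPTQ base model (the card does not document the artifact layout), and that `auto-gptq` and `optimum` are installed alongside the framework versions listed below:

```python
# Hedged inference sketch. Assumptions (not stated in this card):
# - this repo ("aritrasen/mistral-dpo") contains a PEFT/LoRA adapter,
# - the tokenizer is taken from the GPTQ base model,
# - auto-gptq and optimum are installed for GPTQ loading.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "aritrasen/mistral-dpo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO adapter

prompt = "What is direct preference optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```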

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 100
- mixed_precision_training: Native AMP
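
A hedged sketch of how these hyperparameters map onto a TRL `DPOTrainer` setup of this era (TRL with Transformers 4.35). The card does not include the training script; the dataset, LoRA configuration, DPO `beta`, and sequence lengths below are assumptions, not facts from the card:

```python
# Hedged reconstruction of the training setup from the hyperparameter list
# above. Dataset, LoRA config, beta, and max lengths are assumptions.
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Loading a GPTQ checkpoint for training requires auto-gptq with PEFT support.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Hypothetical dataset id: DPOTrainer expects "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-username/your-dpo-dataset", split="train")

peft_config = LoraConfig(  # assumption: LoRA rank/targets are not documented
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

training_args = TrainingArguments(  # values taken from the list above
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=100,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
    output_dir="mistral-dpo",
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, TRL derives the frozen reference model
    beta=0.1,        # assumption: TRL's default DPO beta
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=512, max_prompt_length=256,  # assumptions
)
trainer.train()
```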

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7039        | 0.0   | 10   | 0.6929          | 0.0710         | 0.0692           | 0.5                | 0.0018          | -159.6188      | -183.9449    | -2.2886         | -2.3271       |
| 0.68          | 0.0   | 20   | 0.7113          | -0.0478        | -0.0188          | 0.4519             | -0.0290         | -160.4993      | -185.1333    | -2.2979         | -2.3361       |
| 0.8777        | 0.0   | 30   | 0.7367          | -0.3027        | -0.2538          | 0.4904             | -0.0489         | -162.8490      | -187.6822    | -2.3132         | -2.3525       |
| 0.8501        | 0.0   | 40   | 0.7407          | -0.2893        | -0.2458          | 0.4327             | -0.0435         | -162.7690      | -187.5477    | -2.3173         | -2.3556       |
| 0.7253        | 0.0   | 50   | 0.7207          | -0.0228        | -0.0265          | 0.4904             | 0.0037          | -160.5759      | -184.8833    | -2.3167         | -2.3538       |
| 0.7293        | 0.0   | 60   | 0.7066          | 0.1787         | 0.1240           | 0.5673             | 0.0547          | -159.0715      | -182.8687    | -2.3194         | -2.3553       |
| 0.6057        | 0.01  | 70   | 0.6851          | 0.4039         | 0.2915           | 0.5769             | 0.1125          | -157.3963      | -180.6157    | -2.3192         | -2.3543       |
| 0.7169        | 0.01  | 80   | 0.6853          | 0.5467         | 0.4175           | 0.5769             | 0.1291          | -156.1357      | -179.1884    | -2.3219         | -2.3564       |
| 0.6324        | 0.01  | 90   | 0.7046          | 0.5751         | 0.4602           | 0.5769             | 0.1149          | -155.7090      | -178.9038    | -2.3232         | -2.3580       |
| 0.5915        | 0.01  | 100  | 0.7175          | 0.5987         | 0.4947           | 0.5769             | 0.1040          | -155.3645      | -178.6683    | -2.3247         | -2.3598       |
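
For reading the reward columns: in TRL's DPO implementation, the reward for a completion is beta times the log-probability ratio between the policy and the reference model, `Rewards/margins` is chosen minus rejected, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one. The standard DPO objective (Rafailov et al., 2023; not stated in this card) is:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$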

### Framework versions

- Transformers 4.35.2
- PyTorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0