---
license: apache-2.0
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistral-dpo
  results: []
---

# mistral-dpo

This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ), trained with DPO on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7175
- Rewards/chosen: 0.5987
- Rewards/rejected: 0.4947
- Rewards/accuracies: 0.5769
- Rewards/margins: 0.1040
- Logps/rejected: -155.3645
- Logps/chosen: -178.6683
- Logits/rejected: -2.3247
- Logits/chosen: -2.3598

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 100
- mixed_precision_training: Native AMP

A sketch of how these settings map onto a TRL `DPOTrainer` run appears after the framework versions below.

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7039 | 0.0 | 10 | 0.6929 | 0.0710 | 0.0692 | 0.5 | 0.0018 | -159.6188 | -183.9449 | -2.2886 | -2.3271 |
| 0.68 | 0.0 | 20 | 0.7113 | -0.0478 | -0.0188 | 0.4519 | -0.0290 | -160.4993 | -185.1333 | -2.2979 | -2.3361 |
| 0.8777 | 0.0 | 30 | 0.7367 | -0.3027 | -0.2538 | 0.4904 | -0.0489 | -162.8490 | -187.6822 | -2.3132 | -2.3525 |
| 0.8501 | 0.0 | 40 | 0.7407 | -0.2893 | -0.2458 | 0.4327 | -0.0435 | -162.7690 | -187.5477 | -2.3173 | -2.3556 |
| 0.7253 | 0.0 | 50 | 0.7207 | -0.0228 | -0.0265 | 0.4904 | 0.0037 | -160.5759 | -184.8833 | -2.3167 | -2.3538 |
| 0.7293 | 0.0 | 60 | 0.7066 | 0.1787 | 0.1240 | 0.5673 | 0.0547 | -159.0715 | -182.8687 | -2.3194 | -2.3553 |
| 0.6057 | 0.01 | 70 | 0.6851 | 0.4039 | 0.2915 | 0.5769 | 0.1125 | -157.3963 | -180.6157 | -2.3192 | -2.3543 |
| 0.7169 | 0.01 | 80 | 0.6853 | 0.5467 | 0.4175 | 0.5769 | 0.1291 | -156.1357 | -179.1884 | -2.3219 | -2.3564 |
| 0.6324 | 0.01 | 90 | 0.7046 | 0.5751 | 0.4602 | 0.5769 | 0.1149 | -155.7090 | -178.9038 | -2.3232 | -2.3580 |
| 0.5915 | 0.01 | 100 | 0.7175 | 0.5987 | 0.4947 | 0.5769 | 0.1040 | -155.3645 | -178.6683 | -2.3247 | -2.3598 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0
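
### Reproducing the run (sketch)

As a rough illustration only, the hyperparameters listed above correspond to a TRL `DPOTrainer` run (TRL 0.7.x-era API, matching the framework versions above) along the following lines. The preference dataset, LoRA settings, and DPO `beta` were not reported on this card, so the values marked as placeholders below are assumptions, not the actual configuration:

```python
# Sketch of a DPO run with the hyperparameters reported on this card.
# Assumptions: LoRA training on top of the GPTQ base (requires optimum,
# auto-gptq, and peft); dataset, LoRA config, and beta are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Placeholder dataset id: any dataset with "prompt", "chosen", and
# "rejected" string columns works with DPOTrainer.
train_dataset = load_dataset("your-username/preference-pairs", split="train")

training_args = TrainingArguments(
    output_dir="mistral-dpo",
    learning_rate=2e-4,            # learning_rate: 0.0002
    per_device_train_batch_size=1, # train_batch_size: 1
    per_device_eval_batch_size=8,  # eval_batch_size: 8
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=100,                 # training_steps: 100
    fp16=True,                     # mixed_precision_training: Native AMP
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, TRL uses the frozen base as the reference
    args=training_args,
    beta=0.1,        # placeholder: beta was not reported on this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),  # placeholder values
)
trainer.train()
```

Note the fractional epoch values in the results table: with a batch size of 1 and only 100 optimizer steps, the run covers well under one pass over a typical preference dataset.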
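
### Inference (sketch)

A minimal inference sketch, assuming this checkpoint is a PEFT adapter saved on top of the GPTQ base (the usual TRL-with-PEFT setup) and using `your-username/mistral-dpo` as a placeholder for this repository's id:

```python
# Minimal inference sketch. Assumptions: the checkpoint is a LoRA/PEFT adapter
# over the GPTQ base, and "your-username/mistral-dpo" is a placeholder repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "your-username/mistral-dpo"  # placeholder: use the actual repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Loading a GPTQ checkpoint requires optimum and auto-gptq to be installed.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO adapter

# The OpenHermes-2 base model was trained with ChatML-style prompts.
prompt = "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```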