---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e6_05_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e6_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.7261
- Rewards/chosen: -2.7031
- Rewards/rejected: -5.5561
- Rewards/accuracies: 0.5890
- Rewards/margins: 2.8530
- Logps/rejected: -39.6846
- Logps/chosen: -28.7920
- Logits/rejected: -2.5943
- Logits/chosen: -2.5947

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7251        | 0.1   | 50   | 0.8837          | 0.1755         | -0.1037          | 0.4901             | 0.2792          | -28.7799       | -23.0348     | -2.8359         | -2.8362       |
| 0.9163        | 0.2   | 100  | 1.7788          | -4.6432        | -6.2118          | 0.5231             | 1.5686          | -40.9959       | -32.6723     | -2.6192         | -2.6196       |
| 2.5499        | 0.29  | 150  | 1.9611          | -3.8807        | -4.8711          | 0.5033             | 0.9904          | -38.3145       | -31.1472     | -2.8718         | -2.8723       |
| 1.6289        | 0.39  | 200  | 2.1262          | -4.2615        | -4.3039          | 0.4462             | 0.0423          | -37.1802       | -31.9089     | -2.5439         | -2.5442       |
| 2.3907        | 0.49  | 250  | 2.1527          | -2.9174        | -2.6939          | 0.4527             | -0.2235         | -33.9602       | -29.2207     | -2.7643         | -2.7646       |
| 1.4887        | 0.59  | 300  | 2.2144          | -2.7649        | -3.3119          | 0.4725             | 0.5470          | -35.1962       | -28.9157     | -2.7607         | -2.7611       |
| 1.9594        | 0.68  | 350  | 2.1934          | -0.0315        | 0.0006           | 0.4593             | -0.0322         | -28.5711       | -23.4489     | -2.6191         | -2.6193       |
| 2.1399        | 0.78  | 400  | 1.9044          | -4.4917        | -5.1288          | 0.4989             | 0.6371          | -38.8300       | -32.3693     | -2.8491         | -2.8494       |
| 1.1937        | 0.88  | 450  | 1.9658          | -2.8086        | -3.5888          | 0.4989             | 0.7802          | -35.7500       | -29.0030     | -2.8330         | -2.8333       |
| 1.6222        | 0.98  | 500  | 1.8626          | -2.3058        | -3.5222          | 0.5363             | 1.2164          | -35.6167       | -27.9974     | -2.7302         | -2.7305       |
| 0.5066        | 1.07  | 550  | 1.8660          | -2.9490        | -5.0994          | 0.5758             | 2.1504          | -38.7712       | -29.2838     | -2.7083         | -2.7087       |
| 0.4413        | 1.17  | 600  | 1.7645          | -4.3370        | -6.8789          | 0.5868             | 2.5419          | -42.3302       | -32.0597     | -2.6355         | -2.6360       |
| 0.2726        | 1.27  | 650  | 1.7971          | -1.8488        | -4.1281          | 0.5780             | 2.2793          | -36.8285       | -27.0834     | -2.6083         | -2.6085       |
| 0.2803        | 1.37  | 700  | 1.7498          | -2.2886        | -4.8524          | 0.5802             | 2.5639          | -38.2772       | -27.9629     | -2.6089         | -2.6092       |
| 0.199         | 1.46  | 750  | 1.7383          | -2.5467        | -5.2810          | 0.5868             | 2.7343          | -39.1343       | -28.4792     | -2.5998         | -2.6002       |
| 0.2405        | 1.56  | 800  | 1.7280          | -2.4873        | -5.2804          | 0.5890             | 2.7931          | -39.1332       | -28.3604     | -2.5980         | -2.5984       |
| 0.2125        | 1.66  | 850  | 1.7269          | -2.6426        | -5.4648          | 0.5846             | 2.8223          | -39.5021       | -28.6710     | -2.5949         | -2.5953       |
| 0.3193        | 1.76  | 900  | 1.7253          | -2.6905        | -5.5366          | 0.5912             | 2.8461          | -39.6456       | -28.7668     | -2.5945         | -2.5949       |
| 0.3209        | 1.86  | 950  | 1.7242          | -2.6996        | -5.5548          | 0.5912             | 2.8552          | -39.6820       | -28.7851     | -2.5942         | -2.5946       |
| 0.278         | 1.95  | 1000 | 1.7261          | -2.7031        | -5.5561          | 0.5890             | 2.8530          | -39.6846       | -28.7920     | -2.5943         | -2.5947       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
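The hyperparameters above map directly onto a TRL `DPOTrainer` run. Below is a minimal reproduction sketch, not the author's actual training script: it assumes a TRL release contemporary with Transformers 4.38 (e.g. trl 0.7.x, where `DPOTrainer` still accepts `beta` as a keyword argument rather than via a `DPOConfig`), infers `beta=0.5` from the `05_beta` suffix in the model name, and uses a placeholder preference dataset, since the card does not identify the real one.

```python
# Reproduction sketch under the assumptions stated above; not the author's script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Placeholder: any dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e6_05_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence above
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # TRL clones the policy as the frozen reference model
    args=args,
    beta=0.5,             # assumption: inferred from "05_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

The default optimizer in this setup is AdamW with betas=(0.9, 0.999) and epsilon=1e-08, which is what the hyperparameter list above records.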
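Because the base model is Mistral-7B-Instruct-v0.2, the fine-tuned checkpoint can be loaded and prompted through the standard `transformers` chat-template API. The repository id below is a hypothetical placeholder for wherever this checkpoint is hosted.

```python
# Inference sketch; `repo_id` is a placeholder, substitute the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/mistralit2_1000_STEPS_1e6_05_beta_DPO"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# The Mistral-instruct chat template ([INST] ... [/INST]) is inherited
# from the base model.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```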