---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e6_rate_03_beta_DPO2
  results: []
---


# v1_1000_STEPS_1e6_rate_03_beta_DPO2

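This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), trained with Direct Preference Optimization (DPO) using TRL on a dataset that this card does not identify.
After 1000 training steps, it achieves the following results on the evaluation set: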
- Loss: 0.8487
- Rewards/chosen: -2.2151
- Rewards/rejected: -3.0240
- Rewards/accuracies: 0.5758
- Rewards/margins: 0.8089
- Logps/rejected: -26.9594
- Logps/chosen: -22.6366
- Logits/rejected: -3.2869
- Logits/chosen: -3.2870
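
As a quick consistency check on the metrics above, the sketch below recomputes the reward margin from the reported chosen and rejected rewards. It assumes the usual DPO logging convention, in which `Rewards/margins` is the mean of (chosen reward minus rejected reward) and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen reward exceeds the rejected one. The variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -2.2151
rewards_rejected = -3.0240
reported_margin = 0.8089
reported_accuracy = 0.5758

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.8089

# The recomputed margin should match the reported one up to rounding.
margin_matches = abs(margin - reported_margin) < 1e-4
print(margin_matches)  # True

# Accuracy expressed as a percentage of evaluation pairs ranked correctly.
accuracy_percent = round(reported_accuracy * 100, 2)
print(accuracy_percent)  # 57.58
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more than the chosen ones, not from the chosen responses gaining reward.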

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
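
The hyperparameters above imply an effective batch size and a learning-rate curve that can be sketched in a few lines. This is a minimal illustration, not the trainer's actual code: it assumes a single training device (inferred from 2 × 2 = 4, not stated in this card) and the common shape for a `cosine` schedule, namely linear warmup followed by a half-cosine decay to zero. The function names are hypothetical.

```python
import math

LEARNING_RATE = 1e-6
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: inferred from total_train_batch_size, not stated


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 1e-06 at step 100, falls to about half of that at step 550, and reaches zero at step 1000.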

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7016        | 0.05  | 50   | 0.6686          | -0.1010        | -0.1780          | 0.5516             | 0.0771          | -17.4730       | -15.5896     | -3.3830         | -3.3830       |
| 0.6727        | 0.1   | 100  | 0.7732          | -1.2186        | -1.6229          | 0.5275             | 0.4043          | -22.2891       | -19.3150     | -3.3350         | -3.3352       |
| 1.2098        | 0.15  | 150  | 0.9205          | -1.6685        | -2.0242          | 0.5209             | 0.3558          | -23.6270       | -20.8147     | -3.3998         | -3.4000       |
| 0.8607        | 0.2   | 200  | 0.9312          | -1.7362        | -2.0915          | 0.5099             | 0.3553          | -23.8513       | -21.0405     | -3.3324         | -3.3326       |
| 0.896         | 0.24  | 250  | 0.9765          | -1.8658        | -2.0921          | 0.5011             | 0.2263          | -23.8533       | -21.4723     | -3.2214         | -3.2215       |
| 0.9783        | 0.29  | 300  | 0.9234          | -1.9658        | -2.3835          | 0.5165             | 0.4177          | -24.8244       | -21.8057     | -3.3158         | -3.3160       |
| 1.0592        | 0.34  | 350  | 0.9509          | -3.1300        | -3.4037          | 0.5033             | 0.2738          | -28.2253       | -25.6863     | -3.2697         | -3.2698       |
| 1.0391        | 0.39  | 400  | 0.9067          | -2.4562        | -2.8182          | 0.5231             | 0.3619          | -26.2735       | -23.4405     | -3.3616         | -3.3617       |
| 0.9409        | 0.44  | 450  | 0.9081          | -2.8095        | -3.1865          | 0.5231             | 0.3771          | -27.5014       | -24.6179     | -3.3324         | -3.3325       |
| 0.8139        | 0.49  | 500  | 0.9131          | -2.8071        | -3.2564          | 0.5560             | 0.4493          | -27.7343       | -24.6100     | -3.3362         | -3.3363       |
| 0.8732        | 0.54  | 550  | 0.8745          | -2.3409        | -3.0357          | 0.5516             | 0.6948          | -26.9986       | -23.0562     | -3.3124         | -3.3125       |
| 0.8179        | 0.59  | 600  | 0.8632          | -2.1460        | -2.9478          | 0.5692             | 0.8018          | -26.7055       | -22.4063     | -3.3039         | -3.3040       |
| 0.825         | 0.64  | 650  | 0.8769          | -1.9605        | -2.7326          | 0.5626             | 0.7721          | -25.9882       | -21.7879     | -3.3006         | -3.3007       |
| 0.7539        | 0.68  | 700  | 0.8600          | -2.1758        | -2.9531          | 0.5714             | 0.7773          | -26.7232       | -22.5059     | -3.2794         | -3.2795       |
| 0.7835        | 0.73  | 750  | 0.8551          | -2.2525        | -3.0394          | 0.5692             | 0.7868          | -27.0107       | -22.7614     | -3.2905         | -3.2906       |
| 0.925         | 0.78  | 800  | 0.8479          | -2.2131        | -3.0235          | 0.5736             | 0.8105          | -26.9579       | -22.6299     | -3.2902         | -3.2903       |
| 1.0166        | 0.83  | 850  | 0.8493          | -2.2090        | -3.0157          | 0.5780             | 0.8067          | -26.9319       | -22.6164     | -3.2872         | -3.2873       |
| 1.0711        | 0.88  | 900  | 0.8480          | -2.2126        | -3.0221          | 0.5758             | 0.8095          | -26.9532       | -22.6283     | -3.2869         | -3.2870       |
| 0.9928        | 0.93  | 950  | 0.8487          | -2.2161        | -3.0255          | 0.5802             | 0.8094          | -26.9646       | -22.6400     | -3.2869         | -3.2870       |
| 0.6707        | 0.98  | 1000 | 0.8487          | -2.2151        | -3.0240          | 0.5758             | 0.8089          | -26.9594       | -22.6366     | -3.2869         | -3.2870       |
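
To compare checkpoints, the sketch below scans three columns of the table above (step, validation loss, reward margin) and reports which evaluation step has the lowest validation loss and which has the largest margin. The values are copied from the table; the variable names are illustrative only.

```python
# (step, validation loss, rewards/margins) copied from the table above.
rows = [
    (50, 0.6686, 0.0771), (100, 0.7732, 0.4043), (150, 0.9205, 0.3558),
    (200, 0.9312, 0.3553), (250, 0.9765, 0.2263), (300, 0.9234, 0.4177),
    (350, 0.9509, 0.2738), (400, 0.9067, 0.3619), (450, 0.9081, 0.3771),
    (500, 0.9131, 0.4493), (550, 0.8745, 0.6948), (600, 0.8632, 0.8018),
    (650, 0.8769, 0.7721), (700, 0.8600, 0.7773), (750, 0.8551, 0.7868),
    (800, 0.8479, 0.8105), (850, 0.8493, 0.8067), (900, 0.8480, 0.8095),
    (950, 0.8487, 0.8094), (1000, 0.8487, 0.8089),
]

# Evaluation step with the lowest validation loss.
best_loss_step = min(rows, key=lambda r: r[1])[0]

# Evaluation step with the largest reward margin.
best_margin_step = max(rows, key=lambda r: r[2])[0]

print(best_loss_step, best_margin_step)  # 50 800
```

The two criteria disagree: validation loss is lowest at step 50, before the margin has grown, while the margin peaks at step 800 and stays nearly flat afterwards.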


### Framework versions

- Transformers 4.39.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2