---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO
  results: []
---


# mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO

This model is a version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) fine-tuned with Direct Preference Optimization (DPO) using TRL. The training dataset was not recorded.
At the final training step (1000), it achieves the following results on the evaluation set:
- Loss: 0.6920
- Rewards/chosen: -0.0058
- Rewards/rejected: -0.0082
- Rewards/accuracies: 0.5121
- Rewards/margins: 0.0024
- Logps/rejected: -28.6543
- Logps/chosen: -23.4436
- Logits/rejected: -2.8649
- Logits/chosen: -2.8652
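
To put the loss and reward figures above in context, the following is a minimal sketch of how a sigmoid-style DPO loss relates to the reward margin. It assumes the default sigmoid DPO loss, where the per-pair loss is `-log(sigmoid(margin))` and the margin is the chosen reward minus the rejected reward. The function name `dpo_loss_from_margin` is hypothetical and is not part of TRL. Because the reported loss is an average over evaluation pairs, evaluating the formula at the mean margin only approximates it.

```python
import math


def dpo_loss_from_margin(margin: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)).

    Hypothetical helper for illustration; assumes the sigmoid loss variant.
    """
    # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


# Values copied from the evaluation results listed above.
rewards_chosen = -0.0058
rewards_rejected = -0.0082
margin = rewards_chosen - rewards_rejected  # matches Rewards/margins

# A policy identical to the reference model has margin 0 and loss ln(2).
loss_at_zero_margin = dpo_loss_from_margin(0.0)
loss_at_reported_margin = dpo_loss_from_margin(margin)

print(f"margin:                  {margin:.4f}")
print(f"loss at zero margin:     {loss_at_zero_margin:.4f}")
print(f"loss at reported margin: {loss_at_reported_margin:.4f}")
```

Under these assumptions, the reported loss sits just below ln(2), the value an unchanged policy would score, and the reward accuracy of 0.5121 is close to chance. Both are consistent with a policy that has moved very little from the reference model.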

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
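
The relationship between these settings can be sketched as follows. This is an illustrative reimplementation, not the Trainer's own code: it assumes a single device (the device count is not stated in this card) and a linear-warmup, half-cosine decay schedule of the kind the `cosine` scheduler type conventionally denotes. The helper names `effective_batch_size` and `lr_at_step` are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-8
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


total = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(f"total_train_batch_size: {total}")
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at_step(step):.3e}")
```

With these assumptions, the product of the per-device batch size and the accumulation steps reproduces the listed `total_train_batch_size` of 8, and the learning rate peaks at 1e-08 at step 100 before decaying to zero at step 1000.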

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.693         | 0.1   | 50   | 0.6928          | 0.0007         | -0.0000          | 0.4549             | 0.0007          | -28.5728       | -23.3792     | -2.8652         | -2.8654       |
| 0.693         | 0.2   | 100  | 0.6920          | 0.0012         | -0.0011          | 0.4945             | 0.0023          | -28.5838       | -23.3741     | -2.8653         | -2.8655       |
| 0.693         | 0.29  | 150  | 0.6923          | -0.0015        | -0.0033          | 0.4989             | 0.0018          | -28.6052       | -23.4006     | -2.8651         | -2.8653       |
| 0.694         | 0.39  | 200  | 0.6923          | -0.0020        | -0.0037          | 0.4813             | 0.0017          | -28.6093       | -23.4058     | -2.8651         | -2.8653       |
| 0.6916        | 0.49  | 250  | 0.6922          | -0.0026        | -0.0046          | 0.4879             | 0.0021          | -28.6189       | -23.4118     | -2.8651         | -2.8654       |
| 0.6927        | 0.59  | 300  | 0.6920          | -0.0039        | -0.0063          | 0.5011             | 0.0023          | -28.6350       | -23.4253     | -2.8650         | -2.8653       |
| 0.6941        | 0.68  | 350  | 0.6927          | -0.0048        | -0.0058          | 0.4659             | 0.0010          | -28.6304       | -23.4334     | -2.8650         | -2.8652       |
| 0.6924        | 0.78  | 400  | 0.6922          | -0.0049        | -0.0068          | 0.4989             | 0.0019          | -28.6399       | -23.4345     | -2.8650         | -2.8653       |
| 0.6919        | 0.88  | 450  | 0.6918          | -0.0056        | -0.0084          | 0.4857             | 0.0028          | -28.6562       | -23.4418     | -2.8650         | -2.8653       |
| 0.6913        | 0.98  | 500  | 0.6913          | -0.0047        | -0.0085          | 0.5077             | 0.0038          | -28.6577       | -23.4328     | -2.8649         | -2.8652       |
| 0.6914        | 1.07  | 550  | 0.6915          | -0.0034        | -0.0067          | 0.5143             | 0.0033          | -28.6398       | -23.4200     | -2.8650         | -2.8653       |
| 0.6939        | 1.17  | 600  | 0.6922          | -0.0069        | -0.0089          | 0.5033             | 0.0020          | -28.6613       | -23.4550     | -2.8650         | -2.8652       |
| 0.6917        | 1.27  | 650  | 0.6920          | -0.0056        | -0.0081          | 0.5231             | 0.0025          | -28.6535       | -23.4422     | -2.8650         | -2.8653       |
| 0.6919        | 1.37  | 700  | 0.6921          | -0.0052        | -0.0074          | 0.5055             | 0.0021          | -28.6463       | -23.4383     | -2.8650         | -2.8653       |
| 0.6929        | 1.46  | 750  | 0.6915          | -0.0044        | -0.0078          | 0.5363             | 0.0034          | -28.6506       | -23.4298     | -2.8650         | -2.8653       |
| 0.6919        | 1.56  | 800  | 0.6922          | -0.0063        | -0.0083          | 0.5209             | 0.0020          | -28.6553       | -23.4489     | -2.8649         | -2.8652       |
| 0.6925        | 1.66  | 850  | 0.6921          | -0.0058        | -0.0080          | 0.5121             | 0.0022          | -28.6528       | -23.4438     | -2.8649         | -2.8652       |
| 0.6925        | 1.76  | 900  | 0.6920          | -0.0058        | -0.0082          | 0.5121             | 0.0024          | -28.6543       | -23.4436     | -2.8649         | -2.8652       |
| 0.6939        | 1.86  | 950  | 0.6920          | -0.0058        | -0.0082          | 0.5121             | 0.0024          | -28.6543       | -23.4436     | -2.8649         | -2.8652       |
| 0.6924        | 1.95  | 1000 | 0.6920          | -0.0058        | -0.0082          | 0.5121             | 0.0024          | -28.6543       | -23.4436     | -2.8649         | -2.8652       |


### Framework versions

- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2