---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mpt_1000_STEPS_1e5_rate_05_beta_DPO
  results: []
---


# mpt_1000_STEPS_1e5_rate_05_beta_DPO

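This model is a version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) fine-tuned with Direct Preference Optimization (DPO) using TRL. The training dataset is not recorded.
At the final checkpoint (step 1000), it achieves the following results on the evaluation set: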
- Loss: 2.1807
- Rewards/chosen: -19.4532
- Rewards/rejected: -19.2274
- Rewards/accuracies: 0.5033
- Rewards/margins: -0.2258
- Logps/rejected: -60.0122
- Logps/chosen: -59.6986
- Logits/rejected: 7.5623
- Logits/chosen: 7.5620
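
The reward metrics above are related by simple arithmetic, which the following sketch checks. It assumes the default sigmoid DPO loss without label smoothing, where the per-pair loss is `-log(sigmoid(chosen_reward - rejected_reward))` and the logged rewards already include the beta factor. The helper name `dpo_sigmoid_loss` is hypothetical and is not part of TRL.

```python
import math

# Final-checkpoint values copied from the list above.
REWARD_CHOSEN = -19.4532
REWARD_REJECTED = -19.2274


def dpo_sigmoid_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Per-pair sigmoid DPO loss, assuming rewards are already scaled by beta."""
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Rewards/margins is the mean chosen reward minus the mean rejected reward.
margin = round(REWARD_CHOSEN - REWARD_REJECTED, 4)
print("margin:", margin)

# A margin of zero gives the loss of an undecided model, log(2).
print("loss at zero margin:", dpo_sigmoid_loss(0.0, 0.0))
```

The reported loss is an average over individual pairs, so it need not equal `dpo_sigmoid_loss` evaluated at the mean rewards. A negative margin together with an accuracy near 0.50 suggests the checkpoint does not separate chosen from rejected responses on this evaluation set.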

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
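
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and traces a cosine schedule with linear warmup. It is written in plain Python and only mirrors the general shape of such a schedule; the single-device assumption (`NUM_DEVICES = 1`) and the function name `lr_at` are assumptions, not taken from the training run.

```python
import math

LEARNING_RATE = 1e-5
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: consistent with total_train_batch_size = 4

# Effective batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at step 100, so the first two evaluation points in the results table fall inside the warmup phase.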

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.5203        | 0.05  | 50   | 1.5171          | -1.5689        | -1.4986          | 0.4791             | -0.0703         | -24.5546       | -23.9299     | 14.9602         | 14.9630       |
| 4.4339        | 0.1   | 100  | 2.9117          | -11.0118       | -10.8837         | 0.4813             | -0.1281         | -43.3247       | -42.8158     | 22.8545         | 22.8566       |
| 5.6756        | 0.15  | 150  | 4.3519          | -20.9772       | -20.5347         | 0.4703             | -0.4424         | -62.6269       | -62.7465     | 13.8454         | 13.8456       |
| 3.4587        | 0.2   | 200  | 3.7953          | -20.5135       | -19.9733         | 0.4549             | -0.5402         | -61.5040       | -61.8193     | 9.3162          | 9.3161        |
| 3.1326        | 0.24  | 250  | 4.2192          | -16.2805       | -16.0169         | 0.4857             | -0.2636         | -53.5912       | -53.3533     | 17.4741         | 17.4741       |
| 4.3129        | 0.29  | 300  | 3.2442          | -18.6648       | -18.0875         | 0.4462             | -0.5773         | -57.7325       | -58.1219     | 9.3299          | 9.3300        |
| 4.1056        | 0.34  | 350  | 3.0391          | -19.9243       | -19.4698         | 0.4659             | -0.4545         | -60.4970       | -60.6408     | 13.8852         | 13.8856       |
| 3.4604        | 0.39  | 400  | 3.0915          | -16.3912       | -16.0366         | 0.5055             | -0.3546         | -53.6306       | -53.5745     | 9.7129          | 9.7125        |
| 4.7084        | 0.44  | 450  | 2.7841          | -18.9738       | -18.6116         | 0.4835             | -0.3622         | -58.7806       | -58.7398     | 9.9158          | 9.9143        |
| 4.1944        | 0.49  | 500  | 2.9877          | -22.1479       | -21.8535         | 0.4901             | -0.2944         | -65.2644       | -65.0879     | 10.6479         | 10.6476       |
| 3.8283        | 0.54  | 550  | 2.4650          | -19.8299       | -19.7039         | 0.4989             | -0.1260         | -60.9653       | -60.4520     | 5.6892          | 5.6889        |
| 3.2208        | 0.59  | 600  | 2.3549          | -15.6227       | -15.7624         | 0.5385             | 0.1397          | -53.0822       | -52.0377     | 11.5783         | 11.5782       |
| 2.1741        | 0.64  | 650  | 2.4777          | -19.7204       | -19.3976         | 0.4945             | -0.3228         | -60.3526       | -60.2330     | 10.8601         | 10.8596       |
| 2.8376        | 0.68  | 700  | 2.4241          | -18.3119       | -18.1735         | 0.5055             | -0.1384         | -57.9045       | -57.4161     | 8.0859          | 8.0854        |
| 2.4514        | 0.73  | 750  | 2.2743          | -20.2330       | -20.0266         | 0.5033             | -0.2064         | -61.6106       | -61.2582     | 6.6227          | 6.6223        |
| 1.8899        | 0.78  | 800  | 2.2326          | -19.6323       | -19.3966         | 0.5121             | -0.2358         | -60.3506       | -60.0568     | 7.6793          | 7.6789        |
| 2.4350        | 0.83  | 850  | 2.1976          | -19.5253       | -19.2881         | 0.5121             | -0.2372         | -60.1336       | -59.8427     | 7.3698          | 7.3695        |
| 2.7112        | 0.88  | 900  | 2.1806          | -19.4443       | -19.2182         | 0.5011             | -0.2261         | -59.9939       | -59.6808     | 7.5579          | 7.5575        |
| 2.6506        | 0.93  | 950  | 2.1819          | -19.4556       | -19.2275         | 0.5011             | -0.2280         | -60.0125       | -59.7034     | 7.5627          | 7.5623        |
| 1.5392        | 0.98  | 1000 | 2.1807          | -19.4532       | -19.2274         | 0.5033             | -0.2258         | -60.0122       | -59.6986     | 7.5623          | 7.5620        |


### Framework versions

- Transformers 4.39.1
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
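
To compare a local environment against the versions listed above, a small check like the following may help. It is a sketch that uses only the standard library; the distribution names in `EXPECTED` (for example `torch` for PyTorch) are the usual PyPI names and are an assumption here, as is the helper name `compare_versions`.

```python
from importlib import metadata

# Versions copied from the list above, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.39.1",
    "torch": "2.0.0+cu117",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed or None, exact match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(EXPECTED).items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only reports where the environment differs from the one used for training.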