---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO
  results: []
---


# mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO

This model is a DPO fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an undocumented preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5956
- Rewards/chosen: -1.5853
- Rewards/rejected: -2.2163
- Rewards/accuracies: 0.6308
- Rewards/margins: 0.6310
- Logps/rejected: -50.7358
- Logps/chosen: -39.2390
- Logits/rejected: -2.7784
- Logits/chosen: -2.7790
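
For context, these are the standard DPO diagnostics logged by TRL: the implicit reward of a completion $y$ for prompt $x$ is $r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$, and the training objective is

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards of the preferred and dispreferred completions, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs for which the chosen reward exceeds the rejected one.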

## Model description

This is a Direct Preference Optimization (DPO) fine-tune of Mistral-7B-Instruct-v0.2 trained with TRL for 1,000 steps. Based on the model name, the run used a learning rate of 1e-7 and a DPO β of 0.1; the full configuration is listed under [Training procedure](#training-procedure). The preference dataset has not been documented.

## Intended uses & limitations

Like its base model, this checkpoint is intended for instruction-following chat. Because the preference data is undocumented, its behavior beyond the evaluation metrics above has not been characterized; evaluate it on your own task before relying on it.
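
A minimal inference sketch, assuming the standard `transformers` chat API; the repository namespace is a placeholder, since it is not recorded in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# <namespace> is a placeholder: replace it with the owner of this repository.
model_id = "<namespace>/mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-7B-Instruct-v0.2 ships a chat template, so the fine-tune inherits it.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```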

## Training and evaluation data

The preference dataset used for training and evaluation is not documented in this card. The reward metrics above imply a pairwise dataset with chosen and rejected completions per prompt, as DPO requires.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
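
The training script itself is not included in the card. The sketch below shows how these settings map onto TRL's `DPOTrainer` (the API of TRL releases contemporary with Transformers 4.38); the β = 0.1 is inferred from the model name rather than recorded above, and the dataset identifier is a placeholder:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is undocumented. DPOTrainer
# expects "prompt", "chosen", and "rejected" text columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,                        # inferred from the model name ("0.1_beta")
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```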

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6877        | 0.1   | 50   | 0.6853          | -0.0549        | -0.0714          | 0.5846             | 0.0165          | -29.2866       | -23.9352     | -2.8632         | -2.8635       |
| 0.6422        | 0.2   | 100  | 0.6445          | -0.6597        | -0.8190          | 0.5758             | 0.1593          | -36.7627       | -29.9831     | -2.8277         | -2.8281       |
| 0.5395        | 0.29  | 150  | 0.6276          | -1.2920        | -1.6093          | 0.6000             | 0.3173          | -44.6654       | -36.3059     | -2.8059         | -2.8065       |
| 0.5656        | 0.39  | 200  | 0.6108          | -1.0994        | -1.4528          | 0.6000             | 0.3533          | -43.1002       | -34.3802     | -2.8123         | -2.8129       |
| 0.6317        | 0.49  | 250  | 0.5945          | -0.9164        | -1.3078          | 0.6176             | 0.3914          | -41.6506       | -32.5496     | -2.8115         | -2.8120       |
| 0.5648        | 0.59  | 300  | 0.6008          | -1.5057        | -1.9938          | 0.6198             | 0.4882          | -48.5106       | -38.4425     | -2.8042         | -2.8048       |
| 0.5332        | 0.68  | 350  | 0.6081          | -1.6363        | -2.1476          | 0.6154             | 0.5113          | -50.0487       | -39.7490     | -2.7837         | -2.7843       |
| 0.5852        | 0.78  | 400  | 0.5973          | -1.4834        | -2.0381          | 0.6330             | 0.5547          | -48.9534       | -38.2196     | -2.7903         | -2.7909       |
| 0.6266        | 0.88  | 450  | 0.5981          | -1.4246        | -1.9676          | 0.6374             | 0.5430          | -48.2480       | -37.6317     | -2.7910         | -2.7916       |
| 0.5184        | 0.98  | 500  | 0.5916          | -1.2806        | -1.8253          | 0.6286             | 0.5447          | -46.8255       | -36.1919     | -2.7922         | -2.7928       |
| 0.4736        | 1.07  | 550  | 0.5909          | -1.3628        | -1.9386          | 0.6330             | 0.5758          | -47.9585       | -37.0137     | -2.7867         | -2.7874       |
| 0.4708        | 1.17  | 600  | 0.5950          | -1.4680        | -2.0567          | 0.6330             | 0.5887          | -49.1391       | -38.0658     | -2.7836         | -2.7842       |
| 0.5232        | 1.27  | 650  | 0.5965          | -1.5438        | -2.1546          | 0.6308             | 0.6108          | -50.1188       | -38.8241     | -2.7804         | -2.7811       |
| 0.455         | 1.37  | 700  | 0.5976          | -1.5823        | -2.2032          | 0.6308             | 0.6209          | -50.6042       | -39.2085     | -2.7793         | -2.7799       |
| 0.4032        | 1.46  | 750  | 0.5958          | -1.5721        | -2.1999          | 0.6352             | 0.6278          | -50.5717       | -39.1070     | -2.7788         | -2.7795       |
| 0.4487        | 1.56  | 800  | 0.5957          | -1.5857        | -2.2157          | 0.6308             | 0.6300          | -50.7295       | -39.2429     | -2.7785         | -2.7791       |
| 0.5015        | 1.66  | 850  | 0.5972          | -1.5836        | -2.2125          | 0.6308             | 0.6289          | -50.6972       | -39.2220     | -2.7785         | -2.7791       |
| 0.419         | 1.76  | 900  | 0.5966          | -1.5861        | -2.2157          | 0.6308             | 0.6296          | -50.7298       | -39.2470     | -2.7783         | -2.7790       |
| 0.4581        | 1.86  | 950  | 0.5956          | -1.5849        | -2.2158          | 0.6308             | 0.6309          | -50.7304       | -39.2349     | -2.7784         | -2.7790       |
| 0.381         | 1.95  | 1000 | 0.5956          | -1.5853        | -2.2163          | 0.6308             | 0.6310          | -50.7358       | -39.2390     | -2.7784         | -2.7790       |


### Framework versions

- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2