---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7337
- Rewards/chosen: -4.9100
- Rewards/rejected: -8.6806
- Rewards/accuracies: 0.7720
- Rewards/margins: 3.7705
- Logps/rejected: -315.2896
- Logps/chosen: -320.2513
- Logits/rejected: -2.5449
- Logits/chosen: -2.5953
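
For readers unfamiliar with these columns: under DPO the reward values are the implicit reward

$$ r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_\mathrm{ref}(y \mid x) \right) $$

assigned to the chosen and rejected completions, Rewards/margins is Rewards/chosen minus Rewards/rejected (here -4.9100 - (-8.6806) = 3.7706 ≈ 3.7705, up to rounding), and Rewards/accuracies is the fraction of evaluation pairs whose chosen completion receives the higher implicit reward.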

## Model description

More information needed

## Intended uses & limitations

More information needed
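
Pending a fuller write-up, a minimal inference sketch follows. The repo id is a placeholder for wherever this checkpoint is hosted, and the chat template is assumed to be inherited from the zephyr-7b-sft-full base model.

```python
# Minimal inference sketch. "your-org/zephyr-7b-dpo-full" is a placeholder
# repo id; substitute the actual location of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/zephyr-7b-dpo-full"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Chat formatting is assumed to come from the zephyr-7b-sft-full template.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```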

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
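
As a rough sketch, the list above maps onto `transformers.TrainingArguments` as shown below; the 4-GPU launch (which yields the total batch sizes of 32 and 16) and the surrounding DPO training loop are assumed rather than reproduced here.

```python
# Sketch of the listed hyperparameters as transformers.TrainingArguments.
# Assumes 4 GPUs (8 per-device train / 4 per-device eval -> totals 32 / 16);
# the actual training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,     # Adam betas/epsilon match the
    adam_beta2=0.999,   # transformers defaults listed above
    adam_epsilon=1e-8,
)
```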

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6144        | 0.05  | 100  | 0.5938          | 0.0567         | -0.2214          | 0.7220             | 0.2780          | -230.6976      | -270.5843    | -3.0045         | -3.0186       |
| 0.4957        | 0.10  | 200  | 0.5132          | 0.0606         | -0.7482          | 0.7460             | 0.8088          | -235.9661      | -270.5448    | -2.9556         | -2.9714       |
| 0.5257        | 0.15  | 300  | 0.4975          | -0.0361        | -1.0262          | 0.7520             | 0.9901          | -238.7455      | -271.5117    | -2.9853         | -2.9989       |
| 0.5560        | 0.21  | 400  | 0.4935          | -0.1016        | -1.1994          | 0.7760             | 1.0978          | -240.4776      | -272.1671    | -3.0847         | -3.0931       |
| 0.5409        | 0.26  | 500  | 0.4953          | -0.4001        | -1.5875          | 0.7780             | 1.1874          | -244.3592      | -275.1525    | -3.0544         | -3.0767       |
| 0.5161        | 0.31  | 600  | 0.5195          | -0.3148        | -1.4151          | 0.7420             | 1.1003          | -242.6347      | -274.2988    | -3.0235         | -3.0461       |
| 0.4913        | 0.36  | 700  | 0.5228          | -0.5853        | -1.8669          | 0.7800             | 1.2816          | -247.1535      | -277.0044    | -2.9302         | -2.9586       |
| 0.4724        | 0.41  | 800  | 0.5142          | -0.6071        | -2.0565          | 0.7620             | 1.4494          | -249.0490      | -277.2221    | -2.7988         | -2.8297       |
| 0.5157        | 0.46  | 900  | 0.5050          | -0.5865        | -1.8166          | 0.7660             | 1.2302          | -246.6503      | -277.0157    | -2.9463         | -2.9778       |
| 0.4641        | 0.52  | 1000 | 0.5091          | -0.5151        | -1.9977          | 0.7580             | 1.4826          | -248.4611      | -276.3019    | -2.8916         | -2.9216       |
| 0.5558        | 0.57  | 1100 | 0.4971          | -0.8116        | -2.1120          | 0.7700             | 1.3004          | -249.6036      | -279.2668    | -2.8601         | -2.8914       |
| 0.4877        | 0.62  | 1200 | 0.5092          | -0.5596        | -1.8948          | 0.7640             | 1.3352          | -247.4319      | -276.7474    | -2.8340         | -2.8770       |
| 0.4922        | 0.67  | 1300 | 0.5181          | -0.9340        | -2.3745          | 0.7460             | 1.4405          | -252.2287      | -280.4910    | -2.8187         | -2.8517       |
| 0.5515        | 0.72  | 1400 | 0.5081          | -0.9873        | -2.2119          | 0.7440             | 1.2247          | -250.6034      | -281.0239    | -2.8488         | -2.8704       |
| 0.4349        | 0.77  | 1500 | 0.4996          | -0.9048        | -2.4262          | 0.7580             | 1.5214          | -252.7459      | -280.1994    | -2.8402         | -2.8601       |
| 0.5446        | 0.83  | 1600 | 0.4927          | -0.8717        | -2.4390          | 0.7660             | 1.5673          | -252.8737      | -279.8681    | -2.7610         | -2.7853       |
| 0.5242        | 0.88  | 1700 | 0.4864          | -0.6984        | -2.1381          | 0.7780             | 1.4397          | -249.8655      | -278.1355    | -2.8269         | -2.8525       |
| 0.5266        | 0.93  | 1800 | 0.5020          | -0.5411        | -1.9479          | 0.7760             | 1.4068          | -247.9628      | -276.5621    | -2.7381         | -2.7715       |
| 0.4980        | 0.98  | 1900 | 0.5086          | -0.6894        | -2.0331          | 0.7640             | 1.3437          | -248.8150      | -278.0452    | -2.7298         | -2.7664       |
| 0.0664        | 1.03  | 2000 | 0.5137          | -1.1702        | -3.1723          | 0.7620             | 2.0021          | -260.2072      | -282.8530    | -2.6137         | -2.6605       |
| 0.0698        | 1.08  | 2100 | 0.5327          | -1.3645        | -3.5669          | 0.7680             | 2.2023          | -264.1527      | -284.7966    | -2.6219         | -2.6692       |
| 0.0715        | 1.14  | 2200 | 0.5423          | -2.0519        | -4.1983          | 0.7620             | 2.1464          | -270.4673      | -291.6701    | -2.6949         | -2.7397       |
| 0.0548        | 1.19  | 2300 | 0.5459          | -1.7539        | -4.0546          | 0.7700             | 2.3007          | -269.0301      | -288.6898    | -2.5996         | -2.6425       |
| 0.0897        | 1.24  | 2400 | 0.5317          | -1.6549        | -3.7228          | 0.7640             | 2.0679          | -265.7117      | -287.7002    | -2.6512         | -2.6870       |
| 0.0842        | 1.29  | 2500 | 0.5710          | -2.3000        | -4.5267          | 0.7660             | 2.2267          | -273.7511      | -294.1512    | -2.6530         | -2.6843       |
| 0.1321        | 1.34  | 2600 | 0.5334          | -1.8238        | -3.8561          | 0.7500             | 2.0323          | -267.0450      | -289.3895    | -2.7094         | -2.7343       |
| 0.0862        | 1.39  | 2700 | 0.5443          | -1.8480        | -3.9514          | 0.7520             | 2.1034          | -267.9976      | -289.6307    | -2.6953         | -2.7169       |
| 0.0954        | 1.45  | 2800 | 0.5472          | -1.9317        | -3.9982          | 0.7620             | 2.0665          | -268.4658      | -290.4683    | -2.6900         | -2.7121       |
| 0.0979        | 1.50  | 2900 | 0.5471          | -2.1452        | -4.1979          | 0.7540             | 2.0526          | -270.4626      | -292.6034    | -2.6466         | -2.6788       |
| 0.0732        | 1.55  | 3000 | 0.5512          | -2.0252        | -4.2019          | 0.7500             | 2.1767          | -270.5027      | -291.4029    | -2.6716         | -2.6981       |
| 0.0799        | 1.60  | 3100 | 0.5415          | -1.8888        | -3.8739          | 0.7500             | 1.9851          | -267.2229      | -290.0393    | -2.6703         | -2.7143       |
| 0.0700        | 1.65  | 3200 | 0.5399          | -1.8457        | -4.0299          | 0.7640             | 2.1843          | -268.7833      | -289.6078    | -2.6566         | -2.7002       |
| 0.0808        | 1.70  | 3300 | 0.5594          | -2.2307        | -4.6355          | 0.7640             | 2.4048          | -274.8385      | -293.4576    | -2.6843         | -2.7340       |
| 0.0501        | 1.76  | 3400 | 0.5704          | -2.5155        | -4.9551          | 0.7660             | 2.4396          | -278.0345      | -296.3059    | -2.6427         | -2.6944       |
| 0.0610        | 1.81  | 3500 | 0.5562          | -2.2172        | -4.4937          | 0.7600             | 2.2765          | -273.4208      | -293.3234    | -2.7086         | -2.7404       |
| 0.0979        | 1.86  | 3600 | 0.5656          | -2.6495        | -5.0323          | 0.7520             | 2.3828          | -278.8068      | -297.6461    | -2.6381         | -2.6765       |
| 0.0631        | 1.91  | 3700 | 0.5668          | -2.5055        | -4.7949          | 0.7560             | 2.2895          | -276.4331      | -296.2057    | -2.6407         | -2.6818       |
| 0.1202        | 1.96  | 3800 | 0.5678          | -2.6581        | -4.7249          | 0.7580             | 2.0668          | -275.7330      | -297.7322    | -2.6716         | -2.7125       |
| 0.0220        | 2.01  | 3900 | 0.5657          | -2.6893        | -5.1672          | 0.7720             | 2.4778          | -280.1555      | -298.0444    | -2.6680         | -2.7125       |
| 0.0177        | 2.07  | 4000 | 0.6171          | -3.3461        | -6.2908          | 0.7680             | 2.9447          | -291.3919      | -304.6117    | -2.6431         | -2.6916       |
| 0.0108        | 2.12  | 4100 | 0.6389          | -3.3448        | -6.3803          | 0.7660             | 3.0355          | -292.2874      | -304.5994    | -2.6225         | -2.6701       |
| 0.0108        | 2.17  | 4200 | 0.6562          | -3.5386        | -6.6028          | 0.7620             | 3.0642          | -294.5121      | -306.5373    | -2.6323         | -2.6797       |
| 0.0105        | 2.22  | 4300 | 0.6742          | -3.7048        | -6.8992          | 0.7560             | 3.1944          | -297.4764      | -308.1995    | -2.6192         | -2.6678       |
| 0.0180        | 2.27  | 4400 | 0.6982          | -4.1642        | -7.4837          | 0.7680             | 3.3195          | -303.3213      | -312.7930    | -2.5975         | -2.6454       |
| 0.0173        | 2.32  | 4500 | 0.6661          | -3.9139        | -6.9481          | 0.7660             | 3.0342          | -297.9650      | -310.2904    | -2.5967         | -2.6394       |
| 0.0110        | 2.37  | 4600 | 0.6606          | -3.7121        | -6.8279          | 0.7640             | 3.1158          | -296.7630      | -308.2721    | -2.5628         | -2.6068       |
| 0.0096        | 2.43  | 4700 | 0.6705          | -3.9088        | -7.1613          | 0.7680             | 3.2524          | -300.0965      | -310.2393    | -2.5127         | -2.5613       |
| 0.0099        | 2.48  | 4800 | 0.6825          | -3.9836        | -7.2552          | 0.7720             | 3.2716          | -301.0364      | -310.9875    | -2.5169         | -2.5658       |
| 0.0106        | 2.53  | 4900 | 0.6938          | -4.2534        | -7.7587          | 0.7660             | 3.5053          | -306.0710      | -313.6849    | -2.5330         | -2.5844       |
| 0.0106        | 2.58  | 5000 | 0.6949          | -4.2978        | -7.7919          | 0.7660             | 3.4942          | -306.4034      | -314.1288    | -2.5330         | -2.5826       |
| 0.0099        | 2.63  | 5100 | 0.7239          | -4.3508        | -8.0105          | 0.7640             | 3.6598          | -308.5892      | -314.6587    | -2.5095         | -2.5620       |
| 0.0074        | 2.68  | 5200 | 0.7394          | -4.7364        | -8.4819          | 0.7660             | 3.7456          | -313.3035      | -318.5147    | -2.5378         | -2.5891       |
| 0.0043        | 2.74  | 5300 | 0.7335          | -4.6351        | -8.3990          | 0.7720             | 3.7639          | -312.4740      | -317.5019    | -2.5539         | -2.6052       |
| 0.0163        | 2.79  | 5400 | 0.7317          | -4.6741        | -8.3958          | 0.7700             | 3.7217          | -312.4420      | -317.8924    | -2.5490         | -2.5993       |
| 0.0081        | 2.84  | 5500 | 0.7420          | -4.9166        | -8.6945          | 0.7740             | 3.7779          | -315.4291      | -320.3167    | -2.5307         | -2.5816       |
| 0.0067        | 2.89  | 5600 | 0.7369          | -4.9581        | -8.7224          | 0.7680             | 3.7643          | -315.7077      | -320.7321    | -2.5437         | -2.5941       |
| 0.0081        | 2.94  | 5700 | 0.7345          | -4.9719        | -8.7499          | 0.7720             | 3.7780          | -315.9826      | -320.8700    | -2.5442         | -2.5946       |
| 0.0043        | 2.99  | 5800 | 0.7338          | -4.9141        | -8.6850          | 0.7700             | 3.7709          | -315.3341      | -320.2925    | -2.5452         | -2.5956       |


### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1