---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-4-reward
  results: []
---

# OpenELM-1_1B-DPO-full-max-4-reward

Judging by this card's tags (`trl`, `dpo`, `alignment-handbook`) and its name, this appears to be an OpenELM-1_1B checkpoint fine-tuned with Direct Preference Optimization (DPO); the preference dataset used is not recorded here.
It achieves the following results on the evaluation set (how the reward metrics are defined is sketched after the list):
- Loss: 1.6190
- Rewards/chosen: -13.625
- Rewards/rejected: -15.0625
- Rewards/accuracies: 0.5996
- Rewards/margins: 1.4688
- Logps/rejected: -1800.0
- Logps/chosen: -1680.0
- Logits/rejected: 1.0625
- Logits/chosen: -0.2695
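
These reward metrics are the implicit DPO rewards, not scores from a separate reward model. Assuming the standard TRL DPO formulation (implied by the `trl` and `dpo` tags; the `beta` used for this run is not recorded), each logged reward is the scaled log-ratio of the policy to the frozen reference model,

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)},
$$

and the loss is the negative log-sigmoid of the margin between the chosen completion $y_w$ and the rejected completion $y_l$:

$$
\mathcal{L}_{\text{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big).
$$

Under this reading, `Rewards/margins` is the mean chosen-minus-rejected reward, `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected one, and the `Logps/*` columns are the policy's total log-probabilities of the respective completions. (The margin may not exactly equal chosen minus rejected above because the logged values are coarsely rounded.)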

## Model description

More information needed

## Intended uses & limitations

More information needed
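
No usage guidance was recorded for this card. As a hedged starting point, the sketch below shows generic `transformers` text generation; the repo id is a placeholder, and `trust_remote_code=True` is assumed because upstream OpenELM checkpoints ship custom modeling code (the tokenizer may instead need to come from the base model, as Apple's OpenELM cards pair the model with a Llama-2 tokenizer).

```python
# Hedged inference sketch -- not from the original card.
# "your-org/OpenELM-1_1B-DPO-full-max-4-reward" is a placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/OpenELM-1_1B-DPO-full-max-4-reward"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # or the base model's tokenizer
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # OpenELM uses custom modeling code on the Hub
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note also that the validation loss in the training log below rises steadily after the first epoch, so the final checkpoint may be over-optimized relative to earlier ones.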

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `DPOConfig` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
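
For reference, here is a hedged sketch of how these values might map onto TRL's `DPOConfig` (a subclass of `transformers.TrainingArguments`). The `beta` and output directory are placeholders, and the actual run was presumably launched through an alignment-handbook recipe rather than a hand-written script.

```python
# Hedged sketch, not the original training script: the listed hyperparameters
# expressed as a TRL DPOConfig. beta and output_dir are placeholders.
from trl import DPOConfig

config = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-4-reward",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,    # train_batch_size above
    per_device_eval_batch_size=16,    # eval_batch_size above
    gradient_accumulation_steps=2,
    seed=42,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    beta=0.1,                         # placeholder: the DPO beta is not recorded
)

# Sanity check on the effective batch size: 8 per device x 4 GPUs x 2
# accumulation steps = 64, matching total_train_batch_size above. The
# num_devices factor comes from the multi-GPU launcher (e.g. accelerate).
```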

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6378        | 0.0838 | 80   | 0.6868          | -0.6758        | -0.7656          | 0.5684             | 0.0918          | -366.0         | -386.0       | -9.875          | -10.125       |
| 0.6219        | 0.1675 | 160  | 0.6949          | -0.9102        | -1.0547          | 0.5977             | 0.1406          | -394.0         | -410.0       | -10.125         | -10.5         |
| 0.6151        | 0.2513 | 240  | 0.7637          | -2.4531        | -2.6562          | 0.5566             | 0.2031          | -552.0         | -564.0       | -10.9375        | -11.25        |
| 0.6607        | 0.3351 | 320  | 0.7307          | -2.7344        | -2.9375          | 0.5742             | 0.1992          | -584.0         | -592.0       | -14.25          | -14.4375      |
| 0.6304        | 0.4188 | 400  | 0.7129          | -2.7344        | -3.0156          | 0.5898             | 0.2715          | -588.0         | -592.0       | -12.5           | -13.0         |
| 0.623         | 0.5026 | 480  | 0.7718          | -2.5469        | -2.9375          | 0.5859             | 0.3887          | -584.0         | -572.0       | -8.0625         | -9.0          |
| 0.6091        | 0.5864 | 560  | 0.7543          | -3.3281        | -3.6562          | 0.5957             | 0.3320          | -656.0         | -652.0       | -12.0           | -12.75        |
| 0.583         | 0.6702 | 640  | 0.7081          | -3.25          | -3.7031          | 0.6406             | 0.4648          | -660.0         | -644.0       | -9.0            | -10.0625      |
| 0.6183        | 0.7539 | 720  | 0.7397          | -3.7812        | -4.0938          | 0.5996             | 0.3242          | -700.0         | -696.0       | -8.5625         | -9.4375       |
| 0.5988        | 0.8377 | 800  | 0.7986          | -4.4688        | -4.9375          | 0.5898             | 0.4609          | -784.0         | -764.0       | -7.9062         | -8.9375       |
| 0.5882        | 0.9215 | 880  | 0.7997          | -3.2656        | -3.6562          | 0.5879             | 0.3906          | -656.0         | -644.0       | -8.3125         | -9.1875       |
| 0.4256        | 1.0052 | 960  | 0.7816          | -4.5312        | -5.1875          | 0.6172             | 0.6367          | -808.0         | -772.0       | -6.75           | -7.9062       |
| 0.2006        | 1.0890 | 1040 | 0.9734          | -5.9688        | -6.6875          | 0.6094             | 0.7383          | -960.0         | -916.0       | -4.7812         | -6.0625       |
| 0.1977        | 1.1728 | 1120 | 0.9420          | -6.25          | -7.0             | 0.6094             | 0.7578          | -988.0         | -944.0       | -5.0            | -6.25         |
| 0.1717        | 1.2565 | 1200 | 1.0548          | -7.4688        | -8.25            | 0.5918             | 0.7852          | -1112.0        | -1064.0      | -4.5            | -5.8125       |
| 0.1881        | 1.3403 | 1280 | 0.9567          | -6.9688        | -7.8125          | 0.6035             | 0.8672          | -1072.0        | -1012.0      | -3.2188         | -4.4688       |
| 0.1897        | 1.4241 | 1360 | 0.9563          | -6.9688        | -7.8438          | 0.6055             | 0.8867          | -1072.0        | -1016.0      | -4.2812         | -5.6875       |
| 0.1383        | 1.5079 | 1440 | 1.1196          | -8.5625        | -9.5             | 0.6055             | 0.9922          | -1240.0        | -1176.0      | -2.5938         | -3.9062       |
| 0.146         | 1.5916 | 1520 | 1.0767          | -9.5           | -10.5            | 0.6055             | 1.0078          | -1336.0        | -1264.0      | -1.6797         | -3.0312       |
| 0.1831        | 1.6754 | 1600 | 0.9776          | -8.0625        | -8.9375          | 0.6055             | 0.8516          | -1184.0        | -1128.0      | -2.2344         | -3.5938       |
| 0.1667        | 1.7592 | 1680 | 1.0210          | -7.75          | -8.625           | 0.5957             | 0.9023          | -1152.0        | -1088.0      | -1.7344         | -3.2344       |
| 0.1514        | 1.8429 | 1760 | 1.0214          | -8.6875        | -9.6875          | 0.6133             | 0.9805          | -1256.0        | -1184.0      | -1.1719         | -2.5312       |
| 0.1594        | 1.9267 | 1840 | 1.0633          | -8.8125        | -9.75            | 0.5977             | 0.9727          | -1264.0        | -1200.0      | -1.2344         | -2.625        |
| 0.0307        | 2.0105 | 1920 | 1.0948          | -8.75          | -9.75            | 0.6172             | 1.0312          | -1264.0        | -1192.0      | -1.4531         | -2.9844       |
| 0.0214        | 2.0942 | 2000 | 1.5354          | -12.25         | -13.3125         | 0.6094             | 1.1016          | -1624.0        | -1544.0      | 0.1973          | -1.2031       |
| 0.0186        | 2.1780 | 2080 | 1.5790          | -13.5625       | -14.9375         | 0.6055             | 1.3906          | -1784.0        | -1680.0      | 0.4902          | -0.9102       |
| 0.0395        | 2.2618 | 2160 | 1.5234          | -12.0625       | -13.1875         | 0.6035             | 1.1406          | -1608.0        | -1520.0      | 0.5391          | -0.7656       |
| 0.0217        | 2.3455 | 2240 | 1.5867          | -13.1875       | -14.5625         | 0.6035             | 1.375           | -1744.0        | -1632.0      | 0.8945          | -0.4141       |
| 0.0268        | 2.4293 | 2320 | 1.5888          | -13.0          | -14.375          | 0.6035             | 1.4219          | -1728.0        | -1616.0      | 0.6797          | -0.6758       |
| 0.0238        | 2.5131 | 2400 | 1.6647          | -13.625        | -15.0625         | 0.6055             | 1.4453          | -1792.0        | -1680.0      | 0.9648          | -0.3633       |
| 0.0227        | 2.5969 | 2480 | 1.5873          | -13.125        | -14.5625         | 0.6094             | 1.4375          | -1744.0        | -1632.0      | 0.9258          | -0.4199       |
| 0.0233        | 2.6806 | 2560 | 1.5836          | -13.1875       | -14.625          | 0.6035             | 1.4297          | -1752.0        | -1640.0      | 0.9297          | -0.4180       |
| 0.021         | 2.7644 | 2640 | 1.5917          | -13.4375       | -14.9375         | 0.6094             | 1.4609          | -1776.0        | -1664.0      | 1.0078          | -0.3223       |
| 0.0221        | 2.8482 | 2720 | 1.6077          | -13.5625       | -15.0            | 0.6035             | 1.4609          | -1792.0        | -1672.0      | 1.0469          | -0.2793       |
| 0.0182        | 2.9319 | 2800 | 1.6190          | -13.625        | -15.0625         | 0.5996             | 1.4688          | -1800.0        | -1680.0      | 1.0625          | -0.2695       |


### Framework versions

- Transformers 4.45.1
- PyTorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0