---

library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- piqa
model-index:
- name: HW2-orpo
  results: []
---



# HW2-orpo

This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2), trained with ORPO (Odds Ratio Preference Optimization) on the piqa dataset.
It achieves the following results on the evaluation set:
- Loss: 3.8617
- Rewards/chosen: -0.3716
- Rewards/rejected: -0.3885
- Rewards/accuracies: 0.6390
- Rewards/margins: 0.0170
- Logps/rejected: -3.8851
- Logps/chosen: -3.7156
- Logits/rejected: -3.3968
- Logits/chosen: -3.5059
- Nll Loss: 3.7885
- Log Odds Ratio: -0.7324
- Log Odds Chosen: 0.1830
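
These quantities fit together via the ORPO objective, which adds a log-odds-ratio penalty to the usual NLL loss (sketched below; the weight β is not recorded in this card, but TRL's default of 0.1 is consistent with the numbers):

$$
\mathcal{L}_{\text{ORPO}} \;=\; \mathcal{L}_{\text{NLL}} \;-\; \beta\,\mathbb{E}\!\left[\log\sigma\!\left(\log\frac{\operatorname{odds}(y_{\text{chosen}}\mid x)}{\operatorname{odds}(y_{\text{rejected}}\mid x)}\right)\right],
\qquad \operatorname{odds}(y\mid x)=\frac{P(y\mid x)}{1-P(y\mid x)}.
$$

Plugging in the evaluation numbers: 3.7885 − 0.1 × (−0.7324) ≈ 3.8617, which matches the reported loss.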

## Model description

HW2-orpo is [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) (the 124M-parameter GPT-2) fine-tuned with ORPO via the TRL library, using preference pairs derived from the piqa dataset.

## Intended uses & limitations

This checkpoint is best treated as a demonstration of ORPO fine-tuning rather than a production model. It inherits the limitations of its GPT-2 base: a 1024-token context window, predominantly English training data, and a tendency to produce fluent but factually unreliable or biased text.
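
For a quick smoke test, it loads like any causal-LM checkpoint. A minimal sketch, assuming the model is hosted on the Hub (the repo id below is a placeholder, not a confirmed path):

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual Hub path of this checkpoint.
generator = pipeline("text-generation", model="your-username/HW2-orpo")

prompt = "To remove a stripped screw, you can"
print(generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"])
```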

## Training and evaluation data

[piqa](https://huggingface.co/datasets/piqa) (Physical Interaction: Question Answering) is a physical-commonsense benchmark in which each example pairs a goal with two candidate solutions, exactly one of which is correct. For ORPO training, the correct solution presumably served as the chosen response and the incorrect one as the rejected response, along the lines of the sketch below.
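
The sketch assumes TRL's standard `prompt`/`chosen`/`rejected` preference format and uses piqa's actual columns (`goal`, `sol1`, `sol2`, `label`); the exact prompt template the author used is not recorded.

```python
from datasets import load_dataset

# piqa is a script-based dataset; recent `datasets` versions require
# trust_remote_code=True to run its loading script.
raw = load_dataset("piqa", trust_remote_code=True)

def to_preference_pair(example):
    # label == 0 means sol1 is the correct solution, label == 1 means sol2 is.
    if example["label"] == 0:
        chosen, rejected = example["sol1"], example["sol2"]
    else:
        chosen, rejected = example["sol2"], example["sol1"]
    return {
        "prompt": example["goal"] + " ",  # assumed template: goal, then the solution
        "chosen": chosen,
        "rejected": rejected,
    }

train_dataset = raw["train"].map(to_preference_pair)
eval_dataset = raw["validation"].map(to_preference_pair)
```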

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
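
Expressed as code, these settings map onto TRL's `ORPOConfig`/`ORPOTrainer` roughly as follows. This is a reconstruction, not the author's script, and assumes a TRL release contemporary with the framework versions below (where `ORPOTrainer` still accepts `tokenizer=`); `output_dir` and `beta` are assumptions, and `train_dataset`/`eval_dataset` are the preference pairs built in the preprocessing sketch earlier.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Values copied from the hyperparameter list above; beta and output_dir are assumptions.
args = ORPOConfig(
    output_dir="HW2-orpo",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # total_train_batch_size = 1 * 8 = 8
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
    beta=0.1,   # TRL default; inferred from the loss decomposition, not recorded here
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # from the preprocessing sketch above
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```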



### Training results



| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 3.5511        | 0.2758 | 500  | 3.4162          | -0.3146        | -0.3224          | 0.6303             | 0.0078          | -3.2238        | -3.1457      | -12.1919        | -12.3316      | 3.3464   | -0.6978        | 0.0837          |
| 3.3852        | 0.5517 | 1000 | 3.3345          | -0.3060        | -0.3152          | 0.6421             | 0.0092          | -3.1517        | -3.0602      | -3.3351         | -3.5024       | 3.2656   | -0.6894        | 0.0984          |
| 3.2734        | 0.8275 | 1500 | 3.2903          | -0.3011        | -0.3101          | 0.6309             | 0.0090          | -3.1013        | -3.0113      | -5.6602         | -5.7320       | 3.2211   | -0.6920        | 0.0975          |
| 3.104         | 1.1034 | 2000 | 3.2933          | -0.3021        | -0.3118          | 0.6371             | 0.0097          | -3.1182        | -3.0211      | -0.2253         | -0.3135       | 3.2237   | -0.6956        | 0.1062          |
| 2.8138        | 1.3792 | 2500 | 3.2816          | -0.3018        | -0.3125          | 0.6464             | 0.0107          | -3.1253        | -3.0179      | 1.3216          | 1.2346        | 3.2125   | -0.6916        | 0.1172          |
| 2.8178        | 1.6551 | 3000 | 3.2660          | -0.2998        | -0.3108          | 0.6383             | 0.0109          | -3.1080        | -2.9985      | -0.7475         | -0.8064       | 3.1968   | -0.6923        | 0.1204          |
| 2.8122        | 1.9309 | 3500 | 3.2586          | -0.2992        | -0.3104          | 0.6433             | 0.0112          | -3.1039        | -2.9922      | -2.8285         | -2.9509       | 3.1893   | -0.6925        | 0.1228          |
| 2.4931        | 2.2067 | 4000 | 3.3765          | -0.3130        | -0.3256          | 0.6427             | 0.0127          | -3.2563        | -3.1296      | 1.6707          | 1.5380        | 3.3063   | -0.7020        | 0.1392          |
| 2.3999        | 2.4826 | 4500 | 3.4109          | -0.3174        | -0.3298          | 0.6402             | 0.0125          | -3.2982        | -3.1736      | 1.4695          | 1.2634        | 3.3402   | -0.7069        | 0.1373          |
| 2.4254        | 2.7584 | 5000 | 3.3882          | -0.3150        | -0.3278          | 0.6439             | 0.0128          | -3.2781        | -3.1497      | 2.1282          | 1.9044        | 3.3180   | -0.7018        | 0.1416          |
| 2.373         | 3.0343 | 5500 | 3.5698          | -0.3370        | -0.3515          | 0.6408             | 0.0145          | -3.5149        | -3.3698      | 3.7150          | 3.6601        | 3.4983   | -0.7147        | 0.1595          |
| 2.0541        | 3.3101 | 6000 | 3.6256          | -0.3430        | -0.3570          | 0.6284             | 0.0140          | -3.5700        | -3.4302      | 1.1269          | 0.9714        | 3.5532   | -0.7240        | 0.1540          |
| 2.0641        | 3.5860 | 6500 | 3.6157          | -0.3425        | -0.3577          | 0.6445             | 0.0152          | -3.5771        | -3.4246      | -0.6703         | -0.8165       | 3.5439   | -0.7178        | 0.1665          |
| 2.0747        | 3.8618 | 7000 | 3.6335          | -0.3447        | -0.3598          | 0.6402             | 0.0151          | -3.5983        | -3.4474      | -0.1967         | -0.3291       | 3.5616   | -0.7193        | 0.1640          |
| 1.9377        | 4.1376 | 7500 | 3.8286          | -0.3671        | -0.3838          | 0.6445             | 0.0167          | -3.8381        | -3.6712      | -2.6871         | -2.8058       | 3.7557   | -0.7288        | 0.1800          |
| 1.8001        | 4.4135 | 8000 | 3.8629          | -0.3715        | -0.3882          | 0.6414             | 0.0168          | -3.8822        | -3.7146      | -3.4193         | -3.5370       | 3.7898   | -0.7315        | 0.1810          |
| 1.81          | 4.6893 | 8500 | 3.8574          | -0.3711        | -0.3879          | 0.6396             | 0.0168          | -3.8789        | -3.7110      | -4.2176         | -4.3406       | 3.7842   | -0.7321        | 0.1814          |
| 1.8108        | 4.9652 | 9000 | 3.8617          | -0.3716        | -0.3885          | 0.6390             | 0.0170          | -3.8851        | -3.7156      | -3.3968         | -3.5059       | 3.7885   | -0.7324        | 0.1830          |
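
Note that validation loss bottoms out at 3.2586 around step 3500 (epoch ≈ 1.93) and rises steadily afterwards while training loss keeps falling, which suggests the final checkpoint is overfit; reward accuracy stays roughly flat around 0.63–0.65 throughout, so an earlier checkpoint would likely have been the better stopping point.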





### Framework versions



- Transformers 4.44.2
- PyTorch 2.4.0+cu118
- Datasets 2.21.0
- Tokenizers 0.19.1