thorirhrafn/gpt1B_DPO_model2

+---
+license: apache-2.0
+library_name: peft
+tags:
+- trl
+- dpo
+- generated_from_trainer
+base_model: AI-Sweden-Models/gpt-sw3-1.3b
+model-index:
+- name: gpt1B_DPO_model2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# gpt1B_DPO_model2
+This model is a fine-tuned version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.0172
+- Rewards/chosen: 0.0550
+- Rewards/rejected: -5.0967
+- Rewards/accuracies: 1.0
+- Rewards/margins: 5.1517
+- Logps/rejected: -272.7118
+- Logps/chosen: -126.5166
+- Logits/rejected: -2.8059
+- Logits/chosen: -3.0134
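The reward metrics above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that against the reported numbers. It also evaluates a sigmoid-style DPO loss at the mean margin, which assumes the run used the standard sigmoid DPO loss without label smoothing (the card does not say). The helper name `sigmoid_dpo_loss` is illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.0550
rewards_rejected = -5.0967
reported_margin = 5.1517
reported_loss = 0.0172

# The reward margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(reward_margin: float) -> float:
    """-log(sigmoid(margin)), written as log(1 + exp(-margin)).

    Assumption: standard sigmoid DPO loss with no label smoothing.
    """
    return math.log1p(math.exp(-reward_margin))


# Loss evaluated at the *mean* margin. Because the function is convex,
# this is only a lower bound on the mean per-example loss (Jensen's
# inequality), so it is expected to sit below the reported loss.
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"margin = {margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
```

Under that assumption, the gap between the loss at the mean margin and the reported evaluation loss suggests that a few evaluation pairs have margins well below the average, even though every pair is ranked correctly (accuracy 1.0).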
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-06
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 8
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 2
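As a rough illustration of how these hyperparameters fit together, the sketch below recomputes the effective batch size and shows a linear learning-rate decay. The device count, the absence of warmup, and the total step count of 500 are assumptions for illustration. The card lists none of them explicitly, and the results table only shows step 500 at epoch 1.99.

```python
# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 1
gradient_accumulation_steps = 8

# Assumption: a single device; the card does not state the device count.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_lr(step: int, total_steps: int, base_lr: float,
              warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero.

    A sketch of a typical linear schedule; warmup_steps=0 is an assumption.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total of 500 optimizer steps, used only for illustration.
assumed_total_steps = 500
for step in (0, 250, 500):
    print(step, linear_lr(step, assumed_total_steps, learning_rate))
```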
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5638        | 0.1   | 25   | 0.4588          | 0.0798         | -0.4775          | 0.9933             | 0.5572          | -226.5192      | -126.2689    | -3.1511         | -3.3013       |
+| 0.2438        | 0.2   | 50   | 0.2407          | 0.1391         | -1.2630          | 0.9967             | 1.4021          | -234.3750      | -125.6758    | -3.0948         | -3.2573       |
+| 0.1419        | 0.3   | 75   | 0.1251          | 0.1512         | -2.1203          | 0.9967             | 2.2715          | -242.9475      | -125.5544    | -3.0169         | -3.1907       |
+| 0.0637        | 0.4   | 100  | 0.0685          | 0.1313         | -3.0008          | 0.9967             | 3.1321          | -251.7525      | -125.7539    | -2.9258         | -3.1120       |
+| 0.0467        | 0.5   | 125  | 0.0435          | 0.0748         | -3.7561          | 0.9967             | 3.8309          | -259.3056      | -126.3189    | -2.8674         | -3.0627       |
+| 0.029         | 0.6   | 150  | 0.0326          | 0.0369         | -4.2568          | 0.9967             | 4.2937          | -264.3123      | -126.6974    | -2.8396         | -3.0402       |
+| 0.0248        | 0.7   | 175  | 0.0272          | 0.0298         | -4.5229          | 0.9967             | 4.5528          | -266.9736      | -126.7682    | -2.8248         | -3.0283       |
+| 0.0226        | 0.79  | 200  | 0.0233          | 0.0416         | -4.7048          | 0.9967             | 4.7463          | -268.7922      | -126.6510    | -2.8199         | -3.0251       |
+| 0.0149        | 0.89  | 225  | 0.0218          | 0.0346         | -4.8496          | 0.9967             | 4.8843          | -270.2410      | -126.7205    | -2.8109         | -3.0175       |
+| 0.0139        | 0.99  | 250  | 0.0204          | 0.0329         | -4.9460          | 0.9967             | 4.9789          | -271.2041      | -126.7377    | -2.8070         | -3.0145       |
+| 0.0106        | 1.09  | 275  | 0.0191          | 0.0298         | -5.0258          | 1.0                | 5.0556          | -272.0027      | -126.7688    | -2.8048         | -3.0123       |
+| 0.0166        | 1.19  | 300  | 0.0187          | 0.0372         | -5.0554          | 1.0                | 5.0926          | -272.2990      | -126.6948    | -2.8040         | -3.0115       |
+| 0.0123        | 1.29  | 325  | 0.0182          | 0.0438         | -5.0713          | 1.0                | 5.1151          | -272.4578      | -126.6287    | -2.8040         | -3.0117       |
+| 0.017         | 1.39  | 350  | 0.0178          | 0.0474         | -5.0755          | 1.0                | 5.1228          | -272.4991      | -126.5928    | -2.8055         | -3.0130       |
+| 0.0115        | 1.49  | 375  | 0.0177          | 0.0530         | -5.0809          | 1.0                | 5.1339          | -272.5540      | -126.5368    | -2.8064         | -3.0139       |
+| 0.0107        | 1.59  | 400  | 0.0175          | 0.0549         | -5.0879          | 1.0                | 5.1428          | -272.6239      | -126.5175    | -2.8059         | -3.0134       |
+| 0.0144        | 1.69  | 425  | 0.0174          | 0.0546         | -5.0923          | 1.0                | 5.1468          | -272.6672      | -126.5208    | -2.8063         | -3.0137       |
+| 0.0112        | 1.79  | 450  | 0.0175          | 0.0549         | -5.0935          | 1.0                | 5.1484          | -272.6794      | -126.5173    | -2.8062         | -3.0136       |
+| 0.0101        | 1.89  | 475  | 0.0175          | 0.0549         | -5.0958          | 1.0                | 5.1508          | -272.7028      | -126.5172    | -2.8061         | -3.0135       |
+| 0.011         | 1.99  | 500  | 0.0172          | 0.0550         | -5.0967          | 1.0                | 5.1517          | -272.7118      | -126.5166    | -2.8059         | -3.0134       |
+### Framework versions
+- PEFT 0.8.2
+- Transformers 4.38.1
+- Pytorch 2.2.0+cu118
+- Datasets 2.17.1
+- Tokenizers 0.15.2
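Since the card declares `library_name: peft`, the repository presumably holds an adapter rather than full model weights. A minimal loading sketch follows. It assumes the adapter is published on the Hub as `thorirhrafn/gpt1B_DPO_model2`, and that the base model is accessible to you (it may require authentication). The function name `load_dpo_model` is hypothetical.

```python
BASE_MODEL_ID = "AI-Sweden-Models/gpt-sw3-1.3b"  # from the card metadata
ADAPTER_ID = "thorirhrafn/gpt1B_DPO_model2"      # assumption: Hub repo id


def load_dpo_model(base_id: str = BASE_MODEL_ID,
                   adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the DPO-trained PEFT adapter."""
    # Imported here so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_dpo_model()` downloads both repositories on first use. Matching the framework versions listed above is the safest way to reproduce the original environment.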