---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: AI-Sweden-Models/gpt-sw3-1.3b
model-index:
- name: gpt1B_DPO_model
  results: []
---


# gpt1B_DPO_model

This model is a PEFT adapter for [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b), fine-tuned with Direct Preference Optimization (DPO) on a preference dataset that is not documented in this card.
It achieves the following results on the evaluation set:
- Loss: 0.0123
- Rewards/chosen: 0.0352
- Rewards/rejected: -5.6889
- Rewards/accuracies: 1.0
- Rewards/margins: 5.7242
- Logps/rejected: -278.6341
- Logps/chosen: -126.7145
- Logits/rejected: -2.7863
- Logits/chosen: -2.9985

## Model description

More information needed

## Intended uses & limitations

More information needed
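
As a minimal usage sketch, the adapter can be attached to the base model with PEFT. The adapter repo id below is a placeholder (the published path is not recorded in this card), and the Swedish prompt is only an example:

```python
# Minimal inference sketch. "your-username/gpt1B_DPO_model" is a placeholder
# repo id; substitute the path where this adapter is actually published.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "AI-Sweden-Models/gpt-sw3-1.3b"
adapter_id = "your-username/gpt1B_DPO_model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter
model.eval()

inputs = tokenizer("Hej! Kan du förklara vad DPO är?", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```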

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of an equivalent trainer setup follows the list:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
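
A sketch of how these settings map onto TRL's `DPOTrainer`. The TRL version and the DPO `beta` are not recorded in this card, so the 0.7-era API and the default `beta=0.1` are assumptions, as are the dataset path and the LoRA settings:

```python
# Reconstruction sketch, not the exact training script. Assumptions are marked.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "AI-Sweden-Models/gpt-sw3-1.3b"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your-username/preference-data")

training_args = TrainingArguments(
    output_dir="gpt1B_DPO_model",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective train batch size of 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence below
    remove_unused_columns=False,     # keep prompt/chosen/rejected columns
)

peft_config = LoraConfig(task_type="CAUSAL_LM")  # assumed; LoRA rank/targets unknown

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT adapter, the frozen base model serves as reference
    args=training_args,
    beta=0.1,            # TRL default; the value actually used is not recorded
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```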

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2383        | 0.2   | 50   | 0.2344          | 0.1296         | -1.3092          | 0.9967             | 1.4389          | -234.8370      | -125.7705    | -3.0903         | -3.2537       |
| 0.0573        | 0.4   | 100  | 0.0615          | 0.1058         | -3.2004          | 0.9967             | 3.3063          | -253.7490      | -126.0084    | -2.9086         | -3.0985       |
| 0.0262        | 0.6   | 150  | 0.0291          | -0.0050        | -4.5248          | 0.9967             | 4.5198          | -266.9924      | -127.1163    | -2.8221         | -3.0267       |
| 0.0191        | 0.79  | 200  | 0.0205          | 0.0107         | -4.9990          | 0.9967             | 5.0096          | -271.7344      | -126.9600    | -2.8042         | -3.0131       |
| 0.0106        | 0.99  | 250  | 0.0171          | -0.0051        | -5.3187          | 0.9967             | 5.3135          | -274.9313      | -127.1180    | -2.7884         | -3.0001       |
| 0.0129        | 1.19  | 300  | 0.0148          | 0.0024         | -5.4879          | 1.0                | 5.4902          | -276.6234      | -127.0432    | -2.7840         | -2.9962       |
| 0.0125        | 1.39  | 350  | 0.0137          | 0.0243         | -5.5389          | 1.0                | 5.5632          | -277.1337      | -126.8233    | -2.7873         | -2.9994       |
| 0.0079        | 1.59  | 400  | 0.0129          | 0.0313         | -5.5885          | 1.0                | 5.6198          | -277.6297      | -126.7539    | -2.7878         | -3.0000       |
| 0.0077        | 1.79  | 450  | 0.0126          | 0.0332         | -5.6246          | 1.0                | 5.6578          | -277.9906      | -126.7342    | -2.7878         | -2.9998       |
| 0.0073        | 1.99  | 500  | 0.0126          | 0.0322         | -5.6582          | 1.0                | 5.6905          | -278.3270      | -126.7444    | -2.7863         | -2.9985       |
| 0.0087        | 2.19  | 550  | 0.0123          | 0.0334         | -5.6819          | 1.0                | 5.7153          | -278.5634      | -126.7327    | -2.7862         | -2.9983       |
| 0.0111        | 2.38  | 600  | 0.0123          | 0.0324         | -5.6898          | 1.0                | 5.7222          | -278.6425      | -126.7427    | -2.7862         | -2.9984       |
| 0.0086        | 2.58  | 650  | 0.0122          | 0.0357         | -5.6877          | 1.0                | 5.7234          | -278.6218      | -126.7101    | -2.7863         | -2.9984       |
| 0.0067        | 2.78  | 700  | 0.0122          | 0.0352         | -5.6897          | 1.0                | 5.7249          | -278.6414      | -126.7143    | -2.7860         | -2.9981       |
| 0.0067        | 2.98  | 750  | 0.0123          | 0.0352         | -5.6889          | 1.0                | 5.7242          | -278.6341      | -126.7145    | -2.7863         | -2.9985       |
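
For reference, the reward columns follow TRL's DPO logging convention (a sketch, assuming the standard DPO formulation with scaling factor β):

$$
\text{rewards/chosen} = \beta \left( \log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x) \right), \qquad \text{rewards/rejected} = \beta \left( \log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x) \right)
$$

where y_w and y_l are the chosen and rejected completions. `Rewards/margins` is the mean difference between the two, and `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected reward.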


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2