---
license: mit
base_model: EleutherAI/gpt-neo-125M
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: model
  results: []
---


# model

This model is a fine-tuned version of [EleutherAI/gpt-neo-125M](https://huggingface.co/EleutherAI/gpt-neo-125M), trained with DPO via TRL according to the card's tags. The training dataset was not recorded.
It achieves the following results on the evaluation set:
- Loss: 0.6955
- Rewards/chosen: -0.0079
- Rewards/rejected: -0.0080
- Rewards/accuracies: 0.4813
- Rewards/margins: 0.0001
- Logps/rejected: -478.8612
- Logps/chosen: -494.2958
- Logits/rejected: -18.3633
- Logits/chosen: -18.4819
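
As a reading aid for the metrics above, the sketch below relates the reward margin to the loss. It assumes the standard sigmoid DPO loss, `-log(sigmoid(margin))`, which this card does not state explicitly; the helper name `dpo_sigmoid_loss` is hypothetical. Under that assumption, a margin of zero gives a loss of ln 2 (about 0.6931), so the reported 0.6955 suggests the model has moved very little from the reference model.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.0079
rewards_rejected = -0.0080
reported_loss = 0.6955


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)).

    Assumption: the standard sigmoid loss; the card does not state the loss type.
    """
    return math.log1p(math.exp(-margin))


# Rewards/margins is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
# Loss when the model expresses no preference at all (equals ln 2).
loss_at_zero_margin = dpo_sigmoid_loss(0.0)

print(f"margin: {margin:.4f}")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
print(f"reported loss minus baseline: {reported_loss - loss_at_zero_margin:+.4f}")
```

The reported loss is an average over evaluation examples, so it need not equal the loss evaluated at the average margin; the comparison is only indicative.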

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
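
The sketch below illustrates how two of the hyperparameters above interact. It checks that `total_train_batch_size` follows from the per-device batch size and gradient accumulation, and it traces the shape of a cosine schedule with a 10% linear warmup. The single device and the total of 1000 optimizer steps are assumptions for illustration, not values stated in the card, and `cosine_lr` is a hypothetical stand-alone helper rather than the trainer's own scheduler.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-7
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

# Assumptions, not stated in the card: one device, about 1000 optimizer steps.
NUM_DEVICES = 1
TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples consumed per optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup_steps = math.ceil(WARMUP_RATIO * TOTAL_STEPS)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS, warmup_steps, LEARNING_RATE))
```

With these assumed values the learning rate peaks at 5e-07 around step 100 and decays to zero by the end of training.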

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6955        | 0.2992 | 100  | 0.6958          | -0.0017        | -0.0008          | 0.4701             | -0.0008         | -478.7900      | -494.2336    | -18.3637        | -18.4824      |
| 0.6906        | 0.5984 | 200  | 0.6962          | -0.0028        | -0.0016          | 0.4744             | -0.0013         | -478.7974      | -494.2453    | -18.3625        | -18.4806      |
| 0.6985        | 0.8975 | 300  | 0.6959          | -0.0222        | -0.0214          | 0.4738             | -0.0008         | -478.9952      | -494.4388    | -18.3624        | -18.4809      |
| 0.6946        | 1.1967 | 400  | 0.6955          | 0.0015         | 0.0015           | 0.4753             | 0.0000          | -478.7664      | -494.2018    | -18.3628        | -18.4811      |
| 0.6946        | 1.4959 | 500  | 0.6960          | -0.0046        | -0.0040          | 0.4791             | -0.0006         | -478.8223      | -494.2634    | -18.3631        | -18.4816      |
| 0.6952        | 1.7951 | 600  | 0.6951          | -0.0047        | -0.0057          | 0.4882             | 0.0011          | -478.8391      | -494.2639    | -18.3636        | -18.4821      |
| 0.6947        | 2.0942 | 700  | 0.6955          | -0.0053        | -0.0056          | 0.4822             | 0.0003          | -478.8379      | -494.2701    | -18.3634        | -18.4820      |
| 0.6995        | 2.3934 | 800  | 0.6948          | -0.0060        | -0.0076          | 0.4918             | 0.0015          | -478.8574      | -494.2774    | -18.3632        | -18.4818      |
| 0.6932        | 2.6926 | 900  | 0.6952          | -0.0080        | -0.0087          | 0.4837             | 0.0008          | -478.8692      | -494.2970    | -18.3633        | -18.4817      |
| 0.6964        | 2.9918 | 1000 | 0.6955          | -0.0079        | -0.0080          | 0.4813             | 0.0001          | -478.8612      | -494.2958    | -18.3633        | -18.4819      |
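
To make the table easier to scan, the sketch below picks out the evaluation step with the lowest validation loss and the highest reward accuracy. The rows are copied from the table above; treating 0.5 as chance level for pairwise reward accuracy is an assumption of this sketch.

```python
# (step, validation loss, rewards/accuracies) copied from the table above.
eval_log = [
    (100, 0.6958, 0.4701),
    (200, 0.6962, 0.4744),
    (300, 0.6959, 0.4738),
    (400, 0.6955, 0.4753),
    (500, 0.6960, 0.4791),
    (600, 0.6951, 0.4882),
    (700, 0.6955, 0.4822),
    (800, 0.6948, 0.4918),
    (900, 0.6952, 0.4837),
    (1000, 0.6955, 0.4813),
]

# Evaluation step with the lowest validation loss.
best_by_loss = min(eval_log, key=lambda row: row[1])
# Evaluation step with the highest reward accuracy.
best_by_accuracy = max(eval_log, key=lambda row: row[2])
# Spread of validation loss across all evaluations.
loss_range = max(row[1] for row in eval_log) - min(row[1] for row in eval_log)
# Steps where accuracy exceeds the assumed chance level of 0.5.
above_chance = [step for step, _, acc in eval_log if acc > 0.5]

print("best by loss:", best_by_loss)
print("best by accuracy:", best_by_accuracy)
print(f"validation loss range: {loss_range:.4f}")
print("steps above chance accuracy:", above_chance)
```

On these numbers the final checkpoint (step 1000) is not the best-scoring one, and no evaluation exceeds 0.5 accuracy, so the differences between checkpoints may be within noise.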


### Framework versions

- Transformers 4.41.2
- PyTorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1