---
library_name: transformers
license: llama3
base_model: tsavage68/Na_L3_100steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_L3_1000steps_1e6rate_01beta_cSFTDPO
  results: []
---


# Na_L3_1000steps_1e6rate_01beta_cSFTDPO

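This model is a fine-tuned version of [tsavage68/Na_L3_100steps_1e6rate_SFT](https://huggingface.co/tsavage68/Na_L3_100steps_1e6rate_SFT), trained with DPO according to the `trl` and `dpo` tags in the metadata. The card does not record which dataset was used.
At the final training step (1000), it achieves the following results on the evaluation set: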
- Loss: 0.0000
- Rewards/chosen: 1.6118
- Rewards/rejected: -12.0281
- Rewards/accuracies: 1.0
- Rewards/margins: 13.6398
- Logps/rejected: -161.7823
- Logps/chosen: -8.7726
- Logits/rejected: -0.9066
- Logits/chosen: -0.8232
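
The reported numbers can be cross-checked against each other. The sketch below recomputes `Rewards/margins` as the chosen reward minus the rejected reward, then estimates the loss from that margin. It assumes the standard sigmoid DPO loss, which this card does not state, and it evaluates the loss at the mean margin rather than averaging per-example losses, so it is only a rough consistency check.

```python
import math

# Final-step evaluation metrics copied from the list above.
rewards_chosen = 1.6118
rewards_rejected = -12.0281
reported_margin = 13.6398

# Rewards/margins is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, i.e. -log(sigmoid(margin)) per example.
# log1p(exp(-m)) is the same quantity in a numerically stable form.
approx_loss = math.log1p(math.exp(-margin))

print(f"margin      = {margin:.4f} (reported {reported_margin})")
print(f"approx loss = {approx_loss:.2e}")
```

Under these assumptions the estimated loss is on the order of 1e-6, which would explain why the card shows `0.0000` at four decimal places.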

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
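
To make the schedule concrete, here is a minimal sketch of how the values above combine. It derives the total batch size from the per-device batch size and gradient accumulation, and models the learning rate as linear warmup followed by a single half-cosine decay to zero. The device count and the exact decay shape are assumptions, not something the card records, and `lr_at` is a hypothetical helper rather than a library function.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: the card does not state the device count

# Examples consumed per optimizer step.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.3e}")
```

With one device, the product of batch size and accumulation steps matches the reported `total_train_batch_size` of 4. The rate peaks at step 100 and decays for the remaining 900 steps.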

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0004        | 0.2667 | 50   | 0.0002          | 1.1139         | -7.3111          | 1.0                | 8.4250          | -114.6128      | -13.7514     | -0.9426         | -0.8757       |
| 0.0           | 0.5333 | 100  | 0.0000          | 1.3245         | -9.2705          | 1.0                | 10.5951         | -134.2070      | -11.6448     | -0.9282         | -0.8570       |
| 0.0           | 0.8    | 150  | 0.0000          | 1.4013         | -10.1633         | 1.0                | 11.5646         | -143.1346      | -10.8774     | -0.9219         | -0.8480       |
| 0.0           | 1.0667 | 200  | 0.0000          | 1.4458         | -10.6256         | 1.0                | 12.0714         | -147.7574      | -10.4319     | -0.9199         | -0.8445       |
| 0.0           | 1.3333 | 250  | 0.0000          | 1.4852         | -10.9716         | 1.0                | 12.4568         | -151.2177      | -10.0381     | -0.9162         | -0.8391       |
| 0.0           | 1.6    | 300  | 0.0000          | 1.5139         | -11.2034         | 1.0                | 12.7173         | -153.5357      | -9.7513      | -0.9157         | -0.8372       |
| 0.0           | 1.8667 | 350  | 0.0000          | 1.5399         | -11.3960         | 1.0                | 12.9358         | -155.4616      | -9.4916      | -0.9126         | -0.8332       |
| 0.0           | 2.1333 | 400  | 0.0000          | 1.5600         | -11.5474         | 1.0                | 13.1074         | -156.9758      | -9.2899      | -0.9114         | -0.8310       |
| 0.0           | 2.4    | 450  | 0.0000          | 1.5740         | -11.6695         | 1.0                | 13.2435         | -158.1971      | -9.1505      | -0.9103         | -0.8292       |
| 0.0           | 2.6667 | 500  | 0.0000          | 1.5786         | -11.7703         | 1.0                | 13.3489         | -159.2048      | -9.1044      | -0.9090         | -0.8273       |
| 0.0           | 2.9333 | 550  | 0.0000          | 1.5997         | -11.8482         | 1.0                | 13.4479         | -159.9833      | -8.8929      | -0.9085         | -0.8260       |
| 0.0           | 3.2    | 600  | 0.0000          | 1.6059         | -11.9156         | 1.0                | 13.5215         | -160.6575      | -8.8312      | -0.9080         | -0.8251       |
| 0.0           | 3.4667 | 650  | 0.0000          | 1.6043         | -11.9725         | 1.0                | 13.5768         | -161.2263      | -8.8467      | -0.9080         | -0.8248       |
| 0.0           | 3.7333 | 700  | 0.0000          | 1.6126         | -11.9912         | 1.0                | 13.6038         | -161.4137      | -8.7638      | -0.9076         | -0.8242       |
| 0.0           | 4.0    | 750  | 0.0000          | 1.6085         | -12.0144         | 1.0                | 13.6229         | -161.6453      | -8.8050      | -0.9078         | -0.8243       |
| 0.0           | 4.2667 | 800  | 0.0000          | 1.6098         | -12.0215         | 1.0                | 13.6313         | -161.7169      | -8.7922      | -0.9070         | -0.8237       |
| 0.0           | 4.5333 | 850  | 0.0000          | 1.6207         | -12.0233         | 1.0                | 13.6439         | -161.7346      | -8.6836      | -0.9078         | -0.8244       |
| 0.0           | 4.8    | 900  | 0.0000          | 1.6133         | -12.0299         | 1.0                | 13.6432         | -161.8011      | -8.7572      | -0.9067         | -0.8232       |
| 0.0           | 5.0667 | 950  | 0.0000          | 1.6119         | -12.0262         | 1.0                | 13.6382         | -161.7639      | -8.7708      | -0.9066         | -0.8232       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 1.6118         | -12.0281         | 1.0                | 13.6398         | -161.7823      | -8.7726      | -0.9066         | -0.8232       |
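
The Epoch and Step columns also give a rough idea of the training set size, which the card otherwise leaves unstated. The sketch below is an inference from the table, not a recorded fact: it assumes the epoch counter advances in proportion to optimizer steps and that each step consumes `total_train_batch_size` examples.

```python
# One (step, epoch) pair copied from the table above.
step, epoch = 750, 4.0
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameter list
TRAINING_STEPS = 1000

# Optimizer steps needed to pass over the training set once.
steps_per_epoch = step / epoch

# Assumption: every optimizer step consumes a full batch of examples.
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Epochs completed by the end of training, to compare with the last row.
epochs_at_end = TRAINING_STEPS / steps_per_epoch

print(f"steps per epoch      : {steps_per_epoch}")
print(f"approx. train examples: {approx_train_examples:.0f}")
print(f"epochs at step 1000  : {epochs_at_end:.4f}")
```

Under these assumptions the training set holds roughly 750 examples, so the 1000-step run passes over it a little more than five times. That is consistent with the 5.3333 in the final row.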


### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1