---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_L3_1000steps_1e6rate_05beta_cSFTDPO
  results: []
---


# IE_L3_1000steps_1e6rate_05beta_cSFTDPO

This model is a version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) fine-tuned with Direct Preference Optimization (DPO) using TRL, on an unknown dataset.
It achieves the following results on the evaluation set at the final checkpoint (step 1000):
- Loss: 0.1802
- Rewards/chosen: -1.4168
- Rewards/rejected: -13.8543
- Rewards/accuracies: 0.7400
- Rewards/margins: 12.4374
- Logps/rejected: -103.3358
- Logps/chosen: -85.6314
- Logits/rejected: -0.7970
- Logits/chosen: -0.7188
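The reward metrics follow DPO's implicit-reward definitions. The sketch below shows how such metrics are typically derived, modeled on our understanding of TRL's `DPOTrainer` with its default sigmoid loss. Each reward is β times the gap between the policy's and the frozen reference model's log-probability of a response. Rewards/margins is chosen minus rejected reward, and Rewards/accuracies is the fraction of pairs where the chosen reward is higher. Assumptions: β = 0.5 is read off the `05beta` in the model name (the card does not list β), reference-model log-probs are not reported here, and `dpo_metrics` is a hypothetical helper rather than a TRL API.

```python
import math
from statistics import mean

# ASSUMPTION: beta = 0.5, inferred from "05beta" in the model name; not listed on the card.
BETA = 0.5


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Compute DPO-style eval metrics from sequence log-probs (one value per preference pair)."""
    # Implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) == softplus(-margin).
    losses = [softplus(-m) for m in margins]
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }


# Toy example with two made-up preference pairs (not data from this model).
example = dpo_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-20.0, -9.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-16.0, -10.0],
)
print(example)
```

Because margins are averaged linearly, the reported Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding, which holds for the values above (-1.4168 - (-13.8543) ≈ 12.4374).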

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
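The total batch size and the learning-rate schedule follow directly from the values above. The sketch below reproduces them in plain Python, assuming training ran on a single device (consistent with 2 × 2 = 4) and that the `cosine` scheduler matches the formula of `transformers.get_cosine_schedule_with_warmup` with its default half-cycle: linear warmup to the peak rate over 100 steps, then cosine decay to zero at step 1000. The `cosine_lr` function is a hypothetical illustration, not part of any library.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 2           # per device
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1                # ASSUMPTION: single device, implied by total_train_batch_size = 4
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Effective number of examples per optimizer step.
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 50, 100, 550, 1000):
    print(f"step {s:4d}: lr = {cosine_lr(s):.3e}")
```

With these settings the rate peaks at 1e-06 at step 100 and is back to half its peak at step 550, the midpoint of the decay phase.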

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1906        | 0.4   | 50   | 0.1802          | -1.0109        | -11.1903         | 0.7400             | 10.1794         | -98.0078       | -84.8196     | -0.7939         | -0.7206       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.2190        | -12.1625         | 0.7400             | 10.9435         | -99.9523       | -85.2358     | -0.7944         | -0.7197       |
| 0.1386        | 1.2   | 150  | 0.1802          | -1.2782        | -12.5852         | 0.7400             | 11.3070         | -100.7976      | -85.3541     | -0.7943         | -0.7189       |
| 0.1733        | 1.6   | 200  | 0.1802          | -1.3094        | -13.0296         | 0.7400             | 11.7202         | -101.6864      | -85.4166     | -0.7948         | -0.7186       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.3248        | -13.1625         | 0.7400             | 11.8377         | -101.9522      | -85.4473     | -0.7952         | -0.7186       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.3337        | -13.2622         | 0.7400             | 11.9285         | -102.1515      | -85.4652     | -0.7942         | -0.7174       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.3670        | -13.4507         | 0.7400             | 12.0837         | -102.5286      | -85.5317     | -0.7953         | -0.7178       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.3818        | -13.5334         | 0.7400             | 12.1517         | -102.6941      | -85.5613     | -0.7964         | -0.7189       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.3800        | -13.5899         | 0.7400             | 12.2099         | -102.8071      | -85.5577     | -0.7964         | -0.7189       |
| 0.2079        | 4.0   | 500  | 0.1802          | -1.3816        | -13.6722         | 0.7400             | 12.2906         | -102.9716      | -85.5610     | -0.7966         | -0.7187       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.4142        | -13.7800         | 0.7400             | 12.3657         | -103.1872      | -85.6262     | -0.7956         | -0.7175       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.3864        | -13.7736         | 0.7400             | 12.3872         | -103.1744      | -85.5705     | -0.7974         | -0.7192       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.4252        | -13.8450         | 0.7400             | 12.4197         | -103.3172      | -85.6483     | -0.7969         | -0.7187       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.4087        | -13.8154         | 0.7400             | 12.4068         | -103.2581      | -85.6151     | -0.7974         | -0.7196       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.4077        | -13.8712         | 0.7400             | 12.4635         | -103.3696      | -85.6131     | -0.7977         | -0.7194       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.4158        | -13.9034         | 0.7400             | 12.4876         | -103.4339      | -85.6293     | -0.7977         | -0.7195       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.4105        | -13.8922         | 0.7400             | 12.4817         | -103.4116      | -85.6187     | -0.7979         | -0.7200       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.4075        | -13.8657         | 0.7400             | 12.4582         | -103.3587      | -85.6128     | -0.7970         | -0.7189       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.4138        | -13.8523         | 0.7400             | 12.4386         | -103.3319      | -85.6253     | -0.7971         | -0.7188       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.4168        | -13.8543         | 0.7400             | 12.4374         | -103.3358      | -85.6314     | -0.7970         | -0.7188       |
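Two properties of the table can be checked mechanically: Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to 4-decimal rounding, and the Step and Epoch columns together imply a fixed number of optimizer steps per epoch (125 here, so 1000 steps span 8 epochs). The sketch below checks both on a subset of rows copied from the table. Multiplying steps per epoch by the total batch size of 4 gives roughly 500 training pairs per epoch; this is an estimate that assumes a single device, since the dataset itself is not documented. The helper names are hypothetical.

```python
# (step, epoch, rewards/chosen, rewards/rejected, rewards/margins), copied from the table above.
ROWS = [
    (50, 0.4, -1.0109, -11.1903, 10.1794),
    (250, 2.0, -1.3248, -13.1625, 11.8377),
    (500, 4.0, -1.3816, -13.6722, 12.2906),
    (750, 6.0, -1.4077, -13.8712, 12.4635),
    (1000, 8.0, -1.4168, -13.8543, 12.4374),
]
TOTAL_TRAIN_BATCH_SIZE = 4


def check_margins(rows, tol=5e-4):
    """True if every logged margin equals chosen minus rejected, allowing for rounding."""
    return all(abs((chosen - rejected) - margin) <= tol
               for _, _, chosen, rejected, margin in rows)


def steps_per_epoch(rows):
    """Infer optimizer steps per epoch from the Step and Epoch columns."""
    estimates = {round(step / epoch) for step, epoch, *_ in rows}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent step/epoch ratios: {estimates}")
    return estimates.pop()


spe = steps_per_epoch(ROWS)
print("margins consistent:", check_margins(ROWS))
print("steps per epoch:", spe)
# ASSUMPTION: single device, no dropped partial batches.
print("approx. training pairs per epoch:", spe * TOTAL_TRAIN_BATCH_SIZE)
```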


### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
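To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` is enough. This is a sketch, assuming the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. It compares only the public version, so a local build tag such as `+cu117` is ignored. `check_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; "torch" is PyTorch's distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.0.0+cu117",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Map each package to (installed_version_or_None, expected_version, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Compare only the public part so a local tag like "+cu117" does not cause a mismatch.
        matches = installed is not None and installed.split("+")[0] == wanted.split("+")[0]
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        print(f"{name:12s} installed={installed} expected={wanted} {'OK' if ok else 'MISMATCH'}")
```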