---
base_model: NewEden/trashdwag
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: tinymagnum-r2-KTO-r1
  results: []
---

# tinymagnum-r2-KTO-r1

This model is a fine-tuned version of [NewEden/trashdwag](https://huggingface.co/NewEden/trashdwag) on the combined_kto.json dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5003
- Rewards/chosen: 0.0061
- Logps/chosen: -12.0862
- Rewards/rejected: 0.0023
- Logps/rejected: -16.1405
- Rewards/margins: 0.0039
- Kl: 0.0447

## Model description

A LoRA adapter for [NewEden/trashdwag](https://huggingface.co/NewEden/trashdwag), preference-tuned with KTO (Kahneman-Tversky Optimization) using LLaMA-Factory.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.25
- num_epochs: 1.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | Kl     |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:----------------:|:--------------:|:---------------:|:------:|
| 0.5025        | 0.1078 | 16   | 0.5038          | 0.0004         | -12.1438     | 0.0007           | -16.1563       | -0.0003         | 0.0099 |
| 0.502         | 0.2157 | 32   | 0.5019          | 0.0033         | -12.1150     | 0.0018           | -16.1450       | 0.0014          | 0.0200 |
| 0.5026        | 0.3235 | 48   | 0.5013          | 0.0051         | -12.0964     | 0.0027           | -16.1358       | 0.0024          | 0.0335 |
| 0.5021        | 0.4313 | 64   | 0.5015          | 0.0058         | -12.0893     | 0.0036           | -16.1270       | 0.0022          | 0.0406 |
| 0.5017        | 0.5392 | 80   | 0.5012          | 0.0064         | -12.0833     | 0.0037           | -16.1265       | 0.0027          | 0.0434 |
| 0.5003        | 0.6470 | 96   | 0.5007          | 0.0066         | -12.0812     | 0.0032           | -16.1311       | 0.0034          | 0.0431 |
| 0.4996        | 0.7548 | 112  | 0.5012          | 0.0063         | -12.0846     | 0.0028           | -16.1353       | 0.0035          | 0.0437 |
| 0.5077        | 0.8627 | 128  | 0.5005          | 0.0063         | -12.0844     | 0.0026           | -16.1374       | 0.0037          | 0.0433 |
| 0.5012        | 0.9705 | 144  | 0.5004          | 0.0064         | -12.0837     | 0.0023           | -16.1401       | 0.0041          | 0.0431 |

### Framework versions

- PEFT 0.12.0
- Transformers 4.45.0.dev0
- Pytorch 2.3.0a0+ebedce2
- Datasets 2.20.0
- Tokenizers 0.19.1
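
### Interpreting the KTO metrics

A brief note on the reward columns above, assuming the standard KTO (Kahneman-Tversky Optimization) formulation used by common trainer implementations; the exact scaling in LLaMA-Factory may differ. The implied reward of a completion $y$ for a prompt $x$ is the $\beta$-scaled log-probability ratio between the policy being trained and the frozen reference model:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

`Rewards/chosen` and `Rewards/rejected` average this quantity over the desirable and undesirable examples respectively, `Rewards/margins` is their difference (0.0061 - 0.0023 ≈ 0.0039 above), and `Kl` is the estimated KL divergence between the policy and the reference model that KTO uses as its reference point. The small positive margin and low KL suggest the adapter shifted slightly toward the chosen responses while staying close to the base model.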
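
## How to load the adapter (sketch)

This repository contains a PEFT LoRA adapter rather than a full set of model weights, so it must be attached to the base model at load time. Below is a minimal loading sketch using `transformers` and `peft`; the adapter id `NewEden/tinymagnum-r2-KTO-r1` is an assumption, so substitute the actual repo id or local path of this adapter.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NewEden/trashdwag"                # base model from the card header
adapter_id = "NewEden/tinymagnum-r2-KTO-r1"  # hypothetical repo id; point at this adapter

# Load the base model, then attach the KTO-trained LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For a standalone checkpoint that does not need `peft` at inference time, `model = model.merge_and_unload()` folds the LoRA weights into the base model before saving.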