---
license: mit
library_name: peft
tags:
- trl
- kto
- generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
- name: WeniGPT-QA-Zephyr-7B-4.0.0-KTO
  results: []
---


# WeniGPT-QA-Zephyr-7B-4.0.0-KTO

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta). The Trainer did not record the name of the training dataset.
It achieves the following results on the evaluation set (these match the step-700 row of the training results table):
- Loss: 0.0172
- Rewards/chosen: 5.2018
- Rewards/rejected: -101.1277
- Rewards/margins: 106.3295
- KL: 0.6591
- Logps/chosen: -123.7008
- Logps/rejected: -1204.3472
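The reported margin is numerically consistent with the chosen reward minus the rejected reward. The short check below assumes that this is how the margin is defined; the card itself does not state the formula.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 5.2018
rewards_rejected = -101.1277
reported_margin = 106.3295

# Assumption: margin = mean chosen reward - mean rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported value: {abs(computed_margin - reported_margin) < 1e-4}")
```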

## Model description

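According to the card metadata, this is a PEFT adapter for HuggingFaceH4/zephyr-7b-beta, trained with the TRL library using KTO (Kahneman-Tversky Optimization). More information needed.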

## Intended uses & limitations

More information needed
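Because the metadata lists `peft` as the library and `HuggingFaceH4/zephyr-7b-beta` as the base model, loading the adapter would presumably follow the usual PEFT pattern. The sketch below is a minimal illustration under that assumption. `load_adapter` is a hypothetical helper, and `adapter_path` must be supplied by the caller because this card does not give the adapter's repository id or local path.

```python
def load_adapter(adapter_path, base_model_id="HuggingFaceH4/zephyr-7b-beta"):
    """Hypothetical helper: load the base model and attach the PEFT adapter.

    adapter_path is a local directory or Hub repo id chosen by the caller;
    the card does not specify one.
    """
    # Local imports so that defining the helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_adapter("path/to/adapter")` downloads the 7B base model, so it needs enough memory and disk space for those weights.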

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP
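The derived values in this list can be reproduced with a little arithmetic. The sketch below assumes a single training device (2 × 8 already equals the reported total of 16) and assumes the warmup step count is the warmup ratio times the total steps, rounded up. `linear_lr` is a hypothetical helper that illustrates a linear warmup followed by linear decay; it is not the Trainer's own scheduler code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 8
peak_lr = 2e-4
training_steps = 786
warmup_ratio = 0.03

num_devices = 1  # assumption: not stated in the card

# Samples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: warmup steps are rounded up to the next whole step.
warmup_steps = math.ceil(training_steps * warmup_ratio)


def linear_lr(step):
    """Hypothetical linear schedule: ramp up to peak_lr, then decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, training_steps - step)
    return peak_lr * remaining / max(1, training_steps - warmup_steps)


print(effective_batch_size, warmup_steps)
print(linear_lr(warmup_steps), linear_lr(training_steps))
```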

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL     | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 124.9389      | 0.19  | 50   | 0.0980          | 4.3712         | -4.4622          | 8.8334          | 3.2830 | -132.0074    | -237.6924      |
| 10.8269       | 0.38  | 100  | 0.0399          | 4.0267         | -34.8306         | 38.8572         | 0.7623 | -135.4527    | -541.3764      |
| 276.3512      | 0.57  | 150  | 0.0280          | 4.7987         | -20.4823         | 25.2810         | 1.6861 | -127.7321    | -397.8935      |
| 5.7214        | 0.76  | 200  | 0.0299          | 5.0010         | -21.9689         | 26.9699         | 1.5452 | -125.7095    | -412.7599      |
| 207.9747      | 0.94  | 250  | 0.0262          | 4.8172         | -61.3154         | 66.1326         | 1.1824 | -127.5472    | -806.2249      |
| 25.0348       | 1.13  | 300  | 0.0206          | 4.9858         | -70.8381         | 75.8240         | 1.4845 | -125.8608    | -901.4517      |
| 3.1951        | 1.32  | 350  | 0.0265          | 4.6896         | -82.7767         | 87.4663         | 0.6364 | -128.8232    | -1020.8375     |
| 68.7248       | 1.51  | 400  | 0.0201          | 5.0567         | -53.7706         | 58.8272         | 1.2176 | -125.1527    | -730.7762      |
| 10.6590       | 1.70  | 450  | 0.0263          | 4.9077         | -76.2636         | 81.1714         | 0.8826 | -126.6419    | -955.7070      |
| 177.5836      | 1.89  | 500  | 0.0187          | 5.1836         | -82.5033         | 87.6869         | 0.4794 | -123.8830    | -1018.1035     |
| 15.4933       | 2.08  | 550  | 0.0281          | 4.7980         | -95.1968         | 99.9948         | 0.9202 | -127.7392    | -1145.0382     |
| 3.8270        | 2.27  | 600  | 0.0178          | 5.0335         | -96.9958         | 102.0293        | 0.4925 | -125.3841    | -1163.0284     |
| 16.3759       | 2.45  | 650  | 0.0194          | 5.1136         | -106.3420        | 111.4556        | 0.6069 | -124.5831    | -1256.4906     |
| 7.4087        | 2.64  | 700  | 0.0172          | 5.2018         | -101.1277        | 106.3295        | 0.6591 | -123.7008    | -1204.3472     |
| 23.8901       | 2.83  | 750  | 0.0177          | 5.2007         | -102.1235        | 107.3241        | 0.6737 | -123.7124    | -1214.3054     |
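To see which logged checkpoint has the lowest validation loss, the table can be scanned programmatically. The sketch below copies the step and validation-loss columns by hand; `eval_log` is an illustrative name, not something the Trainer produces.

```python
# (step, validation_loss) pairs copied from the table above.
eval_log = [
    (50, 0.0980), (100, 0.0399), (150, 0.0280), (200, 0.0299), (250, 0.0262),
    (300, 0.0206), (350, 0.0265), (400, 0.0201), (450, 0.0263), (500, 0.0187),
    (550, 0.0281), (600, 0.0178), (650, 0.0194), (700, 0.0172), (750, 0.0177),
]

# Pick the row with the smallest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
```

The minimum falls at step 700, which is the row whose metrics are reported at the top of this card.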


### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2