Weni/WeniGPT-QA-Zephyr-7B-4.0.0-KTO

+---
+license: mit
+library_name: peft
+tags:
+- trl
+- kto
+- generated_from_trainer
+base_model: HuggingFaceH4/zephyr-7b-beta
+model-index:
+- name: WeniGPT-QA-Zephyr-7B-4.0.0-KTO
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# WeniGPT-QA-Zephyr-7B-4.0.0-KTO
+This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta). The training dataset is not specified.
+It achieves the following results on the evaluation set (the values match the step-700 row of the training results table):
+- Loss: 0.0172
+- Rewards/chosen: 5.2018
+- Rewards/rejected: -101.1277
+- Rewards/margins: 106.3295
+- KL: 0.6591
+- Logps/chosen: -123.7008
+- Logps/rejected: -1204.3472
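The reported margin appears to be the chosen reward minus the rejected reward. The short check below reproduces it from the numbers listed above; the variable names are illustrative, and the definition of the margin as a simple difference of the two means is an assumption rather than something the card states.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = 5.2018
rewards_rejected = -101.1277
reported_margin = 106.3295

# Assumption: margin = mean chosen reward - mean rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")

# The recomputed value should agree with the reported one up to rounding.
assert abs(margin - reported_margin) < 1e-4
```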
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
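The metadata lists `peft` as the library and `HuggingFaceH4/zephyr-7b-beta` as the base model, so loading presumably follows the usual adapter pattern. The following is a minimal sketch under stated assumptions: the adapter repository id is taken from the page title and has not been verified, the tokenizer is assumed to be the base model's, and `load_model` is a hypothetical helper name. It needs `transformers` and `peft` installed and downloads the 7B base model when called.

```python
BASE_MODEL_ID = "HuggingFaceH4/zephyr-7b-beta"
# Assumption: adapter repo id inferred from the page title, not confirmed.
ADAPTER_ID = "Weni/WeniGPT-QA-Zephyr-7B-4.0.0-KTO"


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `load_model()` returns the adapted model and its tokenizer; dtype, device placement, and quantization are left at library defaults here and would likely need tuning for a 7B model.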
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.03
+- training_steps: 786
+- mixed_precision_training: Native AMP
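Two of the values above are derived from the others, and the arithmetic can be sketched as follows. This assumes a single training device (consistent with 2 × 8 = 16) and that warmup steps are the warmup ratio times the total steps, rounded up. `linear_lr` is a hypothetical helper that illustrates a linear warmup followed by linear decay to zero; it is not the trainer's own code.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 2e-4
train_batch_size = 2
gradient_accumulation_steps = 8
training_steps = 786
warmup_ratio = 0.03
num_devices = 1  # assumption: single device

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: warmup length is ratio * total steps, rounded up.
warmup_steps = math.ceil(training_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / max(1, training_steps - warmup_steps)


print(total_train_batch_size, warmup_steps)
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(training_steps))
```

Under these assumptions the rate peaks at 0.0002 once warmup ends and reaches zero at step 786.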
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL     | Logps/chosen | Logps/rejected |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
+| 124.9389      | 0.19  | 50   | 0.0980          | 4.3712         | -4.4622          | 8.8334          | 3.2830 | -132.0074    | -237.6924      |
+| 10.8269       | 0.38  | 100  | 0.0399          | 4.0267         | -34.8306         | 38.8572         | 0.7623 | -135.4527    | -541.3764      |
+| 276.3512      | 0.57  | 150  | 0.0280          | 4.7987         | -20.4823         | 25.2810         | 1.6861 | -127.7321    | -397.8935      |
+| 5.7214        | 0.76  | 200  | 0.0299          | 5.0010         | -21.9689         | 26.9699         | 1.5452 | -125.7095    | -412.7599      |
+| 207.9747      | 0.94  | 250  | 0.0262          | 4.8172         | -61.3154         | 66.1326         | 1.1824 | -127.5472    | -806.2249      |
+| 25.0348       | 1.13  | 300  | 0.0206          | 4.9858         | -70.8381         | 75.8240         | 1.4845 | -125.8608    | -901.4517      |
+| 3.1951        | 1.32  | 350  | 0.0265          | 4.6896         | -82.7767         | 87.4663         | 0.6364 | -128.8232    | -1020.8375     |
+| 68.7248       | 1.51  | 400  | 0.0201          | 5.0567         | -53.7706         | 58.8272         | 1.2176 | -125.1527    | -730.7762      |
+| 10.659        | 1.7   | 450  | 0.0263          | 4.9077         | -76.2636         | 81.1714         | 0.8826 | -126.6419    | -955.7070      |
+| 177.5836      | 1.89  | 500  | 0.0187          | 5.1836         | -82.5033         | 87.6869         | 0.4794 | -123.8830    | -1018.1035     |
+| 15.4933       | 2.08  | 550  | 0.0281          | 4.7980         | -95.1968         | 99.9948         | 0.9202 | -127.7392    | -1145.0382     |
+| 3.827         | 2.27  | 600  | 0.0178          | 5.0335         | -96.9958         | 102.0293        | 0.4925 | -125.3841    | -1163.0284     |
+| 16.3759       | 2.45  | 650  | 0.0194          | 5.1136         | -106.3420        | 111.4556        | 0.6069 | -124.5831    | -1256.4906     |
+| 7.4087        | 2.64  | 700  | 0.0172          | 5.2018         | -101.1277        | 106.3295        | 0.6591 | -123.7008    | -1204.3472     |
+| 23.8901       | 2.83  | 750  | 0.0177          | 5.2007         | -102.1235        | 107.3241        | 0.6737 | -123.7124    | -1214.3054     |
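To see which evaluation step the headline metrics correspond to, the validation-loss column can be scanned for its minimum. The sketch below copies the (step, validation loss) pairs from the table; `eval_history` is an illustrative name, and treating the lowest validation loss as the selection criterion is an assumption.

```python
# (step, validation_loss) pairs copied from the table above.
eval_history = [
    (50, 0.0980), (100, 0.0399), (150, 0.0280), (200, 0.0299), (250, 0.0262),
    (300, 0.0206), (350, 0.0265), (400, 0.0201), (450, 0.0263), (500, 0.0187),
    (550, 0.0281), (600, 0.0178), (650, 0.0194), (700, 0.0172), (750, 0.0177),
]

# Assumption: the best checkpoint is the one with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])
print(best_step, best_loss)  # 700 0.0172
```

The lowest validation loss (step 700) does not coincide with the largest reward margin in the table (step 650, 111.4556), so the choice of criterion matters when picking a checkpoint.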
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.39.1
+- Pytorch 2.1.0+cu118
+- Datasets 2.18.0
+- Tokenizers 0.15.2