---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: llama3-dpo-lora
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama3-dpo-lora
This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT). The training dataset was not recorded when this card was generated (the Trainer reported it as `None`).
It achieves the following results on the evaluation set:
- Loss: 0.5199
- Rewards/chosen: -0.1477
- Rewards/rejected: -0.9502
- Rewards/accuracies: 0.7260
- Rewards/margins: 0.8025
- Logps/rejected: -283.9596
- Logps/chosen: -291.2388
- Logits/rejected: -0.3914
- Logits/chosen: -0.4217
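
The reward metrics above are related: the reported margin is the chosen reward minus the rejected reward. The following is a minimal sketch that checks this relationship using only the numbers listed above; the function name `reward_margin` is a hypothetical helper, not part of any library.

```python
def reward_margin(chosen: float, rejected: float) -> float:
    """Return the gap between the chosen and rejected implicit rewards."""
    return chosen - rejected


# Final evaluation values copied from the list above.
rewards_chosen = -0.1477
rewards_rejected = -0.9502
reported_margin = 0.8025

margin = reward_margin(rewards_chosen, rewards_rejected)

# The values are rounded to four decimals, so compare with a small tolerance.
margins_agree = abs(margin - reported_margin) < 1e-4
print(f"computed margin: {margin:.4f}, matches report: {margins_agree}")
```

A positive margin means the model assigns a higher implicit reward to chosen responses than to rejected ones on average, which is consistent with the reported accuracy of 0.7260.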
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
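
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup. The schedule function `cosine_lr` is a hypothetical helper written for illustration, and the assumption that warmup is linear and the decay ends at zero is ours; the card only states the scheduler type and warmup ratio.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 16

# Effective batch sizes across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 5e-06, warmup_ratio: float = 0.1) -> float:
    """Illustrative schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
```

Under these assumptions, the learning rate peaks at 5e-06 once the first 10% of optimizer steps have elapsed and then decays smoothly for the rest of the single epoch.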
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6297 | 0.1047 | 100 | 0.6140 | 0.1358 | -0.1277 | 0.6960 | 0.2634 | -275.7340 | -288.4034 | -0.5479 | -0.5526 |
| 0.5676 | 0.2094 | 200 | 0.5569 | -0.1144 | -0.6599 | 0.7000 | 0.5455 | -281.0560 | -290.9051 | -0.4945 | -0.5116 |
| 0.5414 | 0.3141 | 300 | 0.5403 | -0.3808 | -1.0461 | 0.7260 | 0.6652 | -284.9180 | -293.5698 | -0.4540 | -0.4775 |
| 0.5124 | 0.4187 | 400 | 0.5341 | -0.2337 | -0.9896 | 0.7040 | 0.7559 | -284.3532 | -292.0986 | -0.4243 | -0.4516 |
| 0.5529 | 0.5234 | 500 | 0.5260 | -0.2177 | -1.0037 | 0.7240 | 0.7861 | -284.4948 | -291.9380 | -0.3995 | -0.4290 |
| 0.53 | 0.6281 | 600 | 0.5244 | -0.0687 | -0.8583 | 0.7200 | 0.7895 | -283.0403 | -290.4489 | -0.4028 | -0.4317 |
| 0.5028 | 0.7328 | 700 | 0.5190 | -0.3357 | -1.1360 | 0.7320 | 0.8003 | -285.8177 | -293.1184 | -0.3874 | -0.4179 |
| 0.5347 | 0.8375 | 800 | 0.5191 | -0.1404 | -0.9419 | 0.7320 | 0.8015 | -283.8760 | -291.1650 | -0.3924 | -0.4225 |
| 0.4783 | 0.9422 | 900 | 0.5190 | -0.1399 | -0.9459 | 0.7260 | 0.8060 | -283.9163 | -291.1600 | -0.3917 | -0.4219 |
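
The Epoch and Step columns together give a rough estimate of how many optimizer steps make up one epoch, and therefore roughly how many training examples were seen. The sketch below derives that estimate from the table rows; it is an approximation only, since the epoch values are rounded and the final batch may be partial.

```python
# (epoch, step) pairs copied from the training results table above.
rows = [
    (0.1047, 100), (0.2094, 200), (0.3141, 300),
    (0.4187, 400), (0.5234, 500), (0.6281, 600),
    (0.7328, 700), (0.8375, 800), (0.9422, 900),
]

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

# total_train_batch_size from the hyperparameter list.
total_train_batch_size = 64
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"~{steps_per_epoch} steps per epoch, "
      f"~{approx_train_examples} training examples (estimate)")
```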
### Framework versions
- PEFT 0.7.1
- Transformers 4.44.2
- Pytorch 2.2.1+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
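
Since the metadata lists `peft` as the library and names the base model, loading the adapter might look like the sketch below. The adapter repository id `your-namespace/llama3-dpo-lora` is a hypothetical placeholder (the card does not state where the adapter is hosted), and the choice of `bfloat16` is an assumption rather than something the card specifies.

```python
BASE_MODEL_ID = "princeton-nlp/Llama-3-Base-8B-SFT"  # from the card metadata
ADAPTER_ID = "your-namespace/llama3-dpo-lora"  # hypothetical placeholder id


def load_adapter(adapter_id: str = ADAPTER_ID,
                 base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the LoRA adapter on top of it."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: not stated in the card
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter()` downloads the 8B base model, so it needs network access and enough memory to hold the weights; replace `ADAPTER_ID` with the actual adapter location before use.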