---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: llama3-dpo-lora
  results: []
---

# llama3-dpo-lora

This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on an unspecified dataset (the auto-generated card recorded the dataset name as `None`).
It achieves the following results on the evaluation set:
- Loss: 0.5199
- Rewards/chosen: -0.1477
- Rewards/rejected: -0.9502
- Rewards/accuracies: 0.7260
- Rewards/margins: 0.8025
- Logps/rejected: -283.9596
- Logps/chosen: -291.2388
- Logits/rejected: -0.3914
- Logits/chosen: -0.4217

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6297        | 0.1047 | 100  | 0.6140          | 0.1358         | -0.1277          | 0.6960             | 0.2634          | -275.7340      | -288.4034    | -0.5479         | -0.5526       |
| 0.5676        | 0.2094 | 200  | 0.5569          | -0.1144        | -0.6599          | 0.7000             | 0.5455          | -281.0560      | -290.9051    | -0.4945         | -0.5116       |
| 0.5414        | 0.3141 | 300  | 0.5403          | -0.3808        | -1.0461          | 0.7260             | 0.6652          | -284.9180      | -293.5698    | -0.4540         | -0.4775       |
| 0.5124        | 0.4187 | 400  | 0.5341          | -0.2337        | -0.9896          | 0.7040             | 0.7559          | -284.3532      | -292.0986    | -0.4243         | -0.4516       |
| 0.5529        | 0.5234 | 500  | 0.5260          | -0.2177        | -1.0037          | 0.7240             | 0.7861          | -284.4948      | -291.9380    | -0.3995         | -0.4290       |
| 0.53          | 0.6281 | 600  | 0.5244          | -0.0687        | -0.8583          | 0.7200             | 0.7895          | -283.0403      | -290.4489    | -0.4028         | -0.4317       |
| 0.5028        | 0.7328 | 700  | 0.5190          | -0.3357        | -1.1360          | 0.7320             | 0.8003          | -285.8177      | -293.1184    | -0.3874         | -0.4179       |
| 0.5347        | 0.8375 | 800  | 0.5191          | -0.1404        | -0.9419          | 0.7320             | 0.8015          | -283.8760      | -291.1650    | -0.3924         | -0.4225       |
| 0.4783        | 0.9422 | 900  | 0.5190          | -0.1399        | -0.9459          | 0.7260             | 0.8060          | -283.9163      | -291.1600    | -0.3917         | -0.4219       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.44.2
- Pytorch 2.2.1+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
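
Two of the derived numbers reported in this card can be reproduced from the others. The sketch below is a sanity check, not part of the training code: it assumes the total batch sizes are the per-device sizes multiplied by the device count (and, for training, by the gradient accumulation steps), and that `Rewards/margins` is the chosen reward minus the rejected reward. The function names are illustrative only.

```python
# Values copied from the hyperparameter list and the evaluation summary in this card.
TRAIN_BATCH_SIZE = 1              # per device
EVAL_BATCH_SIZE = 4               # per device
NUM_DEVICES = 4
GRADIENT_ACCUMULATION_STEPS = 16
REWARD_CHOSEN = -0.1477
REWARD_REJECTED = -0.9502


def effective_train_batch_size(per_device: int, devices: int, accumulation_steps: int) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device * devices * accumulation_steps


def effective_eval_batch_size(per_device: int, devices: int) -> int:
    """Number of samples evaluated per step across all devices."""
    return per_device * devices


def reward_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: margin = chosen reward - rejected reward."""
    return chosen - rejected


total_train = effective_train_batch_size(
    TRAIN_BATCH_SIZE, NUM_DEVICES, GRADIENT_ACCUMULATION_STEPS
)
total_eval = effective_eval_batch_size(EVAL_BATCH_SIZE, NUM_DEVICES)
margin = round(reward_margin(REWARD_CHOSEN, REWARD_REJECTED), 4)

print(f"total_train_batch_size: {total_train}")
print(f"total_eval_batch_size:  {total_eval}")
print(f"Rewards/margins:        {margin}")
```

Applied to individual rows of the training results table, the same subtraction can differ from the reported margin in the last digit (for example, step 100 gives 0.2635 against a reported 0.2634), which is consistent with the margin being averaged before rounding rather than computed from the rounded columns.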