---
base_model: meta-llama/Meta-Llama-3-8B-Instruct
library_name: peft
license: llama3
tags:
- trl
- kto
- generated_from_trainer
model-index:
- name: kto-aligned-model-lora
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/pauld/huggingface/runs/y7di7l44)

# kto-aligned-model-lora

This model is a LoRA fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), aligned with KTO; the training dataset is not specified.
It achieves the following results on the evaluation set:
- Loss: 0.4990
- Eval/rewards/chosen: 0.1561
- Eval/logps/chosen: -0.6624
- Eval/rewards/rejected: 0.1281
- Eval/logps/rejected: -1.9415
- Eval/rewards/margins: 0.0281
- Eval/kl: 1.5643

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 12
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP

A hedged sketch that reproduces this configuration with TRL appears under "Reproducing the training setup" at the end of this card.

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Eval KL |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.4994        | 0.9057 | 8    | 0.4997          | 0.8856  |
| 0.5           | 1.9245 | 17   | 0.4994          | 1.5546  |
| 0.501         | 2.9434 | 26   | 0.4992          | 1.5634  |
| 0.5004        | 3.9623 | 35   | 0.4991          | 1.5675  |
| 0.4999        | 4.5283 | 40   | 0.4990          | 1.5643  |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.2
- PyTorch 2.2.0
- Datasets 2.20.0
- Tokenizers 0.19.1
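
## Reproducing the training setup

The sketch below shows how the hyperparameters listed above map onto TRL's `KTOConfig`/`KTOTrainer` with a PEFT LoRA adapter. It is a minimal sketch, not the author's exact script: the dataset id, the LoRA rank/alpha/dropout/target modules, and the KTO-specific settings (e.g. `beta`) are assumptions, since the card does not specify them.

```python
# A minimal sketch of the KTO + LoRA setup implied by the hyperparameters
# above. Dataset id, LoRA settings, and KTO beta are assumptions; the card
# does not specify them.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical dataset id. KTO expects rows with "prompt", "completion",
# and a boolean "label" marking each completion as desirable or not.
dataset = load_dataset("my-org/my-kto-dataset")

# LoRA settings below are illustrative assumptions, not from the card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = KTOConfig(
    output_dir="kto-aligned-model-lora",
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=12,  # total_train_batch_size: 12
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
)

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,  # KTOTrainer attaches the LoRA adapter itself
)
trainer.train()
```

When a `peft_config` is passed, `KTOTrainer` does not need a separate frozen reference model: it computes reference log-probabilities by disabling the adapter, which keeps memory usage close to a single 8B model.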
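
## How to use

Since this repository contains a PEFT LoRA adapter rather than full model weights, it can be loaded on top of the base model with `AutoPeftModelForCausalLM`. The snippet below is a minimal sketch; the adapter id is a placeholder for this repository's actual Hub path.

```python
# Minimal sketch of loading the LoRA adapter for inference.
# The adapter id is a placeholder; replace it with this repo's Hub path.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "kto-aligned-model-lora"  # placeholder adapter repo/path

# Reads the adapter config, loads the base model
# (meta-llama/Meta-Llama-3-8B-Instruct), and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "Explain KTO alignment in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For lower latency after evaluation, the adapter can also be merged into the base weights with `model.merge_and_unload()`.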