metadata

base_model: princeton-nlp/Llama-3-Base-8B-SFT
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: llama3-dpo-lora
    results: []

llama3-dpo-lora

This model is a LoRA adapter, trained with PEFT, that fine-tunes princeton-nlp/Llama-3-Base-8B-SFT with DPO. The training dataset was not recorded; the auto-generated card lists it as "None". It achieves the following results on the evaluation set:

Loss: 0.5199
Rewards/chosen: -0.1477
Rewards/rejected: -0.9502
Rewards/accuracies: 0.7260
Rewards/margins: 0.8025
Logps/rejected: -283.9596
Logps/chosen: -291.2388
Logits/rejected: -0.3914
Logits/chosen: -0.4217
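These reward metrics appear to follow the usual DPO bookkeeping, as in TRL's DPOTrainer with its default sigmoid loss:

- Each response's implicit reward is beta times the difference between its summed log-probability under the trained policy and under the frozen reference model. The reference model is presumably the SFT base listed above.
- The margin is the chosen reward minus the rejected reward.
- Accuracy is the fraction of pairs with a positive margin.
- The loss is -log sigmoid(margin), averaged over pairs.

The sketch below implements that convention. The function names are illustrative rather than TRL's API, and the beta value is hypothetical because the card does not record the one used. As a sanity check on the reported numbers, the margin is the difference of the reported means: -0.1477 - (-0.9502) = 0.8025.

```python
import math
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, Tuple


@dataclass
class DPOMetrics:
    loss: float
    rewards_chosen: float
    rewards_rejected: float
    margins: float
    accuracy: float


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); note -log(sigmoid(m)) == softplus(-m)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(pairs: Iterable[Tuple[float, float, float, float]], beta: float) -> DPOMetrics:
    """Average DPO metrics over preference pairs.

    Each pair is (policy_chosen, policy_rejected, ref_chosen, ref_rejected):
    summed log-probabilities of the chosen and rejected completions under the
    trained policy and the frozen reference model.
    """
    chosen, rejected, losses, wins = [], [], [], []
    for pol_c, pol_r, ref_c, ref_r in pairs:
        r_c = beta * (pol_c - ref_c)  # implicit reward of the chosen response
        r_r = beta * (pol_r - ref_r)  # implicit reward of the rejected response
        chosen.append(r_c)
        rejected.append(r_r)
        losses.append(softplus(-(r_c - r_r)))  # sigmoid DPO loss: -log sigmoid(margin)
        wins.append(1.0 if r_c > r_r else 0.0)
    if not chosen:
        raise ValueError("need at least one preference pair")
    return DPOMetrics(
        loss=fmean(losses),
        rewards_chosen=fmean(chosen),
        rewards_rejected=fmean(rejected),
        margins=fmean(chosen) - fmean(rejected),  # equals the mean per-pair margin
        accuracy=fmean(wins),
    )


# Toy pair with a hypothetical beta (the card does not record the value used).
example = dpo_metrics([(-10.0, -12.0, -11.0, -11.0)], beta=0.1)
print(example)
```

Averaging the loss per pair explains why the final loss (0.5199) sits above -log sigmoid(0.8025), which is about 0.370. The loss is convex in the margin, so the mean per-pair loss is at least the loss evaluated at the mean margin.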

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 1
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
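The batch sizes above are related by total = per-device batch × number of devices × gradient-accumulation steps. That gives 1 × 4 × 16 = 64 for training and 4 × 4 = 16 for evaluation, since evaluation does not accumulate gradients.

The sketch below reproduces that arithmetic together with a linear-warmup cosine schedule. The schedule has the same shape as Transformers' `get_cosine_schedule_with_warmup` at its default half cycle, and it rounds the warmup length up from the ratio, as Transformers does. The total of roughly 955 optimizer steps is an assumption inferred from the results table (step 100 falls at epoch 0.1047), not a recorded value.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples contributing to one optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float) -> int:
    # Round the warmup length up when it is given as a ratio of total steps.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 955  # assumption: inferred from step 100 at epoch 0.1047
PEAK_LR = 5e-6
WARMUP = warmup_steps_from_ratio(TOTAL_STEPS, 0.1)

for s in (0, WARMUP // 2, WARMUP, 500, TOTAL_STEPS):
    print(s, f"{cosine_lr(s, TOTAL_STEPS, PEAK_LR, WARMUP):.3e}")
```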

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6297	0.1047	100	0.6140	0.1358	-0.1277	0.6960	0.2634	-275.7340	-288.4034	-0.5479	-0.5526
0.5676	0.2094	200	0.5569	-0.1144	-0.6599	0.7000	0.5455	-281.0560	-290.9051	-0.4945	-0.5116
0.5414	0.3141	300	0.5403	-0.3808	-1.0461	0.7260	0.6652	-284.9180	-293.5698	-0.4540	-0.4775
0.5124	0.4187	400	0.5341	-0.2337	-0.9896	0.7040	0.7559	-284.3532	-292.0986	-0.4243	-0.4516
0.5529	0.5234	500	0.5260	-0.2177	-1.0037	0.7240	0.7861	-284.4948	-291.9380	-0.3995	-0.4290
0.5300	0.6281	600	0.5244	-0.0687	-0.8583	0.7200	0.7895	-283.0403	-290.4489	-0.4028	-0.4317
0.5028	0.7328	700	0.5190	-0.3357	-1.1360	0.7320	0.8003	-285.8177	-293.1184	-0.3874	-0.4179
0.5347	0.8375	800	0.5191	-0.1404	-0.9419	0.7320	0.8015	-283.8760	-291.1650	-0.3924	-0.4225
0.4783	0.9422	900	0.5190	-0.1399	-0.9459	0.7260	0.8060	-283.9163	-291.1600	-0.3917	-0.4219
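Two quick checks can be run on this table:

- Each Rewards/margins entry should equal Rewards/chosen minus Rewards/rejected, up to rounding.
- The lowest validation loss identifies the strongest logged checkpoint. Steps 700 and 900 tie at 0.5190.

The margin also grows at every logged step, even though validation loss does not fall monotonically. The sketch below copies a subset of the columns by hand. The helper names and the rounding tolerance are illustrative choices.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    chosen: float
    rejected: float
    margin: float


# Subset of the columns above: Step, Validation Loss, Rewards/chosen,
# Rewards/rejected, Rewards/margins.
ROWS: List[EvalRow] = [
    EvalRow(100, 0.6140, 0.1358, -0.1277, 0.2634),
    EvalRow(200, 0.5569, -0.1144, -0.6599, 0.5455),
    EvalRow(300, 0.5403, -0.3808, -1.0461, 0.6652),
    EvalRow(400, 0.5341, -0.2337, -0.9896, 0.7559),
    EvalRow(500, 0.5260, -0.2177, -1.0037, 0.7861),
    EvalRow(600, 0.5244, -0.0687, -0.8583, 0.7895),
    EvalRow(700, 0.5190, -0.3357, -1.1360, 0.8003),
    EvalRow(800, 0.5191, -0.1404, -0.9419, 0.8015),
    EvalRow(900, 0.5190, -0.1399, -0.9459, 0.8060),
]


def margin_mismatches(rows: List[EvalRow], tol: float = 5e-4) -> List[int]:
    """Steps whose logged margin differs from chosen - rejected by more than tol."""
    return [r.step for r in rows if abs((r.chosen - r.rejected) - r.margin) > tol]


def lowest_val_loss(rows: List[EvalRow]) -> EvalRow:
    """Row with the smallest validation loss; ties go to the earliest step."""
    return min(rows, key=lambda r: (r.val_loss, r.step))


print("margin mismatches:", margin_mismatches(ROWS))
print("lowest validation loss at step", lowest_val_loss(ROWS).step)
```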

Framework versions

PEFT 0.7.1
Transformers 4.44.2
Pytorch 2.2.1+cu121
Datasets 2.14.6
Tokenizers 0.19.1
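Because this repository holds a PEFT adapter rather than full weights (library_name: peft), inference needs both the base model named in base_model and the adapter. The sketch below relies on a few assumptions:

- The adapter path is the hypothetical placeholder `ADAPTER_PATH`, since the card gives no repository id.
- The tokenizer is taken from the base model.
- Loading uses `AutoModelForCausalLM.from_pretrained` and `PeftModel.from_pretrained`.

The sketch also flags installed packages whose versions differ from the ones listed above. Matching those versions is a precaution, not a stated requirement.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional

BASE_MODEL = "princeton-nlp/Llama-3-Base-8B-SFT"
ADAPTER_PATH = "path/to/llama3-dpo-lora"  # hypothetical placeholder, not a real repo id

# Versions listed in this card, keyed by pip distribution name.
PINNED = {
    "peft": "0.7.1",
    "transformers": "4.44.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.19.1",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Installed version of each distribution, or None if it is missing."""
    out: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out


def version_mismatches(pinned: Dict[str, str], installed: Dict[str, Optional[str]]) -> List[str]:
    """Names whose installed public version (ignoring any '+local' tag) differs."""
    bad = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None or have.split("+")[0] != want.split("+")[0]:
            bad.append(name)
    return bad


def load_policy(merge: bool = False):
    """Load the base model and attach the LoRA adapter (downloads ~8B weights)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, ADAPTER_PATH)
    if merge:
        model = model.merge_and_unload()  # fold the LoRA weights into the base model
    return tokenizer, model


if __name__ == "__main__":
    print("mismatched:", version_mismatches(PINNED, installed_versions(PINNED)))
```

The heavy imports live inside `load_policy`, so the version check runs even where PEFT or PyTorch is not installed.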