chchen's picture
End of training
680afe0 verified
metadata
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
  - llama-factory
  - lora
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-3.1-8B-Instruct-SAA-600
    results: []

Llama-3.1-8B-Instruct-SAA-600

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_600 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0943
  • Rewards/chosen: -0.0072
  • Rewards/rejected: -0.0623
  • Rewards/accuracies: 0.8833
  • Rewards/margins: 0.0551
  • Logps/rejected: -0.6233
  • Logps/chosen: -0.0722
  • Logits/rejected: -0.4048
  • Logits/chosen: -0.3432
  • Sft Loss: 0.0119
  • Odds Ratio Loss: 0.8243

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen Sft Loss Odds Ratio Loss
1.3352 1.4815 50 1.0317 -0.0989 -0.1576 0.8333 0.0587 -1.5758 -0.9889 -0.4812 -0.4002 0.1167 9.1492
0.2371 2.9630 100 0.1655 -0.0135 -0.0699 0.8833 0.0564 -0.6987 -0.1348 -0.4551 -0.3813 0.0177 1.4782
0.1421 4.4444 150 0.1010 -0.0077 -0.0577 0.8833 0.0500 -0.5773 -0.0770 -0.4107 -0.3473 0.0124 0.8869
0.1291 5.9259 200 0.0984 -0.0075 -0.0594 0.8833 0.0518 -0.5936 -0.0752 -0.4066 -0.3442 0.0123 0.8613
0.1246 7.4074 250 0.0943 -0.0072 -0.0623 0.8833 0.0551 -0.6233 -0.0722 -0.4048 -0.3432 0.0119 0.8243
0.1045 8.8889 300 0.0948 -0.0072 -0.0628 0.8833 0.0555 -0.6277 -0.0724 -0.4046 -0.3432 0.0119 0.8292

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.3.0
  • Datasets 2.19.0
  • Tokenizers 0.20.0