# Llama-3.1-8B-Instruct-SAA-1000
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_1000 dataset. It achieves the following results on the evaluation set:
- Loss: 0.1041
- Rewards/chosen: -0.0071
- Rewards/rejected: -0.0574
- Rewards/accuracies: 0.8700
- Rewards/margins: 0.0503
- Logps/rejected: -0.5741
- Logps/chosen: -0.0707
- Logits/rejected: -0.3997
- Logits/chosen: -0.3439
- Sft Loss: 0.0083
- Odds Ratio Loss: 0.9577
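
The two auxiliary terms above suggest that the reported validation loss is the SFT loss plus a weighted odds-ratio loss (an ORPO-style objective). This is an inference from the reported numbers, not something stated in the original card; with a weight of about 0.1 the sum reproduces the reported loss:

$$
\mathcal{L} \approx \mathcal{L}_{\text{SFT}} + 0.1 \cdot \mathcal{L}_{\text{OR}} = 0.0083 + 0.1 \times 0.9577 \approx 0.1041
$$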
## Model description
More information needed
## Intended uses & limitations
More information needed
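
No usage guidance was given, so the following is only a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA) adapter on top of the base model, as the PEFT entry under framework versions suggests; the prompt and generation settings are illustrative, not part of the original card.

```python
# Minimal inference sketch (assumes this repo is a PEFT adapter for the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Summarize what a preference-tuned model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```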
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
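
The effective batch size follows directly from these values: 2 per device × 8 gradient-accumulation steps = 16. For convenience, the same settings are sketched below as Hugging Face `TrainingArguments`; the original run may have been launched through a different toolkit, so treat this as an illustration of the listed values rather than the actual launch script.

```python
from transformers import TrainingArguments

# Restates the hyperparameters listed above; output_dir is illustrative.
training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-SAA-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999), epsilon=1e-8
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```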
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.61 | 0.8889 | 50 | 1.4462 | -0.1395 | -0.1818 | 0.7900 | 0.0423 | -1.8179 | -1.3950 | -0.4872 | -0.4121 | 0.1643 | 12.8185 |
0.3241 | 1.7778 | 100 | 0.2648 | -0.0222 | -0.0659 | 0.8200 | 0.0438 | -0.6595 | -0.2217 | -0.4637 | -0.3875 | 0.0232 | 2.4164 |
0.1509 | 2.6667 | 150 | 0.1238 | -0.0084 | -0.0490 | 0.8600 | 0.0406 | -0.4900 | -0.0840 | -0.4176 | -0.3601 | 0.0101 | 1.1374 |
0.1335 | 3.5556 | 200 | 0.1089 | -0.0074 | -0.0505 | 0.8600 | 0.0432 | -0.5055 | -0.0738 | -0.4038 | -0.3492 | 0.0087 | 1.0023 |
0.1253 | 4.4444 | 250 | 0.1136 | -0.0078 | -0.0536 | 0.8800 | 0.0458 | -0.5355 | -0.0776 | -0.3998 | -0.3449 | 0.0097 | 1.0396 |
0.0851 | 5.3333 | 300 | 0.1041 | -0.0071 | -0.0574 | 0.8700 | 0.0503 | -0.5741 | -0.0707 | -0.3997 | -0.3439 | 0.0083 | 0.9577 |
0.0824 | 6.2222 | 350 | 0.1065 | -0.0073 | -0.0587 | 0.8700 | 0.0514 | -0.5869 | -0.0728 | -0.3969 | -0.3419 | 0.0088 | 0.9767 |
0.0869 | 7.1111 | 400 | 0.1160 | -0.0080 | -0.0625 | 0.8800 | 0.0545 | -0.6250 | -0.0801 | -0.3942 | -0.3392 | 0.0102 | 1.0581 |
0.0715 | 8.0 | 450 | 0.1095 | -0.0075 | -0.0618 | 0.8800 | 0.0543 | -0.6184 | -0.0750 | -0.3933 | -0.3379 | 0.0092 | 1.0028 |
0.0751 | 8.8889 | 500 | 0.1095 | -0.0075 | -0.0618 | 0.8800 | 0.0543 | -0.6181 | -0.0752 | -0.3939 | -0.3386 | 0.0093 | 1.0026 |
0.0784 | 9.7778 | 550 | 0.1089 | -0.0075 | -0.0622 | 0.8700 | 0.0547 | -0.6221 | -0.0747 | -0.3937 | -0.3381 | 0.0091 | 0.9983 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0