# model_hh_usp4_dpo9

This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.0767
- Rewards/chosen: -1.1762
- Rewards/rejected: -7.5013
- Rewards/accuracies: 0.6300
- Rewards/margins: 6.3252
- Logps/rejected: -117.1809
- Logps/chosen: -115.1588
- Logits/rejected: -0.1065
- Logits/chosen: -0.0807
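
For context, the reward columns are DPO's implicit rewards: with policy $\pi_\theta$ and reference model $\pi_{\mathrm{ref}}$, `rewards/chosen` is $\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}$ averaged over the evaluation set (analogously for rejected), and `rewards/margins` is their difference. The standard DPO objective, included here as background since the card does not restate it, is

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the chosen and rejected responses and $\beta$ is a temperature that this card does not report.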
## Model description
More information needed
## Intended uses & limitations
More information needed
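
Pending details from the authors, here is a minimal inference sketch. It assumes this repo hosts a PEFT (LoRA) adapter for the base chat model; the Llama-2 chat prompt format is also an assumption, and the snippet is untested against this specific checkpoint.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: the repo contains a PEFT adapter, so AutoPeftModelForCausalLM
# loads the meta-llama/Llama-2-7b-chat-hf base weights and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    "guoyu-zhang/model_hh_usp4_dpo9",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Llama-2 chat prompt convention (assumed; the card documents no format).
prompt = "[INST] How do I make a good cup of coffee? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```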
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
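
The sketch below maps this configuration onto `transformers.TrainingArguments` and trl's `DPOTrainer`. Only the hyperparameters listed above come from the card; the DPO `beta`, the LoRA config, and the dataset are unstated, so the commented placeholders are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_usp4_dpo9",
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=4,   # 4 * 4 = total_train_batch_size 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
    seed=42,
)

# With trl (version not stated on the card), training would look roughly like:
# trainer = DPOTrainer(
#     model, ref_model=None, args=training_args,
#     beta=0.1,                                       # assumed; not reported
#     train_dataset=train_ds, eval_dataset=eval_ds,   # dataset unknown
#     tokenizer=tokenizer, peft_config=lora_config,   # LoRA config unknown
# )
# trainer.train()
```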
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0472 | 2.67 | 100 | 1.5866 | -2.4737 | -5.2244 | 0.6600 | 2.7507 | -114.6509 | -116.6005 | -0.1123 | -0.1103 |
| 0.061 | 5.33 | 200 | 2.8352 | -8.5414 | -13.8302 | 0.6600 | 5.2888 | -124.2130 | -123.3425 | -0.2214 | -0.1997 |
| 0.0022 | 8.0 | 300 | 3.6078 | -5.7355 | -11.8144 | 0.6600 | 6.0789 | -121.9732 | -120.2247 | -0.2463 | -0.2014 |
| 0.0001 | 10.67 | 400 | 4.1244 | -1.6102 | -7.8752 | 0.6300 | 6.2650 | -117.5963 | -115.6411 | -0.1230 | -0.0965 |
| 0.0 | 13.33 | 500 | 4.0644 | -1.1614 | -7.5191 | 0.6300 | 6.3577 | -117.2006 | -115.1424 | -0.1061 | -0.0806 |
| 0.0 | 16.0 | 600 | 4.0669 | -1.1412 | -7.4965 | 0.6300 | 6.3554 | -117.1756 | -115.1199 | -0.1068 | -0.0813 |
| 0.0 | 18.67 | 700 | 4.0482 | -1.1597 | -7.5269 | 0.6300 | 6.3672 | -117.2094 | -115.1405 | -0.1065 | -0.0810 |
| 0.0 | 21.33 | 800 | 4.0720 | -1.1432 | -7.5025 | 0.6300 | 6.3594 | -117.1822 | -115.1221 | -0.1067 | -0.0811 |
| 0.0 | 24.0 | 900 | 4.0691 | -1.1439 | -7.4980 | 0.6300 | 6.3541 | -117.1772 | -115.1229 | -0.1069 | -0.0810 |
| 0.0 | 26.67 | 1000 | 4.0767 | -1.1762 | -7.5013 | 0.6300 | 6.3252 | -117.1809 | -115.1588 | -0.1065 | -0.0807 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2