# model_hh_usp3_dpo9

This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 4.9971
- Rewards/chosen: -22.6484
- Rewards/rejected: -28.5100
- Rewards/accuracies: 0.6200
- Rewards/margins: 5.8617
- Logps/rejected: -144.5019
- Logps/chosen: -138.1667
- Logits/rejected: -0.4573
- Logits/chosen: -0.4284
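
Since the framework versions below list PEFT, this repository most likely hosts a LoRA-style adapter rather than full model weights. A minimal inference sketch under that assumption (the prompt is illustrative, and access to the gated Llama-2 base model is required):

```python
# Minimal inference sketch, assuming this repo is a PEFT adapter for
# meta-llama/Llama-2-7b-chat-hf (the base model named in this card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-chat-hf"
ADAPTER = "guoyu-zhang/model_hh_usp3_dpo9"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
# Attach the DPO-trained adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

prompt = "[INST] How do I brew good coffee at home? [/INST]"  # illustrative
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```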
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
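
These settings map directly onto a standard TRL DPO run. A hedged sketch of how such a run could be configured (the preference dataset, LoRA configuration, DPO beta, and TRL version are not stated in this card; everything marked as an assumption below is illustrative only):

```python
# Hedged sketch of a TRL DPO run mirroring the hyperparameters listed above.
# Items marked "assumption" (dataset, LoRA config, beta) are not in this card.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

# Assumption: a prompt/chosen/rejected preference set, the format DPOTrainer
# expects (toy rows shown; the actual training data is not documented here).
train_dataset = Dataset.from_dict({
    "prompt": ["[INST] Say hi. [/INST]"],
    "chosen": [" Hello! How can I help?"],
    "rejected": [" no."],
})

peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)  # assumption

args = TrainingArguments(
    output_dir="model_hh_usp3_dpo9",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    beta=0.1,         # assumption: the DPO beta is not reported in this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()
```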
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
| 0.0667 | 2.67 | 100 | 1.2931 | -0.0486 | -2.3792 | 0.6500 | 2.3307 | -115.4677 | -113.0558 | -0.0497 | -0.0386 |
| 0.0265 | 5.33 | 200 | 2.5238 | -3.3105 | -7.5646 | 0.6600 | 4.2541 | -121.2292 | -116.6801 | -0.3923 | -0.3765 |
| 0.139 | 8.0 | 300 | 4.4570 | -13.8321 | -19.1751 | 0.6100 | 5.3430 | -134.1298 | -128.3709 | -0.2657 | -0.2456 |
| 0.0061 | 10.67 | 400 | 4.9964 | -19.0684 | -25.0784 | 0.6300 | 6.0099 | -140.6890 | -134.1890 | -0.4660 | -0.4443 |
| 0.0 | 13.33 | 500 | 5.0051 | -22.7007 | -28.5148 | 0.6100 | 5.8141 | -144.5073 | -138.2248 | -0.4580 | -0.4287 |
| 0.0 | 16.0 | 600 | 4.9951 | -22.7131 | -28.5252 | 0.6000 | 5.8121 | -144.5188 | -138.2386 | -0.4569 | -0.4278 |
| 0.0 | 18.67 | 700 | 4.9801 | -22.6913 | -28.5241 | 0.6200 | 5.8329 | -144.5176 | -138.2144 | -0.4571 | -0.4278 |
| 0.0 | 21.33 | 800 | 4.9915 | -22.6547 | -28.5091 | 0.6000 | 5.8544 | -144.5009 | -138.1738 | -0.4569 | -0.4278 |
| 0.0 | 24.0 | 900 | 4.9990 | -22.6732 | -28.5298 | 0.6200 | 5.8566 | -144.5239 | -138.1943 | -0.4568 | -0.4277 |
| 0.0 | 26.67 | 1000 | 4.9971 | -22.6484 | -28.5100 | 0.6200 | 5.8617 | -144.5019 | -138.1667 | -0.4573 | -0.4284 |
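
For context on the columns: under standard DPO, `Rewards/chosen` and `Rewards/rejected` are the implicit rewards, beta times the difference between the policy's and the reference model's log-probabilities of the response, and `Rewards/margins` is chosen minus rejected. A small sketch of these relationships (beta here is a placeholder; this card does not report it):

```python
# How the "Rewards/*" columns relate under standard DPO.
# beta is a hypothetical value; the card does not report the one used.
def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO implicit reward: beta * (log-prob under policy - log-prob under reference)."""
    return beta * (logp_policy - logp_ref)

def margin(reward_chosen: float, reward_rejected: float) -> float:
    """Rewards/margins is simply the chosen reward minus the rejected reward."""
    return reward_chosen - reward_rejected

# Final eval row: -22.6484 - (-28.5100) = 5.8616, matching the logged
# Rewards/margins of 5.8617 up to rounding of the underlying values.
print(margin(-22.6484, -28.5100))
```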
### Framework versions

- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
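
To reproduce this environment, the versions above can be pinned at install time. A sketch (plain `torch==2.2.2` is shown; the card's `+cu121` CUDA build would normally come from the PyTorch wheel index):

```python
# Pin the framework versions listed above (run once in a fresh environment).
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "peft==0.10.0",
    "transformers==4.39.3",
    "torch==2.2.2",  # card lists 2.2.2+cu121; CUDA builds need the PyTorch index
    "datasets==2.18.0",
    "tokenizers==0.15.2",
])
```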