# model_usp3_dpo5
This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set (a sanity check on the reported margin follows the list):
- Loss: 1.9416
- Rewards/chosen: -10.2877
- Rewards/rejected: -15.9492
- Rewards/accuracies: 0.6500
- Rewards/margins: 5.6615
- Logps/rejected: -144.5642
- Logps/chosen: -130.7199
- Logits/rejected: -0.9327
- Logits/chosen: -0.9193
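For reference, the reported margin is simply the gap between the chosen and rejected rewards, which is consistent with the figures above:

$$
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected} = -10.2877 - (-15.9492) = 5.6615
$$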
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged training sketch based on them follows the list:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
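The card does not include the training script, but the hyperparameters above map directly onto a TRL `DPOTrainer` setup with a PEFT LoRA adapter. Below is a minimal sketch under that assumption; the preference dataset, `beta`, and LoRA settings are placeholders not specified by this card.

```python
# Minimal sketch of a DPO run matching the hyperparameters above.
# Assumptions (not in the card): the preference dataset, beta, and LoRA config.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: the actual dataset is unknown. DPOTrainer expects
# "prompt"/"chosen"/"rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

args = TrainingArguments(
    output_dir="model_usp3_dpo5",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # effective train batch size 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

peft_config = LoraConfig(  # assumed LoRA settings; the card only lists PEFT 0.10.0
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, TRL uses the frozen base weights as reference
    args=args,
    beta=0.1,         # assumed; not reported in the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```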
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
| 0.0839 | 2.67 | 100 | 1.2957 | -6.5028 | -8.3281 | 0.6100 | 1.8253 | -129.3219 | -123.1500 | -0.2745 | -0.2356 |
| 0.0042 | 5.33 | 200 | 1.7603 | -5.5711 | -8.4268 | 0.7000 | 2.8558 | -129.5194 | -121.2866 | -0.9020 | -0.8864 |
| 0.0011 | 8.0 | 300 | 1.6254 | -11.3020 | -17.4510 | 0.6900 | 6.1490 | -147.5677 | -132.7485 | -0.7903 | -0.7597 |
| 0.0 | 10.67 | 400 | 1.9460 | -10.1997 | -15.8452 | 0.6500 | 5.6455 | -144.3562 | -130.5438 | -0.9297 | -0.9163 |
| 0.0 | 13.33 | 500 | 1.9345 | -10.2432 | -15.9204 | 0.6500 | 5.6772 | -144.5066 | -130.6309 | -0.9318 | -0.9181 |
| 0.0 | 16.0 | 600 | 1.9390 | -10.2636 | -15.9466 | 0.6500 | 5.6831 | -144.5590 | -130.6716 | -0.9321 | -0.9182 |
| 0.0 | 18.67 | 700 | 1.9438 | -10.2986 | -15.9585 | 0.6500 | 5.6599 | -144.5827 | -130.7415 | -0.9326 | -0.9190 |
| 0.0 | 21.33 | 800 | 1.9351 | -10.2903 | -15.9732 | 0.6500 | 5.6829 | -144.6121 | -130.7250 | -0.9323 | -0.9188 |
| 0.0 | 24.0 | 900 | 1.9341 | -10.3034 | -15.9669 | 0.6500 | 5.6635 | -144.5995 | -130.7512 | -0.9328 | -0.9192 |
| 0.0 | 26.67 | 1000 | 1.9416 | -10.2877 | -15.9492 | 0.6500 | 5.6615 | -144.5642 | -130.7199 | -0.9327 | -0.9193 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
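To run inference, the adapter is loaded on top of the base model with PEFT. A minimal sketch, assuming the adapter lives at `guoyu-zhang/model_usp3_dpo5` (the repository this card belongs to):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Attach the DPO-trained LoRA adapter from this repository.
model = PeftModel.from_pretrained(base, "guoyu-zhang/model_usp3_dpo5")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```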