---
base_model: allenai/tulu-2-7b
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tulu-2-7b-full-UF-5e-7
  results: []
---
tulu-2-7b-full-UF-5e-7
This model is a fine-tuned version of allenai/tulu-2-7b, trained with DPO using TRL. The training dataset was not recorded when this card was generated. It achieves the following results on the evaluation set:
- Loss: 0.9017
- Rewards/chosen: -4.8659
- Rewards/rejected: -5.8048
- Rewards/accuracies: 0.6230
- Rewards/margins: 0.9389
- Rewards/margins Max: 5.6516
- Rewards/margins Min: -2.8163
- Rewards/margins Std: 2.7854
- Logps/rejected: -916.6636
- Logps/chosen: -832.4283
- Logits/rejected: 0.4957
- Logits/chosen: 0.2899
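The reward figures above are related by simple arithmetic. The sketch below checks that the reported margin matches the gap between the chosen and rejected rewards. It assumes the usual DPO logging convention, in which each reward is a scaled log-probability difference between the trained model and a reference model and the margin is the mean of chosen minus rejected; this card does not state the convention or the scaling factor, so treat that reading as an assumption.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -4.8659
rewards_rejected = -5.8048
reported_margin = 0.9389
reported_accuracy = 0.6230

# Assumption: the margin is the mean of (chosen - rejected) over the evaluation
# set, so it should equal the difference of the two reported means.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
print(f"pairs where chosen outscored rejected: {reported_accuracy:.1%}")
```

Both rewards are negative, which under the assumed convention means the trained model assigns lower log-probability than the reference to chosen and rejected responses alike; the positive margin only says that the rejected responses dropped further.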
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
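To make the batch-size and scheduler settings above concrete, here is a minimal sketch in plain Python. It derives the implied gradient accumulation from the listed batch sizes and evaluates a linear-warmup, cosine-decay learning rate. The function `cosine_lr` and the constant `TOTAL_STEPS` are illustrative names, not part of the training code, and `TOTAL_STEPS` is only a rough estimate based on the last logged step (1300 at epoch 0.96).

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-7
TRAIN_BATCH_SIZE = 2          # per device
NUM_DEVICES = 8
TOTAL_TRAIN_BATCH_SIZE = 16
WARMUP_RATIO = 0.1

# Gradient accumulation is not listed in the card; it is derived here from
# total = per_device * devices * accumulation.
grad_accum_steps = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * NUM_DEVICES)


def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then a half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count, estimated as 1300 / 0.96 (an assumption).
TOTAL_STEPS = 1354

print("implied gradient accumulation steps:", grad_accum_steps)
for step in (0, 100, 300, 700, 1300):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.3e}")
```

With these settings the learning rate climbs for roughly the first tenth of training and then decays, so the latest logged steps run at a small fraction of the 5e-07 peak.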
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6816 | 0.07 | 100 | 0.6919 | 0.0000 | -0.0020 | 0.5417 | 0.0021 | 0.0277 | -0.0245 | 0.0175 | -336.3843 | -345.8331 | -1.1956 | -1.1695 |
| 0.5468 | 0.15 | 200 | 0.6793 | -0.1136 | -0.1432 | 0.5794 | 0.0296 | 0.2495 | -0.1965 | 0.1511 | -350.5013 | -357.1989 | -1.1509 | -1.1466 |
| 0.3597 | 0.22 | 300 | 0.6788 | -0.9347 | -1.0641 | 0.5714 | 0.1294 | 1.0084 | -0.7320 | 0.5779 | -442.5906 | -439.3020 | -1.0512 | -1.0629 |
| 0.2059 | 0.29 | 400 | 0.7172 | -1.9680 | -2.3061 | 0.5972 | 0.3381 | 2.3443 | -1.3886 | 1.2205 | -566.7862 | -542.6320 | -0.8695 | -0.8807 |
| 0.1354 | 0.37 | 500 | 0.8082 | -3.1553 | -3.7843 | 0.6190 | 0.6290 | 4.0818 | -2.2017 | 2.0321 | -714.6080 | -661.3674 | -0.1617 | -0.2554 |
| 0.1327 | 0.44 | 600 | 0.8436 | -3.8517 | -4.6192 | 0.6190 | 0.7675 | 4.8313 | -2.4317 | 2.3526 | -798.1056 | -731.0093 | 0.1600 | 0.0173 |
| 0.0777 | 0.52 | 700 | 0.9893 | -4.9432 | -5.9282 | 0.6190 | 0.9850 | 6.3532 | -3.2959 | 3.1250 | -929.0052 | -840.1605 | 0.6301 | 0.4163 |
| 0.0638 | 0.59 | 800 | 0.8086 | -3.8655 | -4.6357 | 0.6190 | 0.7702 | 4.5021 | -2.2919 | 2.2427 | -799.7516 | -732.3853 | 0.2889 | 0.1244 |
| 0.0997 | 0.66 | 900 | 0.8639 | -4.4406 | -5.3058 | 0.6270 | 0.8652 | 5.1592 | -2.6378 | 2.5658 | -866.7603 | -789.8954 | 0.3918 | 0.2055 |
| 0.0708 | 0.74 | 1000 | 0.8618 | -4.4546 | -5.2895 | 0.6230 | 0.8349 | 5.0604 | -2.6224 | 2.5213 | -865.1302 | -791.2946 | 0.4063 | 0.2199 |
| 0.1410 | 0.81 | 1100 | 0.9049 | -4.8648 | -5.7977 | 0.6190 | 0.9330 | 5.6327 | -2.8439 | 2.7856 | -915.9548 | -832.3105 | 0.5083 | 0.3017 |
| 0.0775 | 0.88 | 1200 | 0.9049 | -4.9040 | -5.8585 | 0.6210 | 0.9546 | 5.7130 | -2.8316 | 2.8132 | -922.0319 | -836.2313 | 0.5172 | 0.3074 |
| 0.0464 | 0.96 | 1300 | 0.9017 | -4.8659 | -5.8048 | 0.6230 | 0.9389 | 5.6516 | -2.8163 | 2.7854 | -916.6636 | -832.4283 | 0.4957 | 0.2899 |
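Read row by row, the table shows validation loss reaching its minimum early and then rising while training loss keeps falling. The sketch below copies four columns from the table and finds the steps with the lowest validation loss and the highest reward accuracy. The variable names are illustrative, and whether an earlier checkpoint would be preferable in practice is not something this card establishes.

```python
# (step, training loss, validation loss, rewards/accuracies) copied from the table above.
rows = [
    (100, 0.6816, 0.6919, 0.5417),
    (200, 0.5468, 0.6793, 0.5794),
    (300, 0.3597, 0.6788, 0.5714),
    (400, 0.2059, 0.7172, 0.5972),
    (500, 0.1354, 0.8082, 0.6190),
    (600, 0.1327, 0.8436, 0.6190),
    (700, 0.0777, 0.9893, 0.6190),
    (800, 0.0638, 0.8086, 0.6190),
    (900, 0.0997, 0.8639, 0.6270),
    (1000, 0.0708, 0.8618, 0.6230),
    (1100, 0.1410, 0.9049, 0.6190),
    (1200, 0.0775, 0.9049, 0.6210),
    (1300, 0.0464, 0.9017, 0.6230),
]

# Row with the lowest validation loss and row with the highest reward accuracy.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_accuracy = max(rows, key=lambda r: r[3])

print("lowest validation loss:", best_by_loss[2], "at step", best_by_loss[0])
print("highest reward accuracy:", best_by_accuracy[3], "at step", best_by_accuracy[0])

# Gap between validation and training loss at the final logged step.
final_step, final_train, final_val, _ = rows[-1]
print("final step", final_step, "validation minus training loss:", round(final_val - final_train, 4))
```

The two criteria disagree here: validation loss favors an early step, while reward accuracy keeps improving until later in the run, so the choice of checkpoint depends on which metric matters for the intended use.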
Framework versions
- Transformers 4.39.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2