metadata
base_model: allenai/tulu-2-7b
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: tulu-2-7b-full-UF-5e-7
    results: []

tulu-2-7b-full-UF-5e-7

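This model is a fine-tuned version of allenai/tulu-2-7b, trained with DPO (per the trl and dpo tags). The training dataset was not recorded in this card. At the final logged evaluation (step 1300, epoch 0.96), it achieves the following results on the evaluation set: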

  • Loss: 0.9017
  • Rewards/chosen: -4.8659
  • Rewards/rejected: -5.8048
  • Rewards/accuracies: 0.6230
  • Rewards/margins: 0.9389
  • Rewards/margins Max: 5.6516
  • Rewards/margins Min: -2.8163
  • Rewards/margins Std: 2.7854
  • Logps/rejected: -916.6636
  • Logps/chosen: -832.4283
  • Logits/rejected: 0.4957
  • Logits/chosen: 0.2899
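
As a quick consistency check on the numbers above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below verifies that and also evaluates the standard sigmoid DPO loss at the mean margin. It assumes the run used the sigmoid loss variant, which the card does not state; the helper `dpo_pair_loss` is a hypothetical name introduced for illustration.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -4.8659
rewards_rejected = -5.8048
reported_margin = 0.9389

# The margin is the gap between the implicit rewards of the two responses.
margin = rewards_chosen - rewards_rejected


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one pair, -log(sigmoid(margin)), computed stably.

    Assumption: the run used the sigmoid loss variant; the card does not say.
    """
    # -log(sigmoid(m)) == softplus(-m) == log(1 + exp(-m))
    if pair_margin >= 0:
        return math.log1p(math.exp(-pair_margin))
    return -pair_margin + math.log1p(math.exp(pair_margin))


print(f"margin = {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at the mean margin = {dpo_pair_loss(margin):.4f}")
```

The loss at the mean margin comes out well below the reported mean loss of 0.9017. That is expected if the loss is averaged per pair: the loss is convex in the margin, and the margins are widely spread (Std 2.7854, Min -2.8163), so pairs with negative margins contribute disproportionately.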

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
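
The totals in this list follow from the per-device values, and the learning rate follows a warmup-then-cosine curve. The sketch below reproduces both in plain Python. It assumes no gradient accumulation (none is listed, and 2 x 8 = 16) and a linear warmup followed by a half-cosine decay to zero, which is how the Transformers cosine scheduler is commonly configured. `TOTAL_STEPS` is a hypothetical run length chosen for illustration, not the actual step count of this run.

```python
import math

# Values from the hyperparameter list above.
train_batch_size = 2   # per device
eval_batch_size = 8    # per device
num_devices = 8
peak_lr = 5e-7
warmup_ratio = 0.1

# Assumption: no gradient accumulation (the card lists none).
grad_accum_steps = 1
total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, used only to illustrate the shape of the curve.
TOTAL_STEPS = 1000

print("total train batch size:", total_train_batch_size)
print("total eval batch size:", total_eval_batch_size)
for s in (0, 50, 100, 550, 1000):
    print(f"step {s:>4}: lr = {cosine_lr(s, TOTAL_STEPS):.2e}")
```

With a peak of 5e-07 the schedule starts at zero, reaches the peak after the first 10% of steps, and decays to zero by the last step.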

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6816 | 0.07 | 100 | 0.6919 | 0.0000 | -0.0020 | 0.5417 | 0.0021 | 0.0277 | -0.0245 | 0.0175 | -336.3843 | -345.8331 | -1.1956 | -1.1695 |
| 0.5468 | 0.15 | 200 | 0.6793 | -0.1136 | -0.1432 | 0.5794 | 0.0296 | 0.2495 | -0.1965 | 0.1511 | -350.5013 | -357.1989 | -1.1509 | -1.1466 |
| 0.3597 | 0.22 | 300 | 0.6788 | -0.9347 | -1.0641 | 0.5714 | 0.1294 | 1.0084 | -0.7320 | 0.5779 | -442.5906 | -439.3020 | -1.0512 | -1.0629 |
| 0.2059 | 0.29 | 400 | 0.7172 | -1.9680 | -2.3061 | 0.5972 | 0.3381 | 2.3443 | -1.3886 | 1.2205 | -566.7862 | -542.6320 | -0.8695 | -0.8807 |
| 0.1354 | 0.37 | 500 | 0.8082 | -3.1553 | -3.7843 | 0.6190 | 0.6290 | 4.0818 | -2.2017 | 2.0321 | -714.6080 | -661.3674 | -0.1617 | -0.2554 |
| 0.1327 | 0.44 | 600 | 0.8436 | -3.8517 | -4.6192 | 0.6190 | 0.7675 | 4.8313 | -2.4317 | 2.3526 | -798.1056 | -731.0093 | 0.1600 | 0.0173 |
| 0.0777 | 0.52 | 700 | 0.9893 | -4.9432 | -5.9282 | 0.6190 | 0.9850 | 6.3532 | -3.2959 | 3.1250 | -929.0052 | -840.1605 | 0.6301 | 0.4163 |
| 0.0638 | 0.59 | 800 | 0.8086 | -3.8655 | -4.6357 | 0.6190 | 0.7702 | 4.5021 | -2.2919 | 2.2427 | -799.7516 | -732.3853 | 0.2889 | 0.1244 |
| 0.0997 | 0.66 | 900 | 0.8639 | -4.4406 | -5.3058 | 0.6270 | 0.8652 | 5.1592 | -2.6378 | 2.5658 | -866.7603 | -789.8954 | 0.3918 | 0.2055 |
| 0.0708 | 0.74 | 1000 | 0.8618 | -4.4546 | -5.2895 | 0.6230 | 0.8349 | 5.0604 | -2.6224 | 2.5213 | -865.1302 | -791.2946 | 0.4063 | 0.2199 |
| 0.141 | 0.81 | 1100 | 0.9049 | -4.8648 | -5.7977 | 0.6190 | 0.9330 | 5.6327 | -2.8439 | 2.7856 | -915.9548 | -832.3105 | 0.5083 | 0.3017 |
| 0.0775 | 0.88 | 1200 | 0.9049 | -4.9040 | -5.8585 | 0.6210 | 0.9546 | 5.7130 | -2.8316 | 2.8132 | -922.0319 | -836.2313 | 0.5172 | 0.3074 |
| 0.0464 | 0.96 | 1300 | 0.9017 | -4.8659 | -5.8048 | 0.6230 | 0.9389 | 5.6516 | -2.8163 | 2.7854 | -916.6636 | -832.4283 | 0.4957 | 0.2899 |
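
One way to read these results is to compare checkpoints on validation loss and reward accuracy. The sketch below copies four columns from the training results above and reports where each metric is best. The `LOG` list and variable names are introduced here for illustration only. The card does not say which checkpoint was kept beyond the final one, so this is a reading aid rather than a statement about how the model was selected.

```python
# (step, training loss, validation loss, reward accuracy),
# copied from the training results above.
LOG = [
    (100, 0.6816, 0.6919, 0.5417),
    (200, 0.5468, 0.6793, 0.5794),
    (300, 0.3597, 0.6788, 0.5714),
    (400, 0.2059, 0.7172, 0.5972),
    (500, 0.1354, 0.8082, 0.6190),
    (600, 0.1327, 0.8436, 0.6190),
    (700, 0.0777, 0.9893, 0.6190),
    (800, 0.0638, 0.8086, 0.6190),
    (900, 0.0997, 0.8639, 0.6270),
    (1000, 0.0708, 0.8618, 0.6230),
    (1100, 0.141, 0.9049, 0.6190),
    (1200, 0.0775, 0.9049, 0.6210),
    (1300, 0.0464, 0.9017, 0.6230),
]

# Checkpoint with the lowest validation loss.
best_val = min(LOG, key=lambda row: row[2])
# Checkpoint with the highest reward accuracy (first one wins on ties).
best_acc = max(LOG, key=lambda row: row[3])
final = LOG[-1]

print(f"lowest validation loss: {best_val[2]:.4f} at step {best_val[0]}")
print(f"highest reward accuracy: {best_acc[3]:.4f} at step {best_acc[0]}")
print(f"final train/validation gap: {final[2] - final[1]:.4f} at step {final[0]}")
```

Validation loss bottoms out early while training loss keeps falling, yet reward accuracy continues to rise until later in the run. Which of these signals matters more depends on the downstream use, which the card does not describe.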

Framework versions

  • Transformers 4.39.0.dev0
  • PyTorch 2.1.0+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2