Mistral-7B-v0.3-dpo-10k

This model is a fine-tuned version of mistralai/Mistral-7B-v0.3 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO diagnostics are computed follows the list):

  • Loss: 1.4922
  • Rewards/real: -4.5129
  • Rewards/generated: -4.6699
  • Rewards/accuracies: 0.4423
  • Rewards/margins: 0.1570
  • Logps/generated: -155.2124
  • Logps/real: -181.5346
  • Logits/generated: -2.1203
  • Logits/real: -2.3164
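
These are the standard diagnostics logged during DPO training: "real" refers to the preferred (chosen) completion and "generated" to the rejected one, and the rewards are the implicit DPO rewards rather than outputs of a separate reward model. A minimal sketch of how these quantities are derived, assuming per-sequence log-probabilities from the policy and a frozen reference model, and a DPO beta of 0.1 (a common default; the actual value is not recorded in this card):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_logps_real, policy_logps_generated,
                ref_logps_real, ref_logps_generated, beta=0.1):
    """Compute the DPO loss and the reward diagnostics reported above.

    Each input is a tensor of per-sequence summed token log-probabilities;
    beta=0.1 is an assumed value, not one recorded in this card.
    """
    # Implicit DPO rewards: beta-scaled log-ratio of policy vs. reference.
    rewards_real = beta * (policy_logps_real - ref_logps_real)
    rewards_generated = beta * (policy_logps_generated - ref_logps_generated)

    margins = rewards_real - rewards_generated            # Rewards/margins
    accuracies = (margins > 0).float().mean()             # Rewards/accuracies

    # DPO objective: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_real.mean(), rewards_generated.mean(), accuracies
```

Under this reading, Rewards/accuracies = 0.4423 means the final checkpoint assigns the preferred response a higher implicit reward than the rejected one on roughly 44% of evaluation pairs.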

Model description

More information needed

Intended uses & limitations

More information needed
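
No usage guidance was provided. A minimal loading sketch with transformers, assuming the model is hosted at AmberYifan/Mistral-7B-v0.3-dpo-10k and that, like the base model, it is a plain (non-chat) causal LM stored in BF16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.3-dpo-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```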

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (reconstructed as a trl configuration in the sketch after this list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
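
The exact training script is not included in this card. A sketch of how these settings map onto a trl DPOConfig (field names follow trl/transformers conventions and are assumptions, not taken from the original run):

```python
from trl import DPOConfig

# Reconstruction of the hyperparameters listed above. With 4 GPUs,
# a per-device batch of 4, and 2 accumulation steps, the effective
# batch size is 4 * 4 * 2 = 32, matching total_train_batch_size.
config = DPOConfig(
    output_dir="Mistral-7B-v0.3-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,  # assumed from the BF16 tensor type of the released weights
)
```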

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7877 | 0.0992 | 31 | 0.7717 | 0.6905 | 0.4761 | 0.7308 | 0.2144 | -103.7523 | -129.5006 | -2.5223 | -2.6981 |
| 0.6309 | 0.1984 | 62 | 0.7320 | 2.1415 | 1.7428 | 0.7115 | 0.3987 | -91.0858 | -114.9906 | -2.5564 | -2.7396 |
| 0.5309 | 0.2976 | 93 | 0.7175 | 1.5709 | 0.9016 | 0.6538 | 0.6692 | -99.4969 | -120.6967 | -2.4703 | -2.6171 |
| 0.4323 | 0.3968 | 124 | 0.7714 | 1.9586 | 1.4739 | 0.6923 | 0.4847 | -93.7744 | -116.8195 | -2.6808 | -2.8349 |
| 0.297 | 0.496 | 155 | 0.7161 | 2.2903 | 1.7549 | 0.8077 | 0.5355 | -90.9648 | -113.5018 | -2.6256 | -2.7696 |
| 0.2144 | 0.5952 | 186 | 0.8257 | 1.6038 | 1.0213 | 0.7115 | 0.5825 | -98.3000 | -120.3671 | -2.8900 | -3.0602 |
| 0.2497 | 0.6944 | 217 | 0.6849 | 2.4543 | 1.7831 | 0.8077 | 0.6712 | -90.6823 | -111.8619 | -2.6469 | -2.8201 |
| 0.1112 | 0.7936 | 248 | 0.6993 | 2.2831 | 1.4322 | 0.7885 | 0.8508 | -94.1910 | -113.5747 | -2.7020 | -2.8645 |
| 0.176 | 0.8928 | 279 | 0.6700 | 2.7841 | 2.2447 | 0.7692 | 0.5394 | -86.0663 | -108.5641 | -2.8051 | -2.9280 |
| 0.1135 | 0.992 | 310 | 0.6956 | 2.3849 | 1.8198 | 0.7885 | 0.5651 | -90.3149 | -112.5561 | -2.8024 | -2.9203 |
| 0.1221 | 1.0912 | 341 | 0.7314 | 2.2046 | 1.6143 | 0.7308 | 0.5903 | -92.3708 | -114.3593 | -2.5886 | -2.7365 |
| 0.0864 | 1.1904 | 372 | 0.7718 | 2.3206 | 1.9459 | 0.6346 | 0.3747 | -89.0543 | -113.1994 | -2.5355 | -2.7014 |
| 0.0871 | 1.2896 | 403 | 0.8231 | 1.9873 | 1.7063 | 0.5962 | 0.2810 | -91.4506 | -116.5322 | -2.5240 | -2.6833 |
| 0.1454 | 1.3888 | 434 | 0.7980 | 1.7358 | 1.2782 | 0.6731 | 0.4576 | -95.7309 | -119.0471 | -2.4325 | -2.6120 |
| 0.0747 | 1.488 | 465 | 0.8086 | 1.9033 | 1.4938 | 0.6538 | 0.4094 | -93.5750 | -117.3725 | -2.3557 | -2.5683 |
| 0.0882 | 1.5872 | 496 | 0.9281 | 0.8252 | 0.4834 | 0.5192 | 0.3418 | -103.6798 | -128.1537 | -2.2722 | -2.4783 |
| 0.0693 | 1.6864 | 527 | 0.8954 | 0.5032 | -0.0439 | 0.6154 | 0.5471 | -108.9523 | -131.3737 | -2.1399 | -2.3681 |
| 0.0982 | 1.7856 | 558 | 0.8777 | 1.0122 | 0.5411 | 0.6538 | 0.4711 | -103.1028 | -126.2834 | -2.3326 | -2.5183 |
| 0.0674 | 1.8848 | 589 | 0.9360 | -0.0587 | -0.5311 | 0.5962 | 0.4724 | -113.8238 | -136.9920 | -2.3026 | -2.4848 |
| 0.0424 | 1.984 | 620 | 0.9421 | -0.2586 | -0.6968 | 0.5769 | 0.4382 | -115.4816 | -138.9915 | -2.2955 | -2.4846 |
| 0.0235 | 2.0832 | 651 | 1.0939 | -1.6766 | -2.0193 | 0.5 | 0.3428 | -128.7065 | -153.1709 | -2.2115 | -2.3974 |
| 0.024 | 2.1824 | 682 | 1.1491 | -2.1565 | -2.5396 | 0.5 | 0.3831 | -133.9093 | -157.9701 | -2.2049 | -2.3936 |
| 0.0469 | 2.2816 | 713 | 1.1324 | -2.0618 | -2.4801 | 0.5 | 0.4183 | -133.3140 | -157.0232 | -2.2161 | -2.4094 |
| 0.0328 | 2.3808 | 744 | 1.1837 | -2.4534 | -2.7702 | 0.4808 | 0.3168 | -136.2151 | -160.9390 | -2.2080 | -2.3952 |
| 0.0367 | 2.48 | 775 | 1.1779 | -2.6139 | -2.9724 | 0.4808 | 0.3585 | -138.2376 | -162.5442 | -2.1815 | -2.3777 |
| 0.0596 | 2.5792 | 806 | 1.2847 | -3.3490 | -3.6206 | 0.4231 | 0.2716 | -144.7193 | -169.8953 | -2.1523 | -2.3458 |
| 0.0395 | 2.6784 | 837 | 1.3358 | -3.6588 | -3.9010 | 0.4231 | 0.2422 | -147.5237 | -172.9937 | -2.1399 | -2.3346 |
| 0.0302 | 2.7776 | 868 | 1.3725 | -3.7911 | -4.0386 | 0.4231 | 0.2474 | -148.8990 | -174.3167 | -2.1529 | -2.3475 |
| 0.0132 | 2.8768 | 899 | 1.4969 | -4.4629 | -4.6237 | 0.4423 | 0.1607 | -154.7499 | -181.0344 | -2.1227 | -2.3178 |
| 0.034 | 2.976 | 930 | 1.4922 | -4.5129 | -4.6699 | 0.4423 | 0.1570 | -155.2124 | -181.5346 | -2.1203 | -2.3164 |

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
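
To check a local environment against these versions before trying to reproduce the run, a quick sketch:

```python
import datasets, tokenizers, torch, transformers

# Expected: 4.43.3, 2.2.2+cu121, 2.20.0, 0.19.1
print(transformers.__version__, torch.__version__,
      datasets.__version__, tokenizers.__version__)
```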
