# dpo
This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo dataset. It achieves the following results on the evaluation set (a usage sketch for loading the model follows the list):
- Loss: 0.1331
- Rewards/chosen: -4.9675
- Rewards/rejected: -13.7312
- Rewards/accuracies: 0.9480
- Rewards/margins: 8.7637
- Logps/chosen: -224.7040
- Logps/rejected: -310.9190
- Logits/chosen: -1.4384
- Logits/rejected: -1.4474
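The card does not include usage instructions. The sketch below shows one way to load the model, assuming the repository ships a PEFT adapter on top of mistralai/Mistral-Nemo-Instruct-2407 (PEFT 0.12.0 is listed under framework versions, and the model tree at the bottom of the card names that checkpoint); the example prompt is illustrative only.

```python
# Minimal loading sketch (not taken from the card). Assumes a PEFT adapter on
# top of the base instruct checkpoint; the repo id comes from the model tree.
# Environment pins from "Framework versions":
#   pip install "peft==0.12.0" "transformers==4.46.0" "torch==2.4.0"
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "Howard881010/heat_transfer_dpo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO adapter

# Illustrative heat-transfer prompt; not taken from the training data.
messages = [{"role": "user", "content": "Explain convective heat transfer in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```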
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-06
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 10
- total_eval_batch_size: 10
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
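The card does not state which training framework produced these settings. As one possible mapping, the sketch below shows how they could translate to TRL's DPOTrainer; TRL itself, the DPO beta, the precision, the LoRA configuration, and the dataset path are all assumptions, not facts from the card.

```python
# Hypothetical reconstruction of the run with TRL's DPOTrainer. The framework,
# beta, precision, LoRA config, and dataset file are assumptions; only the
# hyperparameter values mirror the list above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder path: the card only names the dataset "heat_transfer_dpo".
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="heat_transfer_dpo.json", split="train")

args = DPOConfig(
    output_dir="dpo",
    learning_rate=5e-6,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
    beta=0.1,   # assumed; the card does not report the DPO beta
    bf16=True,  # assumed precision
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed adapter setup
)
trainer.train()
```

Launched across two processes (num_devices: 2), the per-device batch size of 5 reproduces the total train batch size of 10 reported above.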
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6939 | 0.0667 | 60 | 0.6921 | -0.0219 | -0.0246 | 0.5190 | 0.0026 | -175.2482 | -173.8529 | -1.4010 | -1.4008 |
0.6871 | 0.1333 | 120 | 0.6830 | -0.0278 | -0.0494 | 0.6080 | 0.0216 | -175.3069 | -174.1010 | -1.4030 | -1.4029 |
0.6159 | 0.2 | 180 | 0.6382 | -0.5399 | -0.7225 | 0.5610 | 0.1826 | -180.4279 | -180.8317 | -1.4021 | -1.4025 |
0.368 | 0.2667 | 240 | 0.3849 | -1.3538 | -2.7449 | 0.8310 | 1.3911 | -188.5674 | -201.0563 | -1.3971 | -1.3996 |
0.3234 | 0.3333 | 300 | 0.3633 | -2.1358 | -4.6104 | 0.8230 | 2.4747 | -196.3865 | -219.7114 | -1.4248 | -1.4282 |
0.2649 | 0.4 | 360 | 0.3037 | -3.3073 | -6.0363 | 0.8800 | 2.7290 | -208.1017 | -233.9699 | -1.4411 | -1.4450 |
0.1784 | 0.4667 | 420 | 0.2159 | -3.8934 | -7.0789 | 0.9100 | 3.1855 | -213.9628 | -244.3959 | -1.4470 | -1.4523 |
0.2608 | 0.5333 | 480 | 0.2073 | -3.8076 | -7.8889 | 0.9100 | 4.0813 | -213.1049 | -252.4960 | -1.4509 | -1.4571 |
0.2459 | 0.6 | 540 | 0.2173 | -4.7738 | -9.6025 | 0.8890 | 4.8287 | -222.7667 | -269.6319 | -1.4478 | -1.4529 |
0.1729 | 0.6667 | 600 | 0.2264 | -3.6641 | -9.1186 | 0.9200 | 5.4546 | -211.6696 | -264.7935 | -1.4379 | -1.4430 |
0.2136 | 0.7333 | 660 | 0.1994 | -3.1520 | -8.0180 | 0.9190 | 4.8660 | -206.5491 | -253.7874 | -1.4456 | -1.4518 |
0.2148 | 0.8 | 720 | 0.2623 | -3.3220 | -8.6375 | 0.9040 | 5.3155 | -208.2492 | -259.9820 | -1.4527 | -1.4588 |
0.151 | 0.8667 | 780 | 0.2628 | -3.7843 | -9.3305 | 0.8830 | 5.5462 | -212.8717 | -266.9124 | -1.4556 | -1.4621 |
0.1759 | 0.9333 | 840 | 0.1736 | -3.7518 | -9.3561 | 0.9270 | 5.6043 | -212.5472 | -267.1683 | -1.4565 | -1.4631 |
0.1455 | 1.0 | 900 | 0.1967 | -3.4547 | -10.0926 | 0.9290 | 6.6379 | -209.5764 | -274.5335 | -1.4551 | -1.4625 |
0.1456 | 1.0667 | 960 | 0.2037 | -3.9507 | -10.4184 | 0.9290 | 6.4677 | -214.5359 | -277.7913 | -1.4538 | -1.4610 |
0.1276 | 1.1333 | 1020 | 0.2090 | -3.7958 | -10.3930 | 0.9240 | 6.5972 | -212.9869 | -277.5373 | -1.4494 | -1.4568 |
0.1768 | 1.2 | 1080 | 0.1744 | -3.7397 | -10.8265 | 0.9350 | 7.0868 | -212.4255 | -281.8718 | -1.4487 | -1.4565 |
0.2379 | 1.2667 | 1140 | 0.1679 | -4.2998 | -11.1092 | 0.9260 | 6.8094 | -218.0269 | -284.6993 | -1.4458 | -1.4532 |
0.0571 | 1.3333 | 1200 | 0.1626 | -4.5185 | -12.4102 | 0.9420 | 7.8917 | -220.2143 | -297.7095 | -1.4335 | -1.4415 |
0.1644 | 1.4 | 1260 | 0.1614 | -4.3048 | -12.2288 | 0.9400 | 7.9240 | -218.0764 | -295.8950 | -1.4410 | -1.4497 |
0.3264 | 1.4667 | 1320 | 0.1427 | -4.5696 | -12.5596 | 0.9470 | 7.9900 | -220.7249 | -299.2028 | -1.4390 | -1.4475 |
0.1088 | 1.5333 | 1380 | 0.1382 | -4.6426 | -12.7848 | 0.9510 | 8.1422 | -221.4554 | -301.4557 | -1.4380 | -1.4465 |
0.1853 | 1.6 | 1440 | 0.1417 | -4.9985 | -13.2069 | 0.9490 | 8.2084 | -225.0136 | -305.6761 | -1.4349 | -1.4433 |
0.1406 | 1.6667 | 1500 | 0.1741 | -5.1167 | -13.8396 | 0.9410 | 8.7229 | -226.1956 | -312.0029 | -1.4283 | -1.4373 |
0.1751 | 1.7333 | 1560 | 0.1433 | -4.9687 | -13.7012 | 0.9480 | 8.7325 | -224.7161 | -310.6195 | -1.4309 | -1.4397 |
0.1648 | 1.8 | 1620 | 0.1368 | -4.9785 | -13.6896 | 0.9500 | 8.7111 | -224.8141 | -310.5035 | -1.4335 | -1.4424 |
0.1109 | 1.8667 | 1680 | 0.1367 | -5.0609 | -13.8370 | 0.9480 | 8.7762 | -225.6376 | -311.9777 | -1.4341 | -1.4430 |
0.1875 | 1.9333 | 1740 | 0.1388 | -5.0304 | -13.7910 | 0.9500 | 8.7607 | -225.3328 | -311.5176 | -1.4356 | -1.4445 |
0.0947 | 2.0 | 1800 | 0.1331 | -4.9675 | -13.7312 | 0.9480 | 8.7637 | -224.7040 | -310.9190 | -1.4384 | -1.4474 |
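As a reading aid for the columns above, and assuming the standard DPO formulation (the card does not restate it or report the β it used), the implicit reward of a completion $y$ for a prompt $x$ is

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so Rewards/chosen and Rewards/rejected are this quantity averaged over the chosen and rejected completions in the evaluation set, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected reward. The reported loss is the DPO objective

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( \beta \left[ \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right] \right),
$$

with $y_w$ the chosen and $y_l$ the rejected completion. Logps/chosen and Logps/rejected are the policy log-probabilities of the two completions under the fine-tuned model.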
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
## Model tree for Howard881010/heat_transfer_dpo

- Base model: mistralai/Mistral-Nemo-Base-2407
- Fine-tuned checkpoint this adapter builds on: mistralai/Mistral-Nemo-Instruct-2407