UTI2_M2_1000steps_1e8rate_01beta_CSFTDPO
This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):
- Loss: 0.6638
- Rewards/chosen: 0.0017
- Rewards/rejected: -0.0583
- Rewards/accuracies: 0.8700
- Rewards/margins: 0.0600
- Logps/rejected: -39.9390
- Logps/chosen: -19.9040
- Logits/rejected: -2.6810
- Logits/chosen: -2.6784
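The reward columns are read here as DPO implicit rewards. This interpretation is based on the "DPO" and "01beta" parts of the model name (suggesting β = 0.1); the card itself does not state the training objective. Under that assumption:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

Rewards/chosen and Rewards/rejected are $r_\theta$ averaged over chosen ($y_w$) and rejected ($y_l$) completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of evaluation pairs in which the chosen reward exceeds the rejected reward.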
Model description
More information needed
Intended uses & limitations
More information needed
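No intended-use guidance is provided, but the checkpoint can be loaded like any other Hub model. The snippet below is a minimal, non-authoritative sketch: it assumes the repository is publicly accessible, that `accelerate` is installed for `device_map="auto"`, and that the model keeps the Mistral-Instruct chat template inherited from its base model. The example prompt reflects the UTI (urinary tract infection) naming, but the actual fine-tuning task is not documented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e8rate_01beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: hardware with bf16 support
    device_map="auto",           # requires the accelerate package
)

# Example prompt; the fine-tuning task/domain is not documented in the card.
messages = [
    {"role": "user", "content": "What symptoms are typical of an uncomplicated UTI?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```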
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `transformers.TrainingArguments` follows the list):
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
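These settings map directly onto `transformers.TrainingArguments`. The sketch below is a reconstruction for reference only; the original training script is not published with the card, and the DPO β of 0.1 implied by the model name would be configured on the DPO trainer rather than here.

```python
from transformers import TrainingArguments

# Non-authoritative reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e8rate_01beta_CSFTDPO",  # placeholder output dir
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,   # total train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```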
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.3333 | 25 | 0.6929 | 0.0006 | 0.0000 | 0.1900 | 0.0006 | -39.3560 | -19.9153 | -2.6832 | -2.6806 |
0.6898 | 0.6667 | 50 | 0.6940 | -0.0024 | -0.0009 | 0.4700 | -0.0015 | -39.3651 | -19.9455 | -2.6823 | -2.6798 |
0.6943 | 1.0 | 75 | 0.6915 | 0.0010 | -0.0025 | 0.5200 | 0.0035 | -39.3809 | -19.9114 | -2.6828 | -2.6803 |
0.6889 | 1.3333 | 100 | 0.6916 | 0.0004 | -0.0029 | 0.5100 | 0.0033 | -39.3855 | -19.9178 | -2.6827 | -2.6801 |
0.6891 | 1.6667 | 125 | 0.6877 | 0.0008 | -0.0102 | 0.6000 | 0.0110 | -39.4581 | -19.9132 | -2.6824 | -2.6798 |
0.6876 | 2.0 | 150 | 0.6854 | 0.0015 | -0.0143 | 0.6900 | 0.0158 | -39.4992 | -19.9067 | -2.6812 | -2.6786 |
0.681 | 2.3333 | 175 | 0.6793 | 0.0011 | -0.0270 | 0.7900 | 0.0280 | -39.6258 | -19.9108 | -2.6817 | -2.6791 |
0.6782 | 2.6667 | 200 | 0.6775 | 0.0056 | -0.0263 | 0.8000 | 0.0319 | -39.6187 | -19.8653 | -2.6821 | -2.6795 |
0.6763 | 3.0 | 225 | 0.6739 | 0.0035 | -0.0355 | 0.8200 | 0.0390 | -39.7113 | -19.8864 | -2.6810 | -2.6784 |
0.6716 | 3.3333 | 250 | 0.6697 | 0.0024 | -0.0454 | 0.8600 | 0.0478 | -39.8106 | -19.8975 | -2.6814 | -2.6788 |
0.6724 | 3.6667 | 275 | 0.6698 | 0.0055 | -0.0420 | 0.8600 | 0.0475 | -39.7765 | -19.8664 | -2.6809 | -2.6783 |
0.6706 | 4.0 | 300 | 0.6686 | 0.0042 | -0.0458 | 0.8600 | 0.0500 | -39.8142 | -19.8793 | -2.6812 | -2.6786 |
0.6644 | 4.3333 | 325 | 0.6649 | 0.0063 | -0.0515 | 0.8600 | 0.0578 | -39.8714 | -19.8588 | -2.6808 | -2.6782 |
0.6626 | 4.6667 | 350 | 0.6650 | 0.0053 | -0.0523 | 0.8300 | 0.0575 | -39.8789 | -19.8689 | -2.6814 | -2.6788 |
0.6656 | 5.0 | 375 | 0.6636 | 0.0063 | -0.0541 | 0.8800 | 0.0604 | -39.8971 | -19.8587 | -2.6809 | -2.6783 |
0.6672 | 5.3333 | 400 | 0.6643 | 0.0062 | -0.0527 | 0.8600 | 0.0589 | -39.8830 | -19.8595 | -2.6801 | -2.6775 |
0.6627 | 5.6667 | 425 | 0.6650 | 0.0019 | -0.0557 | 0.8300 | 0.0575 | -39.9129 | -19.9029 | -2.6806 | -2.6780 |
0.6641 | 6.0 | 450 | 0.6646 | 0.0073 | -0.0510 | 0.8500 | 0.0583 | -39.8660 | -19.8482 | -2.6807 | -2.6781 |
0.6608 | 6.3333 | 475 | 0.6632 | 0.0072 | -0.0541 | 0.8700 | 0.0612 | -39.8970 | -19.8500 | -2.6801 | -2.6776 |
0.6733 | 6.6667 | 500 | 0.6626 | 0.0067 | -0.0559 | 0.8400 | 0.0626 | -39.9148 | -19.8543 | -2.6808 | -2.6782 |
0.6596 | 7.0 | 525 | 0.6628 | 0.0064 | -0.0558 | 0.8500 | 0.0622 | -39.9138 | -19.8576 | -2.6813 | -2.6787 |
0.6612 | 7.3333 | 550 | 0.6627 | 0.0063 | -0.0561 | 0.8700 | 0.0624 | -39.9173 | -19.8583 | -2.6810 | -2.6784 |
0.665 | 7.6667 | 575 | 0.6627 | 0.0054 | -0.0570 | 0.8700 | 0.0624 | -39.9263 | -19.8680 | -2.6814 | -2.6788 |
0.6647 | 8.0 | 600 | 0.6641 | 0.0050 | -0.0545 | 0.8400 | 0.0595 | -39.9014 | -19.8714 | -2.6805 | -2.6779 |
0.6644 | 8.3333 | 625 | 0.6627 | 0.0046 | -0.0577 | 0.8700 | 0.0623 | -39.9335 | -19.8757 | -2.6811 | -2.6786 |
0.6589 | 8.6667 | 650 | 0.6634 | 0.0071 | -0.0537 | 0.8600 | 0.0608 | -39.8933 | -19.8503 | -2.6819 | -2.6793 |
0.6626 | 9.0 | 675 | 0.6628 | 0.0073 | -0.0547 | 0.8800 | 0.0620 | -39.9033 | -19.8483 | -2.6816 | -2.6789 |
0.6608 | 9.3333 | 700 | 0.6649 | 0.0049 | -0.0528 | 0.8600 | 0.0577 | -39.8841 | -19.8726 | -2.6813 | -2.6787 |
0.6695 | 9.6667 | 725 | 0.6628 | 0.0040 | -0.0581 | 0.9000 | 0.0622 | -39.9376 | -19.8814 | -2.6812 | -2.6786 |
0.6683 | 10.0 | 750 | 0.6635 | 0.0033 | -0.0573 | 0.8600 | 0.0606 | -39.9296 | -19.8888 | -2.6811 | -2.6785 |
0.6657 | 10.3333 | 775 | 0.6627 | 0.0039 | -0.0584 | 0.8700 | 0.0623 | -39.9402 | -19.8824 | -2.6810 | -2.6784 |
0.6668 | 10.6667 | 800 | 0.6636 | 0.0029 | -0.0575 | 0.8700 | 0.0604 | -39.9309 | -19.8927 | -2.6810 | -2.6784 |
0.6605 | 11.0 | 825 | 0.6640 | 0.0015 | -0.0580 | 0.8700 | 0.0595 | -39.9363 | -19.9063 | -2.6810 | -2.6784 |
0.6634 | 11.3333 | 850 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6696 | 11.6667 | 875 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6609 | 12.0 | 900 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6652 | 12.3333 | 925 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6614 | 12.6667 | 950 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6639 | 13.0 | 975 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
0.6632 | 13.3333 | 1000 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
Model tree for tsavage68/UTI2_M2_1000steps_1e8rate_01beta_CSFTDPO
- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned from: tsavage68/UTI_M2_1000steps_1e7rate_SFT