Commit 5c15f77 (verified) · committed by tsavage68 · 1 parent: d895f90

End of training

README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 license: apache-2.0
-base_model: tsavage68/UTI_M2_1000steps_1e5rate_SFT
+base_model: tsavage68/UTI_M2_1000steps_1e7rate_SFT
 tags:
 - trl
 - dpo
@@ -15,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # UTI_M2_1000steps_1e6rate_05beta_CSFTDPO
 
-This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e5rate_SFT) on an unknown dataset.
+This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0693
-- Rewards/chosen: 0.3102
-- Rewards/rejected: -12.9468
-- Rewards/accuracies: 0.9000
-- Rewards/margins: 13.2571
-- Logps/rejected: -70.0598
-- Logps/chosen: -19.6741
-- Logits/rejected: -3.8499
-- Logits/chosen: -3.7712
+- Loss: 0.6931
+- Rewards/chosen: 0.0
+- Rewards/rejected: 0.0
+- Rewards/accuracies: 0.0
+- Rewards/margins: 0.0
+- Logps/rejected: 0.0
+- Logps/chosen: 0.0
+- Logits/rejected: -2.7147
+- Logits/chosen: -2.7147
 
 ## Model description
 
@@ -59,46 +59,46 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.314 | 0.3333 | 25 | 0.1006 | 1.1509 | -3.4985 | 0.9000 | 4.6494 | -51.1631 | -17.9927 | -3.8266 | -3.7515 |
-| 0.0175 | 0.6667 | 50 | 0.0694 | 1.8275 | -9.1239 | 0.9000 | 10.9514 | -62.4139 | -16.6395 | -3.8318 | -3.7514 |
-| 0.0347 | 1.0 | 75 | 0.0693 | 1.8028 | -10.0311 | 0.9000 | 11.8339 | -64.2284 | -16.6890 | -3.8454 | -3.7648 |
-| 0.0173 | 1.3333 | 100 | 0.0693 | 1.7834 | -10.0645 | 0.9000 | 11.8479 | -64.2951 | -16.7277 | -3.8455 | -3.7649 |
-| 0.0693 | 1.6667 | 125 | 0.0693 | 0.3883 | -12.0059 | 0.9000 | 12.3941 | -68.1778 | -19.5180 | -3.8483 | -3.7695 |
-| 0.104 | 2.0 | 150 | 0.0693 | 0.4370 | -12.0807 | 0.9000 | 12.5177 | -68.3275 | -19.4206 | -3.8488 | -3.7700 |
-| 0.052 | 2.3333 | 175 | 0.0693 | 0.4449 | -12.1182 | 0.9000 | 12.5631 | -68.4025 | -19.4048 | -3.8490 | -3.7702 |
-| 0.052 | 2.6667 | 200 | 0.0693 | 0.6029 | -12.3085 | 0.9000 | 12.9115 | -68.7831 | -19.0887 | -3.8507 | -3.7720 |
-| 0.0347 | 3.0 | 225 | 0.0693 | 0.5833 | -12.3174 | 0.9000 | 12.9008 | -68.8010 | -19.1279 | -3.8506 | -3.7718 |
-| 0.052 | 3.3333 | 250 | 0.0693 | 0.5554 | -12.3514 | 0.9000 | 12.9068 | -68.8690 | -19.1838 | -3.8506 | -3.7717 |
-| 0.0693 | 3.6667 | 275 | 0.0693 | 0.5454 | -12.3886 | 0.9000 | 12.9340 | -68.9433 | -19.2037 | -3.8505 | -3.7718 |
-| 0.0866 | 4.0 | 300 | 0.0693 | 0.5163 | -12.4481 | 0.9000 | 12.9644 | -69.0623 | -19.2620 | -3.8504 | -3.7717 |
-| 0.104 | 4.3333 | 325 | 0.0693 | 0.5008 | -12.4900 | 0.9000 | 12.9909 | -69.1462 | -19.2929 | -3.8505 | -3.7717 |
-| 0.0347 | 4.6667 | 350 | 0.0693 | 0.4983 | -12.5286 | 0.9000 | 13.0269 | -69.2233 | -19.2980 | -3.8505 | -3.7718 |
-| 0.0693 | 5.0 | 375 | 0.0693 | 0.4545 | -12.5730 | 0.9000 | 13.0275 | -69.3120 | -19.3855 | -3.8504 | -3.7716 |
-| 0.0866 | 5.3333 | 400 | 0.0693 | 0.4417 | -12.6121 | 0.9000 | 13.0537 | -69.3902 | -19.4112 | -3.8504 | -3.7717 |
-| 0.052 | 5.6667 | 425 | 0.0693 | 0.4226 | -12.6573 | 0.9000 | 13.0799 | -69.4807 | -19.4494 | -3.8504 | -3.7717 |
-| 0.0866 | 6.0 | 450 | 0.0693 | 0.4119 | -12.6909 | 0.9000 | 13.1028 | -69.5479 | -19.4707 | -3.8503 | -3.7715 |
-| 0.0173 | 6.3333 | 475 | 0.0693 | 0.3977 | -12.7153 | 0.9000 | 13.1130 | -69.5967 | -19.4992 | -3.8503 | -3.7715 |
-| 0.1213 | 6.6667 | 500 | 0.0693 | 0.3905 | -12.7442 | 0.9000 | 13.1347 | -69.6546 | -19.5136 | -3.8502 | -3.7715 |
-| 0.0347 | 7.0 | 525 | 0.0693 | 0.3699 | -12.7847 | 0.9000 | 13.1547 | -69.7356 | -19.5547 | -3.8500 | -3.7713 |
-| 0.0866 | 7.3333 | 550 | 0.0693 | 0.3576 | -12.8153 | 0.9000 | 13.1729 | -69.7966 | -19.5793 | -3.8501 | -3.7713 |
-| 0.052 | 7.6667 | 575 | 0.0693 | 0.3343 | -12.8435 | 0.9000 | 13.1779 | -69.8532 | -19.6259 | -3.8500 | -3.7713 |
-| 0.104 | 8.0 | 600 | 0.0693 | 0.3392 | -12.8707 | 0.9000 | 13.2099 | -69.9076 | -19.6162 | -3.8500 | -3.7712 |
-| 0.1213 | 8.3333 | 625 | 0.0693 | 0.3427 | -12.8832 | 0.9000 | 13.2259 | -69.9325 | -19.6092 | -3.8500 | -3.7712 |
-| 0.0173 | 8.6667 | 650 | 0.0693 | 0.3378 | -12.8947 | 0.9000 | 13.2325 | -69.9555 | -19.6189 | -3.8499 | -3.7712 |
-| 0.052 | 9.0 | 675 | 0.0693 | 0.3273 | -12.9057 | 0.9000 | 13.2330 | -69.9775 | -19.6400 | -3.8499 | -3.7711 |
-| 0.052 | 9.3333 | 700 | 0.0693 | 0.3241 | -12.9207 | 0.9000 | 13.2448 | -70.0074 | -19.6463 | -3.8499 | -3.7712 |
-| 0.0866 | 9.6667 | 725 | 0.0693 | 0.3219 | -12.9292 | 0.9000 | 13.2511 | -70.0246 | -19.6507 | -3.8498 | -3.7711 |
-| 0.0866 | 10.0 | 750 | 0.0693 | 0.3202 | -12.9371 | 0.9000 | 13.2573 | -70.0403 | -19.6541 | -3.8498 | -3.7711 |
-| 0.0866 | 10.3333 | 775 | 0.0693 | 0.3130 | -12.9398 | 0.9000 | 13.2528 | -70.0457 | -19.6686 | -3.8498 | -3.7711 |
-| 0.0866 | 10.6667 | 800 | 0.0693 | 0.3136 | -12.9343 | 0.9000 | 13.2478 | -70.0347 | -19.6675 | -3.8499 | -3.7712 |
-| 0.0693 | 11.0 | 825 | 0.0693 | 0.3141 | -12.9459 | 0.9000 | 13.2601 | -70.0580 | -19.6663 | -3.8499 | -3.7712 |
-| 0.052 | 11.3333 | 850 | 0.0693 | 0.3139 | -12.9439 | 0.9000 | 13.2578 | -70.0539 | -19.6667 | -3.8498 | -3.7710 |
-| 0.0693 | 11.6667 | 875 | 0.0693 | 0.3141 | -12.9448 | 0.9000 | 13.2589 | -70.0557 | -19.6663 | -3.8498 | -3.7711 |
-| 0.0693 | 12.0 | 900 | 0.0693 | 0.3137 | -12.9396 | 0.9000 | 13.2533 | -70.0454 | -19.6672 | -3.8498 | -3.7711 |
-| 0.0347 | 12.3333 | 925 | 0.0693 | 0.3137 | -12.9475 | 0.9000 | 13.2612 | -70.0612 | -19.6671 | -3.8499 | -3.7712 |
-| 0.0693 | 12.6667 | 950 | 0.0693 | 0.3098 | -12.9482 | 0.9000 | 13.2581 | -70.0625 | -19.6749 | -3.8499 | -3.7712 |
-| 0.052 | 13.0 | 975 | 0.0693 | 0.3109 | -12.9485 | 0.9000 | 13.2594 | -70.0631 | -19.6727 | -3.8499 | -3.7712 |
-| 0.0866 | 13.3333 | 1000 | 0.0693 | 0.3102 | -12.9468 | 0.9000 | 13.2571 | -70.0598 | -19.6741 | -3.8499 | -3.7712 |
+| 0.6931 | 0.3333 | 25 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 0.6667 | 50 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.0 | 75 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.3333 | 100 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 1.6667 | 125 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.0 | 150 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.3333 | 175 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 2.6667 | 200 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.0 | 225 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.3333 | 250 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 3.6667 | 275 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.0 | 300 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.3333 | 325 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 4.6667 | 350 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.0 | 375 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.3333 | 400 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 5.6667 | 425 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.0 | 450 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.3333 | 475 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 6.6667 | 500 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.0 | 525 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.3333 | 550 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 7.6667 | 575 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.0 | 600 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.3333 | 625 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 8.6667 | 650 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.0 | 675 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.3333 | 700 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 9.6667 | 725 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.0 | 750 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.3333 | 775 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 10.6667 | 800 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.0 | 825 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.3333 | 850 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 11.6667 | 875 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.0 | 900 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.3333 | 925 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 12.6667 | 950 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 13.0 | 975 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
+| 0.6931 | 13.3333 | 1000 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -2.7147 | -2.7147 |
 
 
 ### Framework versions
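Worth noting about the updated eval numbers: a DPO loss of 0.6931 is exactly ln 2, the value the objective takes when the chosen and rejected rewards are identical, i.e. when the policy has not moved away from its reference model. A minimal sketch of that arithmetic (plain Python; illustrative only, not this repository's training code):

```python
import math

def dpo_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Per-example DPO loss: -log(sigmoid(chosen_reward - rejected_reward)).

    The logged rewards already carry the beta factor (beta * log-prob ratio
    of policy vs. reference), so with a zero margin the loss is ln(2)
    regardless of beta.
    """
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(dpo_loss(0.0, 0.0))  # 0.6931... == ln(2), matching every eval row above
print(math.log(2.0))       # 0.6931471805599453
```

The constant 0.6931 across all 1000 steps, zero margins everywhere, and a Rewards/accuracies of 0.0 (consistent with exact chosen/rejected ties being scored as incorrect rather than 0.5) all suggest this rerun on the 1e7rate SFT base never separated the policy from its reference, unlike the 1e5rate numbers it replaces.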
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e5rate_SFT",
+  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e7rate_SFT",
   "architectures": [
     "MistralForCausalLM"
   ],
final_checkpoint/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e5rate_SFT",
+  "_name_or_path": "tsavage68/UTI_M2_1000steps_1e7rate_SFT",
   "architectures": [
     "MistralForCausalLM"
   ],
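Both `config.json` copies update only `_name_or_path`. That field is provenance metadata recording which checkpoint the weights were initialized from; it does not alter the model itself. Since `transformers` overwrites `_name_or_path` with whatever path you load from, reading the stored value is easiest from the raw file. A small sketch, assuming the repo id matches the model name in the card:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch the raw config.json and read the recorded source checkpoint.
path = hf_hub_download(
    repo_id="tsavage68/UTI_M2_1000steps_1e6rate_05beta_CSFTDPO",
    filename="config.json",
)
with open(path) as f:
    config = json.load(f)

print(config["_name_or_path"])  # tsavage68/UTI_M2_1000steps_1e7rate_SFT after this commit
print(config["architectures"])  # ["MistralForCausalLM"]
```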
final_checkpoint/model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b4a18333569d1fc0949f22d34a6872d1053f7216430d8578201b1efcf33c1457
+oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240
final_checkpoint/model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4052ad4bfb105a67ffcafeecec8fa4571a7ebed17255b1ddfa300ffd2e5fd4a6
+oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232
final_checkpoint/model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:79fdce93682fc3c954c70cc4022021506a4624564c536c47e7edfd20175eb01c
+oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b4a18333569d1fc0949f22d34a6872d1053f7216430d8578201b1efcf33c1457
+oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4052ad4bfb105a67ffcafeecec8fa4571a7ebed17255b1ddfa300ffd2e5fd4a6
+oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:79fdce93682fc3c954c70cc4022021506a4624564c536c47e7edfd20175eb01c
+oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256
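Each `.safetensors` entry above changes only its Git LFS pointer: the repo stores a three-line stub (`version`, `oid sha256:...`, `size`), and the diff shows new object hashes at identical byte sizes, which is what a retrain of the same architecture looks like. A hedged sketch for verifying a downloaded shard against the pointer, again assuming the repo id matches the model name:

```python
import hashlib
import os
from huggingface_hub import hf_hub_download

# Expected values, copied from the updated LFS pointer for shard 1 of 3.
EXPECTED_SHA256 = "9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982"
EXPECTED_SIZE = 4943162240

path = hf_hub_download(
    repo_id="tsavage68/UTI_M2_1000steps_1e6rate_05beta_CSFTDPO",
    filename="model-00001-of-00003.safetensors",
)

digest = hashlib.sha256()
with open(path, "rb") as f:
    # Stream in 1 MiB chunks; the shard is ~4.9 GB and should not be read at once.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert os.path.getsize(path) == EXPECTED_SIZE
assert digest.hexdigest() == EXPECTED_SHA256
print("shard matches the LFS pointer")
```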
tokenizer_config.json CHANGED
@@ -33,7 +33,7 @@
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",
   "legacy": true,
-  "max_length": 100,
+  "max_length": 1024,
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "</s>",
   "sp_model_kwargs": {},
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6dd710aa2ad0ba5f56ae1ee4eff56674c4895eb09ca96e495523d04ddec3c718
+oid sha256:ea8bcbd1c9db0a4e7f6c3b82c5dc94ec84b20324ecbcaab86028fc5e8667fa19
 size 4667
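`training_args.bin` is a torch-serialized (pickled) dump of the trainer's configuration, so a new hash at an identical 4667-byte size suggests some argument values changed while the overall structure did not. It can be inspected directly; a sketch, assuming the repo id matches the model name (newer torch versions need `weights_only=False` because the file is a pickled object, which is only safe for files you trust):

```python
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="tsavage68/UTI_M2_1000steps_1e6rate_05beta_CSFTDPO",
    filename="training_args.bin",
)

# A pickled TrainingArguments-style object, not a tensor archive.
args = torch.load(path, weights_only=False)
print(args.learning_rate, args.max_steps)
```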