End of training
- README.md +49 -49
- final_checkpoint/model-00001-of-00003.safetensors +1 -1
- final_checkpoint/model-00002-of-00003.safetensors +1 -1
- final_checkpoint/model-00003-of-00003.safetensors +1 -1
- model-00001-of-00003.safetensors +1 -1
- model-00002-of-00003.safetensors +1 -1
- model-00003-of-00003.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->

 This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: 0.
-- Rewards/rejected:
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected:
-- Logps/chosen:
-- Logits/rejected: -2.
-- Logits/chosen: -2.

 ## Model description

@@ -59,46 +59,46 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.


 ### Framework versions

 This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5476
+- Rewards/chosen: 0.0699
+- Rewards/rejected: -3.0830
+- Rewards/accuracies: 0.2100
+- Rewards/margins: 3.1530
+- Logps/rejected: -15.5400
+- Logps/chosen: -4.4026
+- Logits/rejected: -2.6403
+- Logits/chosen: -2.6398

 ## Model description


 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.7009 | 0.3333 | 25 | 0.6667 | 0.0016 | -0.0631 | 0.1900 | 0.0647 | -9.5002 | -4.5393 | -2.7063 | -2.7055 |
+| 0.5794 | 0.6667 | 50 | 0.5690 | 0.0605 | -0.5674 | 0.2100 | 0.6279 | -10.5088 | -4.4215 | -2.6950 | -2.6942 |
+| 0.5772 | 1.0 | 75 | 0.5638 | -0.0374 | -1.8778 | 0.2000 | 1.8404 | -13.1295 | -4.6173 | -2.6653 | -2.6647 |
+| 0.5715 | 1.3333 | 100 | 0.5485 | 0.0321 | -2.2707 | 0.2100 | 2.3028 | -13.9154 | -4.4783 | -2.6560 | -2.6555 |
+| 0.5545 | 1.6667 | 125 | 0.5476 | 0.1013 | -2.5349 | 0.2100 | 2.6363 | -14.4438 | -4.3398 | -2.6499 | -2.6494 |
+| 0.5545 | 2.0 | 150 | 0.5476 | 0.0902 | -2.9376 | 0.2100 | 3.0278 | -15.2492 | -4.3621 | -2.6442 | -2.6437 |
+| 0.5545 | 2.3333 | 175 | 0.5476 | 0.0846 | -2.9244 | 0.2100 | 3.0090 | -15.2229 | -4.3733 | -2.6424 | -2.6419 |
+| 0.4852 | 2.6667 | 200 | 0.5476 | 0.0848 | -2.9648 | 0.2100 | 3.0495 | -15.3035 | -4.3729 | -2.6423 | -2.6417 |
+| 0.6412 | 3.0 | 225 | 0.5476 | 0.0853 | -2.9694 | 0.2100 | 3.0547 | -15.3127 | -4.3718 | -2.6421 | -2.6415 |
+| 0.5545 | 3.3333 | 250 | 0.5476 | 0.0892 | -2.9671 | 0.2100 | 3.0563 | -15.3081 | -4.3640 | -2.6429 | -2.6424 |
+| 0.5372 | 3.6667 | 275 | 0.5476 | 0.0803 | -2.9507 | 0.2100 | 3.0310 | -15.2754 | -4.3819 | -2.6416 | -2.6410 |
+| 0.5892 | 4.0 | 300 | 0.5476 | 0.0791 | -3.0080 | 0.2100 | 3.0871 | -15.3899 | -4.3842 | -2.6421 | -2.6415 |
+| 0.4679 | 4.3333 | 325 | 0.5476 | 0.0770 | -3.0043 | 0.2100 | 3.0814 | -15.3826 | -4.3884 | -2.6420 | -2.6415 |
+| 0.5718 | 4.6667 | 350 | 0.5476 | 0.0767 | -3.0040 | 0.2100 | 3.0808 | -15.3820 | -4.3890 | -2.6414 | -2.6409 |
+| 0.5199 | 5.0 | 375 | 0.5476 | 0.0830 | -3.0444 | 0.2100 | 3.1274 | -15.4628 | -4.3765 | -2.6415 | -2.6410 |
+| 0.5025 | 5.3333 | 400 | 0.5476 | 0.0784 | -3.0520 | 0.2100 | 3.1304 | -15.4779 | -4.3857 | -2.6406 | -2.6401 |
+| 0.5199 | 5.6667 | 425 | 0.5476 | 0.0772 | -3.0417 | 0.2100 | 3.1189 | -15.4575 | -4.3882 | -2.6418 | -2.6412 |
+| 0.5025 | 6.0 | 450 | 0.5476 | 0.0775 | -3.0690 | 0.2100 | 3.1465 | -15.5119 | -4.3875 | -2.6403 | -2.6398 |
+| 0.5718 | 6.3333 | 475 | 0.5476 | 0.0722 | -3.0608 | 0.2100 | 3.1330 | -15.4956 | -4.3980 | -2.6403 | -2.6398 |
+| 0.5718 | 6.6667 | 500 | 0.5476 | 0.0733 | -3.0661 | 0.2100 | 3.1394 | -15.5061 | -4.3958 | -2.6403 | -2.6397 |
+| 0.5025 | 7.0 | 525 | 0.5476 | 0.0687 | -3.0692 | 0.2100 | 3.1379 | -15.5123 | -4.4051 | -2.6407 | -2.6402 |
+| 0.5199 | 7.3333 | 550 | 0.5476 | 0.0691 | -3.0762 | 0.2100 | 3.1454 | -15.5265 | -4.4042 | -2.6401 | -2.6396 |
+| 0.5372 | 7.6667 | 575 | 0.5476 | 0.0728 | -3.0945 | 0.2100 | 3.1672 | -15.5629 | -4.3970 | -2.6414 | -2.6409 |
+| 0.5718 | 8.0 | 600 | 0.5476 | 0.0736 | -3.0806 | 0.2100 | 3.1541 | -15.5351 | -4.3953 | -2.6405 | -2.6400 |
+| 0.5372 | 8.3333 | 625 | 0.5476 | 0.0806 | -3.0954 | 0.2100 | 3.1759 | -15.5647 | -4.3813 | -2.6410 | -2.6405 |
+| 0.4332 | 8.6667 | 650 | 0.5476 | 0.0762 | -3.0922 | 0.2100 | 3.1684 | -15.5583 | -4.3900 | -2.6412 | -2.6407 |
+| 0.5372 | 9.0 | 675 | 0.5476 | 0.0738 | -3.0924 | 0.2100 | 3.1662 | -15.5587 | -4.3948 | -2.6408 | -2.6403 |
+| 0.5025 | 9.3333 | 700 | 0.5476 | 0.0702 | -3.0892 | 0.2100 | 3.1594 | -15.5524 | -4.4020 | -2.6405 | -2.6400 |
+| 0.5025 | 9.6667 | 725 | 0.5476 | 0.0641 | -3.0956 | 0.2100 | 3.1597 | -15.5651 | -4.4142 | -2.6410 | -2.6405 |
+| 0.5892 | 10.0 | 750 | 0.5476 | 0.0696 | -3.0933 | 0.2100 | 3.1630 | -15.5606 | -4.4032 | -2.6403 | -2.6398 |
+| 0.5199 | 10.3333 | 775 | 0.5476 | 0.0764 | -3.0810 | 0.2100 | 3.1574 | -15.5361 | -4.3897 | -2.6404 | -2.6399 |
+| 0.5199 | 10.6667 | 800 | 0.5476 | 0.0750 | -3.0945 | 0.2100 | 3.1695 | -15.5629 | -4.3925 | -2.6399 | -2.6394 |
+| 0.5372 | 11.0 | 825 | 0.5477 | 0.0727 | -3.0777 | 0.2100 | 3.1504 | -15.5293 | -4.3970 | -2.6405 | -2.6399 |
+| 0.5199 | 11.3333 | 850 | 0.5477 | 0.0760 | -3.0775 | 0.2100 | 3.1534 | -15.5289 | -4.3905 | -2.6402 | -2.6397 |
+| 0.6065 | 11.6667 | 875 | 0.5476 | 0.0737 | -3.0877 | 0.2100 | 3.1615 | -15.5495 | -4.3950 | -2.6404 | -2.6398 |
+| 0.5718 | 12.0 | 900 | 0.5476 | 0.0713 | -3.0915 | 0.2100 | 3.1628 | -15.5570 | -4.3999 | -2.6403 | -2.6398 |
+| 0.4159 | 12.3333 | 925 | 0.5476 | 0.0687 | -3.0820 | 0.2100 | 3.1507 | -15.5379 | -4.4051 | -2.6403 | -2.6398 |
+| 0.6238 | 12.6667 | 950 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |
+| 0.6065 | 13.0 | 975 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |
+| 0.5025 | 13.3333 | 1000 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |


 ### Framework versions
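The metric names in the table above (Rewards/chosen, Rewards/rejected, Rewards/margins) match those logged by preference-optimization trainers such as TRL's DPOTrainer, although the diff does not say which trainer produced them. Assuming Rewards/margins is the chosen reward minus the rejected reward, and that Epoch is Step divided by a fixed number of steps per epoch, a minimal Python sketch can check a few rows copied from the table for internal consistency. The helper names and the tolerance are illustrative choices, not part of the model card.

```python
# Rows copied from the updated README table:
# (epoch, step, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (0.3333, 25, 0.0016, -0.0631, 0.0647),
    (0.6667, 50, 0.0605, -0.5674, 0.6279),
    (1.0, 75, -0.0374, -1.8778, 1.8404),
    (13.3333, 1000, 0.0699, -3.0830, 3.1530),
]

# Every reported value is rounded to 4 decimals, so three rounded values
# can disagree by up to 1.5e-4; the tolerance below leaves a little slack.
TOLERANCE = 2e-4


def margin_residual(chosen, rejected, margin):
    """Absolute gap between (chosen - rejected) and the reported margin.

    Assumption: margin = chosen - rejected (not stated in the diff).
    """
    return abs((chosen - rejected) - margin)


def steps_per_epoch(epoch, step):
    """Optimizer steps per epoch implied by one (epoch, step) pair."""
    return step / epoch


for epoch, step, chosen, rejected, margin in ROWS:
    consistent = margin_residual(chosen, rejected, margin) <= TOLERANCE
    rate = steps_per_epoch(epoch, step)
    print(f"step {step:>4}: margin consistent={consistent}, steps/epoch ~ {rate:.1f}")
```

Under these assumptions the sampled rows agree with one another, and each implies roughly 75 optimizer steps per epoch.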
final_checkpoint/model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e
 size 4943162240
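Each of the binary files in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a minimal sketch, the Python below parses the pointer text shown above and checks a local file against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or any Hugging Face library.

```python
import hashlib

# Pointer contents copied from the new side of the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e
size 4943162240
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 digest
    recorded in the pointer. Reads in chunks to cope with multi-GB shards."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

A downloaded shard could then be checked with `matches_pointer("model-00001-of-00003.safetensors", pointer)`, assuming the file sits in the working directory.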
final_checkpoint/model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703
 size 4999819232
final_checkpoint/model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7
 size 4540516256
model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e
 size 4943162240
model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703
 size 4999819232
model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7
 size 4540516256
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e53d741e157840027466f7d5de111d0f3d8a1350974d7796488b54e8c05605c8
 size 4667
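The new object IDs listed in this commit repeat: each shard under `final_checkpoint/` carries the same SHA-256 as the root-level shard of the same name. A short sketch, using only the paths and IDs copied from the diff above, groups files by object ID to make that visible. The helper name `group_by_oid` is illustrative.

```python
from collections import defaultdict

# New-side object IDs copied from the diff above (path -> sha256 hex digest).
NEW_OIDS = {
    "final_checkpoint/model-00001-of-00003.safetensors": "055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e",
    "final_checkpoint/model-00002-of-00003.safetensors": "6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703",
    "final_checkpoint/model-00003-of-00003.safetensors": "7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7",
    "model-00001-of-00003.safetensors": "055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e",
    "model-00002-of-00003.safetensors": "6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703",
    "model-00003-of-00003.safetensors": "7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7",
    "training_args.bin": "e53d741e157840027466f7d5de111d0f3d8a1350974d7796488b54e8c05605c8",
}


def group_by_oid(oids):
    """Return {oid: sorted paths} for every object ID shared by 2+ paths."""
    groups = defaultdict(list)
    for path, oid in oids.items():
        groups[oid].append(path)
    return {oid: sorted(paths) for oid, paths in groups.items() if len(paths) > 1}


duplicates = group_by_oid(NEW_OIDS)
for oid, paths in duplicates.items():
    print(oid[:12], paths)
```

Because a Git LFS object ID is the SHA-256 of the file contents, matching IDs mean the paired files are byte-identical.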