End of training

Files changed (8)

README.md +49 -49
final_checkpoint/model-00001-of-00003.safetensors +1 -1
final_checkpoint/model-00002-of-00003.safetensors +1 -1
final_checkpoint/model-00003-of-00003.safetensors +1 -1
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6931
-- Rewards/chosen: 0.0
-- Rewards/rejected: 0.0
-- Rewards/accuracies: 0.0
-- Rewards/margins: 0.0
-- Logps/rejected: 0.0
-- Logps/chosen: 0.0
-- Logits/rejected: -2.7147
-- Logits/chosen: -2.7147
 ## Model description
@@ -59,46 +59,46 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6931        | 0.3333  | 25   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 0.6667  | 50   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 1.0     | 75   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 1.3333  | 100  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 1.6667  | 125  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 2.0     | 150  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 2.3333  | 175  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 2.6667  | 200  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 3.0     | 225  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 3.3333  | 250  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 3.6667  | 275  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 4.0     | 300  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 4.3333  | 325  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 4.6667  | 350  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 5.0     | 375  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 5.3333  | 400  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 5.6667  | 425  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 6.0     | 450  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 6.3333  | 475  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 6.6667  | 500  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 7.0     | 525  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 7.3333  | 550  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 7.6667  | 575  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 8.0     | 600  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 8.3333  | 625  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 8.6667  | 650  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 9.0     | 675  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 9.3333  | 700  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 9.6667  | 725  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 10.0    | 750  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 10.3333 | 775  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 10.6667 | 800  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 11.0    | 825  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 11.3333 | 850  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 11.6667 | 875  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 12.0    | 900  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 12.3333 | 925  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 12.6667 | 950  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 13.0    | 975  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
-| 0.6931        | 13.3333 | 1000 | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -2.7147         | -2.7147       |
 ### Framework versions

 This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5476
+- Rewards/chosen: 0.0699
+- Rewards/rejected: -3.0830
+- Rewards/accuracies: 0.2100
+- Rewards/margins: 3.1530
+- Logps/rejected: -15.5400
+- Logps/chosen: -4.4026
+- Logits/rejected: -2.6403
+- Logits/chosen: -2.6398
 ## Model description
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.7009        | 0.3333  | 25   | 0.6667          | 0.0016         | -0.0631          | 0.1900             | 0.0647          | -9.5002        | -4.5393      | -2.7063         | -2.7055       |
+| 0.5794        | 0.6667  | 50   | 0.5690          | 0.0605         | -0.5674          | 0.2100             | 0.6279          | -10.5088       | -4.4215      | -2.6950         | -2.6942       |
+| 0.5772        | 1.0     | 75   | 0.5638          | -0.0374        | -1.8778          | 0.2000             | 1.8404          | -13.1295       | -4.6173      | -2.6653         | -2.6647       |
+| 0.5715        | 1.3333  | 100  | 0.5485          | 0.0321         | -2.2707          | 0.2100             | 2.3028          | -13.9154       | -4.4783      | -2.6560         | -2.6555       |
+| 0.5545        | 1.6667  | 125  | 0.5476          | 0.1013         | -2.5349          | 0.2100             | 2.6363          | -14.4438       | -4.3398      | -2.6499         | -2.6494       |
+| 0.5545        | 2.0     | 150  | 0.5476          | 0.0902         | -2.9376          | 0.2100             | 3.0278          | -15.2492       | -4.3621      | -2.6442         | -2.6437       |
+| 0.5545        | 2.3333  | 175  | 0.5476          | 0.0846         | -2.9244          | 0.2100             | 3.0090          | -15.2229       | -4.3733      | -2.6424         | -2.6419       |
+| 0.4852        | 2.6667  | 200  | 0.5476          | 0.0848         | -2.9648          | 0.2100             | 3.0495          | -15.3035       | -4.3729      | -2.6423         | -2.6417       |
+| 0.6412        | 3.0     | 225  | 0.5476          | 0.0853         | -2.9694          | 0.2100             | 3.0547          | -15.3127       | -4.3718      | -2.6421         | -2.6415       |
+| 0.5545        | 3.3333  | 250  | 0.5476          | 0.0892         | -2.9671          | 0.2100             | 3.0563          | -15.3081       | -4.3640      | -2.6429         | -2.6424       |
+| 0.5372        | 3.6667  | 275  | 0.5476          | 0.0803         | -2.9507          | 0.2100             | 3.0310          | -15.2754       | -4.3819      | -2.6416         | -2.6410       |
+| 0.5892        | 4.0     | 300  | 0.5476          | 0.0791         | -3.0080          | 0.2100             | 3.0871          | -15.3899       | -4.3842      | -2.6421         | -2.6415       |
+| 0.4679        | 4.3333  | 325  | 0.5476          | 0.0770         | -3.0043          | 0.2100             | 3.0814          | -15.3826       | -4.3884      | -2.6420         | -2.6415       |
+| 0.5718        | 4.6667  | 350  | 0.5476          | 0.0767         | -3.0040          | 0.2100             | 3.0808          | -15.3820       | -4.3890      | -2.6414         | -2.6409       |
+| 0.5199        | 5.0     | 375  | 0.5476          | 0.0830         | -3.0444          | 0.2100             | 3.1274          | -15.4628       | -4.3765      | -2.6415         | -2.6410       |
+| 0.5025        | 5.3333  | 400  | 0.5476          | 0.0784         | -3.0520          | 0.2100             | 3.1304          | -15.4779       | -4.3857      | -2.6406         | -2.6401       |
+| 0.5199        | 5.6667  | 425  | 0.5476          | 0.0772         | -3.0417          | 0.2100             | 3.1189          | -15.4575       | -4.3882      | -2.6418         | -2.6412       |
+| 0.5025        | 6.0     | 450  | 0.5476          | 0.0775         | -3.0690          | 0.2100             | 3.1465          | -15.5119       | -4.3875      | -2.6403         | -2.6398       |
+| 0.5718        | 6.3333  | 475  | 0.5476          | 0.0722         | -3.0608          | 0.2100             | 3.1330          | -15.4956       | -4.3980      | -2.6403         | -2.6398       |
+| 0.5718        | 6.6667  | 500  | 0.5476          | 0.0733         | -3.0661          | 0.2100             | 3.1394          | -15.5061       | -4.3958      | -2.6403         | -2.6397       |
+| 0.5025        | 7.0     | 525  | 0.5476          | 0.0687         | -3.0692          | 0.2100             | 3.1379          | -15.5123       | -4.4051      | -2.6407         | -2.6402       |
+| 0.5199        | 7.3333  | 550  | 0.5476          | 0.0691         | -3.0762          | 0.2100             | 3.1454          | -15.5265       | -4.4042      | -2.6401         | -2.6396       |
+| 0.5372        | 7.6667  | 575  | 0.5476          | 0.0728         | -3.0945          | 0.2100             | 3.1672          | -15.5629       | -4.3970      | -2.6414         | -2.6409       |
+| 0.5718        | 8.0     | 600  | 0.5476          | 0.0736         | -3.0806          | 0.2100             | 3.1541          | -15.5351       | -4.3953      | -2.6405         | -2.6400       |
+| 0.5372        | 8.3333  | 625  | 0.5476          | 0.0806         | -3.0954          | 0.2100             | 3.1759          | -15.5647       | -4.3813      | -2.6410         | -2.6405       |
+| 0.4332        | 8.6667  | 650  | 0.5476          | 0.0762         | -3.0922          | 0.2100             | 3.1684          | -15.5583       | -4.3900      | -2.6412         | -2.6407       |
+| 0.5372        | 9.0     | 675  | 0.5476          | 0.0738         | -3.0924          | 0.2100             | 3.1662          | -15.5587       | -4.3948      | -2.6408         | -2.6403       |
+| 0.5025        | 9.3333  | 700  | 0.5476          | 0.0702         | -3.0892          | 0.2100             | 3.1594          | -15.5524       | -4.4020      | -2.6405         | -2.6400       |
+| 0.5025        | 9.6667  | 725  | 0.5476          | 0.0641         | -3.0956          | 0.2100             | 3.1597          | -15.5651       | -4.4142      | -2.6410         | -2.6405       |
+| 0.5892        | 10.0    | 750  | 0.5476          | 0.0696         | -3.0933          | 0.2100             | 3.1630          | -15.5606       | -4.4032      | -2.6403         | -2.6398       |
+| 0.5199        | 10.3333 | 775  | 0.5476          | 0.0764         | -3.0810          | 0.2100             | 3.1574          | -15.5361       | -4.3897      | -2.6404         | -2.6399       |
+| 0.5199        | 10.6667 | 800  | 0.5476          | 0.0750         | -3.0945          | 0.2100             | 3.1695          | -15.5629       | -4.3925      | -2.6399         | -2.6394       |
+| 0.5372        | 11.0    | 825  | 0.5477          | 0.0727         | -3.0777          | 0.2100             | 3.1504          | -15.5293       | -4.3970      | -2.6405         | -2.6399       |
+| 0.5199        | 11.3333 | 850  | 0.5477          | 0.0760         | -3.0775          | 0.2100             | 3.1534          | -15.5289       | -4.3905      | -2.6402         | -2.6397       |
+| 0.6065        | 11.6667 | 875  | 0.5476          | 0.0737         | -3.0877          | 0.2100             | 3.1615          | -15.5495       | -4.3950      | -2.6404         | -2.6398       |
+| 0.5718        | 12.0    | 900  | 0.5476          | 0.0713         | -3.0915          | 0.2100             | 3.1628          | -15.5570       | -4.3999      | -2.6403         | -2.6398       |
+| 0.4159        | 12.3333 | 925  | 0.5476          | 0.0687         | -3.0820          | 0.2100             | 3.1507          | -15.5379       | -4.4051      | -2.6403         | -2.6398       |
+| 0.6238        | 12.6667 | 950  | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |
+| 0.6065        | 13.0    | 975  | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |
+| 0.5025        | 13.3333 | 1000 | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |
 ### Framework versions
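
The evaluation numbers in this README change can be sanity-checked with a few lines of arithmetic. The sketch below assumes the metrics come from a DPO-style preference trainer with a sigmoid loss; the source does not name the trainer, so that is inferred from the metric names only. Under that assumption, `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected` up to rounding, and a run in which every reward is exactly zero should report a loss of ln 2 ≈ 0.6931, which is the constant value in the replaced rows. The variable and function names are illustrative.

```python
import math

# Final evaluation values copied from the updated README.
rewards_chosen = 0.0699
rewards_rejected = -3.0830
reported_margin = 3.1530
reported_accuracy = 0.2100

# Assumption: the margin is the mean of (chosen - rejected), so it should match
# the difference of the two reported means up to rounding.
computed_margin = rewards_chosen - rewards_rejected


def sigmoid_preference_loss(margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)) under the assumed DPO-style objective."""
    return math.log1p(math.exp(-margin))


# A zero margin gives ln(2), the constant loss seen in the replaced rows.
loss_at_zero_margin = sigmoid_preference_loss(0.0)

# Purely numerical observation, not confirmed by the source: (1 - accuracy) * ln(2)
# lands on the reported validation loss of 0.5476.
tie_weighted_loss = (1.0 - reported_accuracy) * math.log(2)

print(f"computed margin:       {computed_margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at zero margin:   {loss_at_zero_margin:.4f}")
print(f"(1 - accuracy) * ln 2: {tie_weighted_loss:.4f}")
```

The last line is only a coincidence check. One possible reading is that a large share of evaluation pairs still have a zero margin, but nothing in the commit confirms this, and the mean loss cannot in general be recovered from the mean margin.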

final_checkpoint/model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240

 version https://git-lfs.github.com/spec/v1
+oid sha256:055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e
 size 4943162240

final_checkpoint/model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232

 version https://git-lfs.github.com/spec/v1
+oid sha256:6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703
 size 4999819232

final_checkpoint/model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256

 version https://git-lfs.github.com/spec/v1
+oid sha256:7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7
 size 4540516256

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9aa2e9687a5e5d24a999a996e9fe4c2bc1cf34ad347da5dc5c7e0adffcb14982
 size 4943162240

 version https://git-lfs.github.com/spec/v1
+oid sha256:055596a77af9659da7d9951fb3f75af45721eb5b2733ae8b926220c9486f6f4e
 size 4943162240

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:268bb18cc8bbff53c912fa3961a6281dd5c163edd1b8e5c85c9b12e87e4e3a63
 size 4999819232

 version https://git-lfs.github.com/spec/v1
+oid sha256:6eab27f229973937f2de1d94236003e583e3eeb20ef32700ef02e9c8b0bb1703
 size 4999819232

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bbc021dcf68d9e7ddaab0ead255721e73b7f652e3bfd34985bba6c029e0b729c
 size 4540516256

 version https://git-lfs.github.com/spec/v1
+oid sha256:7fb19c1138598d76cc73ff424bdd75abc5ded37d3446ab091e303c3a1e22d2f7
 size 4540516256

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:91c299c1029aeb4a9610760559d7581036fc79df156c10b8ddf53908122495f9
 size 4667

 version https://git-lfs.github.com/spec/v1
+oid sha256:e53d741e157840027466f7d5de111d0f3d8a1350974d7796488b54e8c05605c8
 size 4667
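
The binary files in this commit appear only as Git LFS pointer files: three `key value` lines giving the spec version, the content hash, and the byte size. In every pointer above only the `oid` changes while `size` stays the same. The following is a minimal sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling, and the pointer text and shard sizes are copied from the hunks above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New pointer for training_args.bin, copied from the diff.
new_training_args = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e53d741e157840027466f7d5de111d0f3d8a1350974d7796488b54e8c05605c8
size 4667
"""

pointer = parse_lfs_pointer(new_training_args)

# Byte sizes of the three model shards, copied from the diff.
shard_sizes = [4943162240, 4999819232, 4540516256]
total_bytes = sum(shard_sizes)

print(pointer["hash_algo"], pointer["digest"][:12], pointer["size"])
print(f"total shard size: {total_bytes} bytes")
```

Comparing parsed digests is also a quick way to see that each `final_checkpoint/` shard carries the same `oid` as the root-level shard of the same name, so the two copies point at identical content.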