End of training

Files changed (4)

README.md +38 -0
model-00001-of-00002.safetensors +1 -1
model-00002-of-00002.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -3,6 +3,8 @@ license: other
 base_model: facebook/opt-1.3b
 tags:
 - generated_from_trainer
+metrics:
+- accuracy
 model-index:
 - name: reward_modeling_anthropic_hh
   results: []
@@ -14,6 +16,23 @@ should probably proofread and complete it, then remove this comment. -->
 # reward_modeling_anthropic_hh
 This model is a fine-tuned version of [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.6907
+- Accuracy: 0.6825
+- Train Rewards/chosen: -1.8222
+- Train Rewards/rejected: -3.6005
+- Train Rewards/accuracies: 0.8138
+- Train Rewards/margins: 1.7783
+- Train Nll Loss: 2.4635
+- Train  Logit Total Loss: 0.4241
+- Train  Logit Loss: 0.4035
+- Rewards/chosen: -2.0106
+- Rewards/rejected: -3.0639
+- Rewards/accuracies: 0.6657
+- Rewards/margins: 1.0533
+- Nll Loss: 2.4906
+-  Logit Total Loss: 0.6892
+-  Logit Loss: 0.6710
 ## Model description
@@ -44,6 +63,25 @@ The following hyperparameters were used during training:
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Nll Loss |  Logit Total Loss |  Logit Loss |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------:|:-----------------:|:-----------:|
+| 0.7169        | 0.11  | 100  | 0.6921          | 0.5959   | -1.7367        | -1.8694          | 0.5855             | 0.1326          | 3.0057   | 0.6899            | 0.6665      |
+| 0.7082        | 0.23  | 200  | 0.6978          | 0.5938   | -3.3995        | -3.5818          | 0.5802             | 0.1823          | 3.2073   | 0.6959            | 0.6706      |
+| 0.6744        | 0.34  | 300  | 0.6681          | 0.6062   | -2.3751        | -2.7036          | 0.5956             | 0.3285          | 2.7061   | 0.6656            | 0.6450      |
+| 0.6154        | 0.46  | 400  | 0.6490          | 0.6433   | -1.5136        | -1.9306          | 0.6310             | 0.4171          | 2.8065   | 0.6474            | 0.6256      |
+| 0.6405        | 0.57  | 500  | 0.6573          | 0.6351   | -1.4041        | -1.8257          | 0.6226             | 0.4216          | 2.6995   | 0.6577            | 0.6371      |
+| 0.6284        | 0.69  | 600  | 0.6448          | 0.6557   | -2.3215        | -2.7092          | 0.6440             | 0.3877          | 2.6968   | 0.6433            | 0.6225      |
+| 0.6399        | 0.8   | 700  | 0.6454          | 0.6227   | -2.0755        | -2.4642          | 0.6125             | 0.3887          | 2.8089   | 0.6435            | 0.6217      |
+| 0.669         | 0.91  | 800  | 0.6385          | 0.6474   | -1.7053        | -2.1240          | 0.6379             | 0.4187          | 2.6687   | 0.6350            | 0.6145      |
+| 0.4788        | 1.03  | 900  | 0.6636          | 0.6577   | -2.1522        | -2.8529          | 0.6435             | 0.7007          | 2.5723   | 0.6620            | 0.6427      |
+| 0.4529        | 1.14  | 1000 | 0.6938          | 0.6577   | -1.1456        | -2.0167          | 0.6488             | 0.8712          | 2.5628   | 0.6897            | 0.6708      |
+| 0.4378        | 1.26  | 1100 | 0.7319          | 0.6536   | -1.4771        | -2.4829          | 0.6427             | 1.0058          | 2.5495   | 0.7282            | 0.7098      |
+| 0.4496        | 1.37  | 1200 | 0.7034          | 0.6660   | -2.6046        | -3.5817          | 0.6524             | 0.9771          | 2.5483   | 0.7006            | 0.6819      |
+| 0.3539        | 1.49  | 1300 | 0.7023          | 0.6598   | -2.2279        | -3.2122          | 0.6516             | 0.9842          | 2.5144   | 0.6963            | 0.6780      |
+| 0.5494        | 1.6   | 1400 | 0.6784          | 0.6536   | -2.3300        | -3.3018          | 0.6435             | 0.9718          | 2.4946   | 0.6749            | 0.6565      |
+| 0.4075        | 1.71  | 1500 | 0.6935          | 0.6948   | -0.9575        | -2.0411          | 0.6843             | 1.0836          | 2.4900   | 0.6884            | 0.6702      |
+| 0.4789        | 1.83  | 1600 | 0.6941          | 0.6598   | -2.1270        | -3.1756          | 0.6496             | 1.0487          | 2.5026   | 0.6924            | 0.6741      |
+| 0.4093        | 1.94  | 1700 | 0.6907          | 0.6825   | -2.0106        | -3.0639          | 0.6657             | 1.0533          | 2.4906   | 0.6892            | 0.6710      |
 ### Framework versions
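
Reviewer note on the README.md table: the reported `Rewards/margins` values appear to equal `Rewards/chosen` minus `Rewards/rejected` up to four-decimal rounding, and the last logged step is not the one with the lowest validation loss. The sketch below checks both against rows copied from the table. The helper names (`margin_mismatches`, `best_by`) and the rounding tolerance are assumptions made for illustration; they are not part of the training code.

```python
# Evaluation rows copied from the "Training results" table in this diff:
# (step, validation_loss, accuracy, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (100, 0.6921, 0.5959, -1.7367, -1.8694, 0.1326),
    (200, 0.6978, 0.5938, -3.3995, -3.5818, 0.1823),
    (300, 0.6681, 0.6062, -2.3751, -2.7036, 0.3285),
    (400, 0.6490, 0.6433, -1.5136, -1.9306, 0.4171),
    (500, 0.6573, 0.6351, -1.4041, -1.8257, 0.4216),
    (600, 0.6448, 0.6557, -2.3215, -2.7092, 0.3877),
    (700, 0.6454, 0.6227, -2.0755, -2.4642, 0.3887),
    (800, 0.6385, 0.6474, -1.7053, -2.1240, 0.4187),
    (900, 0.6636, 0.6577, -2.1522, -2.8529, 0.7007),
    (1000, 0.6938, 0.6577, -1.1456, -2.0167, 0.8712),
    (1100, 0.7319, 0.6536, -1.4771, -2.4829, 1.0058),
    (1200, 0.7034, 0.6660, -2.6046, -3.5817, 0.9771),
    (1300, 0.7023, 0.6598, -2.2279, -3.2122, 0.9842),
    (1400, 0.6784, 0.6536, -2.3300, -3.3018, 0.9718),
    (1500, 0.6935, 0.6948, -0.9575, -2.0411, 1.0836),
    (1600, 0.6941, 0.6598, -2.1270, -3.1756, 1.0487),
    (1700, 0.6907, 0.6825, -2.0106, -3.0639, 1.0533),
]


def margin_mismatches(rows, tol=1.5e-4):
    """Return the steps whose reported margin differs from chosen - rejected.

    The tolerance is an assumption: it allows for each column having been
    rounded to four decimals independently.
    """
    return [
        step
        for step, _, _, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def best_by(rows, index, largest=False):
    """Return the row with the smallest (or largest) value in column `index`."""
    pick = max if largest else min
    return pick(rows, key=lambda row: row[index])


if __name__ == "__main__":
    print("margin mismatches:", margin_mismatches(ROWS))
    print("lowest validation loss:", best_by(ROWS, 1))
    print("highest accuracy:", best_by(ROWS, 2, largest=True))
```

On these rows the lowest validation loss is 0.6385 at step 800 and the highest accuracy is 0.6948 at step 1500, while the committed README reports the step 1700 values (loss 0.6907, accuracy 0.6825). Whether an earlier checkpoint would have been preferable depends on training settings that this diff does not show.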

model-00001-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d8a0da600ec3486f0dbd03b936fe6a9458d69f0d9327cca41cf8c7d23709347e
+oid sha256:2494dfa4996ff1be1dcefdac3b34b708c35621a339f462faeac5ccd8c52d3e56
 size 4994509120

model-00002-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2c361db31cc766925fa2ca0215345df7dd63d91934c3f0feda09fb821299dffe
+oid sha256:bee7abff6fd72bbad68526821b11a59eac8bfbbe8e87b97e801842a572ddfe6b
 size 680405464

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:33120e34391adee413e065e0ddaf6c2b8b1e21ad204a9898e56d62685bd5cd55
+oid sha256:70f683ac4762bf0f000cd3067042fd9bde80b236908879918681cb2ffd0384b2
 size 4792
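
Reviewer note on the binary files: each of the three diffs above changes only a Git LFS pointer, where `oid sha256:` is the SHA-256 of the real file contents and `size` is its length in bytes. A minimal sketch for checking a downloaded file against such a pointer follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, not part of Git LFS or of any Hugging Face library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte shards are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New pointer for training_args.bin, copied from the diff above.
NEW_TRAINING_ARGS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:70f683ac4762bf0f000cd3067042fd9bde80b236908879918681cb2ffd0384b2
size 4792
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(NEW_TRAINING_ARGS_POINTER)
    # Assumes training_args.bin has been downloaded to the working directory.
    print(matches_pointer("training_args.bin", pointer))
```

The sizes are unchanged in all three pointers, so only the hash distinguishes the retrained files from the previous ones.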