aisuko committed on
Commit 29428ff
1 parent: 592228b

End of training

README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- base_model: HuggingFaceTB/SmolLM-135M-Instruct
  license: apache-2.0
+ base_model: HuggingFaceTB/SmolLM-135M-Instruct
  tags:
  - trl
  - orpo
@@ -16,6 +16,19 @@ should probably proofread and complete it, then remove this comment. -->
  # ft-smollm-135M-instruct-on-hf-ultrafeedback

  This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.0652
+ - Rewards/chosen: -0.1245
+ - Rewards/rejected: -0.1253
+ - Rewards/accuracies: 0.4770
+ - Rewards/margins: 0.0008
+ - Logps/rejected: -1.2525
+ - Logps/chosen: -1.2449
+ - Logits/rejected: 52.1922
+ - Logits/chosen: 51.8967
+ - Nll Loss: 0.9899
+ - Log Odds Ratio: -0.7525
+ - Log Odds Chosen: 0.0414

  ## Model description

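To try the resulting checkpoint, a minimal inference sketch follows. The repo id `aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback` is an assumption pieced together from the committer name and the model name above; substitute the actual repo id if it differs.

```python
# Minimal inference sketch for the fine-tuned model described in this card.
# ASSUMPTION: the checkpoint is published as
# "aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback" (inferred, not confirmed).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# SmolLM-Instruct ships a chat template, so format the prompt through it.
messages = [{"role": "user", "content": "Summarize ORPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```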
@@ -35,11 +48,11 @@ More information needed

  The following hyperparameters were used during training:
  - learning_rate: 0.0003
- - train_batch_size: 8
+ - train_batch_size: 4
- - eval_batch_size: 8
+ - eval_batch_size: 4
  - seed: 42
  - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
+ - total_train_batch_size: 8
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.1
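The hyperparameters above map directly onto a TRL `ORPOTrainer` run. The sketch below is a hedged reconstruction under stated assumptions, not the author's actual script: the dataset splits, the ORPO `beta`, the epoch count, and any sequence-length limits are assumptions (TRL defaults or values inferred from the training log).

```python
# Hedged reconstruction of the ORPO fine-tuning run described in this card.
# ASSUMPTIONS: beta=0.1 (TRL default, not recorded in the card), one epoch
# (the log below stops at epoch 0.99), and the *_prefs splits of the dataset.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference pairs; depending on the TRL version, the chosen/rejected
# message lists may need flattening to plain text first.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

config = ORPOConfig(
    output_dir="ft-smollm-135M-instruct-on-hf-ultrafeedback",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # total train batch size: 8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    num_train_epochs=1,              # assumption: single epoch
    beta=0.1,                        # assumption: TRL default
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
)
trainer.train()
```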
@@ -47,6 +60,45 @@ The following hyperparameters were used during training:

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+ | 2.2135 | 0.03 | 100 | 1.1267 | -0.1300 | -0.1302 | 0.4650 | 0.0001 | -1.3019 | -1.3005 | 19.2030 | 19.1013 | 1.0525 | -0.7427 | 0.0087 |
+ | 1.1677 | 0.05 | 200 | 1.1234 | -0.1281 | -0.1280 | 0.4670 | -0.0001 | -1.2803 | -1.2809 | 27.7444 | 27.6236 | 1.0481 | -0.7528 | 0.0127 |
+ | 1.1676 | 0.08 | 300 | 1.1302 | -0.1291 | -0.1286 | 0.4660 | -0.0004 | -1.2865 | -1.2908 | 30.9616 | 30.8646 | 1.0544 | -0.7587 | 0.0102 |
+ | 1.133 | 0.11 | 400 | 1.1538 | -0.1322 | -0.1315 | 0.4510 | -0.0008 | -1.3145 | -1.3223 | 33.4234 | 33.3438 | 1.0772 | -0.7658 | 0.0093 |
+ | 1.1642 | 0.13 | 500 | 1.1382 | -0.1315 | -0.1309 | 0.4520 | -0.0006 | -1.3092 | -1.3149 | 34.6557 | 34.5676 | 1.0623 | -0.7593 | 0.0099 |
+ | 1.1315 | 0.16 | 600 | 1.1392 | -0.1315 | -0.1306 | 0.4560 | -0.0009 | -1.3063 | -1.3154 | 36.8073 | 36.6894 | 1.0628 | -0.7639 | 0.0066 |
+ | 1.1564 | 0.19 | 700 | 1.1323 | -0.1313 | -0.1307 | 0.4710 | -0.0005 | -1.3073 | -1.3126 | 38.2088 | 38.0446 | 1.0565 | -0.7576 | 0.0112 |
+ | 1.1562 | 0.21 | 800 | 1.1310 | -0.1314 | -0.1313 | 0.4640 | -0.0000 | -1.3133 | -1.3136 | 40.0474 | 39.8232 | 1.0554 | -0.7559 | 0.0252 |
+ | 1.1665 | 0.24 | 900 | 1.1220 | -0.1307 | -0.1301 | 0.4570 | -0.0006 | -1.3013 | -1.3069 | 40.6970 | 40.5118 | 1.0462 | -0.7580 | 0.0126 |
+ | 1.1713 | 0.27 | 1000 | 1.1329 | -0.1315 | -0.1309 | 0.4580 | -0.0005 | -1.3093 | -1.3146 | 42.3554 | 42.1528 | 1.0565 | -0.7633 | 0.0184 |
+ | 1.1306 | 0.29 | 1100 | 1.1211 | -0.1310 | -0.1304 | 0.4560 | -0.0006 | -1.3039 | -1.3098 | 42.6754 | 42.5111 | 1.0451 | -0.7594 | 0.0122 |
+ | 1.1215 | 0.32 | 1200 | 1.1273 | -0.1313 | -0.1306 | 0.4570 | -0.0007 | -1.3056 | -1.3128 | 44.4291 | 44.2082 | 1.0511 | -0.7615 | 0.0113 |
+ | 1.1383 | 0.35 | 1300 | 1.1156 | -0.1298 | -0.1293 | 0.4600 | -0.0006 | -1.2926 | -1.2984 | 44.8096 | 44.6178 | 1.0392 | -0.7638 | 0.0168 |
+ | 1.1549 | 0.37 | 1400 | 1.1090 | -0.1292 | -0.1290 | 0.4640 | -0.0003 | -1.2898 | -1.2924 | 45.3797 | 45.1471 | 1.0332 | -0.7587 | 0.0223 |
+ | 1.1376 | 0.4 | 1500 | 1.1113 | -0.1296 | -0.1294 | 0.4650 | -0.0002 | -1.2935 | -1.2958 | 46.4136 | 46.1814 | 1.0354 | -0.7591 | 0.0207 |
+ | 1.1355 | 0.43 | 1600 | 1.1051 | -0.1286 | -0.1284 | 0.4660 | -0.0002 | -1.2839 | -1.2858 | 46.8894 | 46.6616 | 1.0290 | -0.7612 | 0.0219 |
+ | 1.0894 | 0.45 | 1700 | 1.1001 | -0.1282 | -0.1281 | 0.4670 | -0.0001 | -1.2810 | -1.2824 | 46.8995 | 46.7032 | 1.0238 | -0.7621 | 0.0317 |
+ | 1.1561 | 0.48 | 1800 | 1.0976 | -0.1283 | -0.1281 | 0.4740 | -0.0002 | -1.2811 | -1.2829 | 47.7268 | 47.4906 | 1.0219 | -0.7573 | 0.0210 |
+ | 1.0969 | 0.51 | 1900 | 1.0952 | -0.1277 | -0.1274 | 0.4710 | -0.0003 | -1.2738 | -1.2771 | 48.0909 | 47.8791 | 1.0190 | -0.7626 | 0.0221 |
+ | 1.1034 | 0.53 | 2000 | 1.0971 | -0.1277 | -0.1274 | 0.4650 | -0.0004 | -1.2736 | -1.2774 | 48.6271 | 48.4186 | 1.0209 | -0.7622 | 0.0210 |
+ | 1.0806 | 0.56 | 2100 | 1.0894 | -0.1275 | -0.1274 | 0.4730 | -0.0001 | -1.2743 | -1.2750 | 48.9781 | 48.7443 | 1.0139 | -0.7556 | 0.0238 |
+ | 1.1148 | 0.59 | 2200 | 1.0917 | -0.1282 | -0.1290 | 0.4770 | 0.0008 | -1.2896 | -1.2820 | 49.9987 | 49.7273 | 1.0168 | -0.7496 | 0.0411 |
+ | 1.106 | 0.61 | 2300 | 1.0866 | -0.1273 | -0.1276 | 0.4760 | 0.0003 | -1.2757 | -1.2726 | 49.6562 | 49.4520 | 1.0112 | -0.7538 | 0.0327 |
+ | 1.1022 | 0.64 | 2400 | 1.0876 | -0.1268 | -0.1268 | 0.4700 | -0.0000 | -1.2682 | -1.2683 | 50.6454 | 50.3935 | 1.0117 | -0.7590 | 0.0296 |
+ | 1.0777 | 0.67 | 2500 | 1.0871 | -0.1268 | -0.1268 | 0.4690 | 0.0001 | -1.2684 | -1.2677 | 50.7985 | 50.5549 | 1.0112 | -0.7592 | 0.0329 |
+ | 1.1016 | 0.69 | 2600 | 1.0805 | -0.1265 | -0.1273 | 0.4770 | 0.0008 | -1.2729 | -1.2654 | 51.1070 | 50.8537 | 1.0054 | -0.7503 | 0.0416 |
+ | 1.1123 | 0.72 | 2700 | 1.0785 | -0.1255 | -0.1253 | 0.4730 | -0.0002 | -1.2534 | -1.2552 | 51.0774 | 50.8296 | 1.0024 | -0.7613 | 0.0234 |
+ | 1.1172 | 0.75 | 2800 | 1.0736 | -0.1252 | -0.1253 | 0.4750 | 0.0002 | -1.2533 | -1.2517 | 51.2562 | 50.9836 | 0.9979 | -0.7572 | 0.0271 |
+ | 1.0614 | 0.77 | 2900 | 1.0718 | -0.1252 | -0.1259 | 0.4760 | 0.0007 | -1.2591 | -1.2521 | 51.5419 | 51.2800 | 0.9964 | -0.7537 | 0.0404 |
+ | 1.0896 | 0.8 | 3000 | 1.0695 | -0.1261 | -0.1277 | 0.4810 | 0.0016 | -1.2773 | -1.2611 | 51.5967 | 51.3290 | 0.9951 | -0.7439 | 0.0530 |
+ | 1.0908 | 0.83 | 3100 | 1.0711 | -0.1249 | -0.1251 | 0.4760 | 0.0002 | -1.2512 | -1.2489 | 52.0281 | 51.7418 | 0.9954 | -0.7572 | 0.0330 |
+ | 1.09 | 0.85 | 3200 | 1.0676 | -0.1245 | -0.1247 | 0.4720 | 0.0002 | -1.2467 | -1.2450 | 52.0018 | 51.7152 | 0.9920 | -0.7566 | 0.0315 |
+ | 1.0677 | 0.88 | 3300 | 1.0657 | -0.1244 | -0.1248 | 0.4740 | 0.0005 | -1.2482 | -1.2435 | 52.0825 | 51.7926 | 0.9902 | -0.7552 | 0.0390 |
+ | 1.0712 | 0.91 | 3400 | 1.0644 | -0.1244 | -0.1250 | 0.4760 | 0.0007 | -1.2504 | -1.2437 | 52.0637 | 51.7715 | 0.9891 | -0.7529 | 0.0402 |
+ | 1.0732 | 0.93 | 3500 | 1.0642 | -0.1244 | -0.1251 | 0.4770 | 0.0007 | -1.2510 | -1.2438 | 52.1319 | 51.8349 | 0.9889 | -0.7526 | 0.0404 |
+ | 1.0669 | 0.96 | 3600 | 1.0647 | -0.1244 | -0.1252 | 0.4770 | 0.0007 | -1.2518 | -1.2443 | 52.1397 | 51.8447 | 0.9894 | -0.7525 | 0.0411 |
+ | 1.0774 | 0.99 | 3700 | 1.0652 | -0.1245 | -0.1253 | 0.4770 | 0.0008 | -1.2525 | -1.2449 | 52.1922 | 51.8967 | 0.9899 | -0.7525 | 0.0414 |


  ### Framework versions
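As a sanity check on the logged columns, assuming the TRL implementation of ORPO, the validation loss should decompose into the NLL term minus λ times the mean log-sigmoid of the log odds ratio:

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{NLL}} - \lambda\,\mathbb{E}\!\left[\log \sigma\!\left(\log \frac{\operatorname{odds}(y_w \mid x)}{\operatorname{odds}(y_l \mid x)}\right)\right]
$$

The final row is consistent with this: 0.9899 − 0.1 × (−0.7525) ≈ 1.0652, the reported validation loss. Note that λ = 0.1 is TRL's default `beta` and is inferred here; the card itself does not record it.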
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bfcdc978505e0230512c7d9ebd72f068af8877c24bfdceaf6703cde20eadc53a
+ oid sha256:67846779722e2777dcf6f5371db8129dbae98fe8226cfa81b5e39fc7cdfe65bd
  size 269060280
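The `oid sha256:` lines above are Git LFS pointers: the repository stores only a hash and byte size, while the ~269 MB weights live in LFS storage. A small verification sketch (not part of the commit) for checking a downloaded file against its pointer:

```python
# Verify that a downloaded LFS object matches the sha256 recorded in its pointer.
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weights don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "67846779722e2777dcf6f5371db8129dbae98fe8226cfa81b5e39fc7cdfe65bd"
assert file_sha256("model.safetensors") == expected, "hash mismatch"
```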
runs/Aug16_00-57-22_1dd0517e0b3d/events.out.tfevents.1723770052.1dd0517e0b3d.25.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:969c6401c224999f2b0d3cfd771098e52dd96970f4d339b9883e6ed0f5352d32
+ size 70869
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f7e9d4cc0bfae8e14cb1db76f111604e07c06867adeac3296266c972d4985128
+ oid sha256:3fea0959dc1572e037e06e05d2320254c9cf4fbc6162922508de8e1757161fbd
  size 5304