---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-135M-Instruct
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: ft-smollm-135M-instruct-on-hf-ultrafeedback
  results: []
---
# ft-smollm-135M-instruct-on-hf-ultrafeedback
This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct), preference-tuned with ORPO on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set (a note after the list shows how the loss decomposes):
- Loss: 1.0637
- Rewards/chosen: -0.1247
- Rewards/rejected: -0.1259
- Rewards/accuracies: 0.4730
- Rewards/margins: 0.0012
- Logps/rejected: -1.2589
- Logps/chosen: -1.2469
- Logits/rejected: 55.4006
- Logits/chosen: 55.1081
- Nll Loss: 0.9890
- Log Odds Ratio: -0.7474
- Log Odds Chosen: 0.0451
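
For reference, these numbers fit together as the ORPO objective: a token-level NLL term plus a weighted log-odds-ratio penalty. The weight \\(\lambda\\) is not stated on this card, but the reported values are consistent with \\(\lambda = 0.1\\):

$$
\mathcal{L}_{\text{ORPO}} \;=\; \mathcal{L}_{\text{NLL}} \;-\; \lambda \,\log \sigma\!\left(\log \frac{\operatorname{odds}(y_w \mid x)}{\operatorname{odds}(y_l \mid x)}\right)
$$

Here `Log Odds Ratio` is the \\(\log\sigma(\cdot)\\) term and `Log Odds Chosen` is the inner log-odds gap between the chosen and rejected completions, so \\(0.9890 - 0.1 \times (-0.7474) \approx 1.0637\\), matching the reported loss.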
## Model description
A 135M-parameter instruction-following language model, obtained by preference-tuning [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) with ORPO (Odds Ratio Preference Optimization) via TRL. The architecture and tokenizer are inherited unchanged from the base model; only the weights are updated by the fine-tuning.
## Intended uses & limitations
Intended for experimentation with preference optimization on very small models (chat-style generation, alignment research, demos). Note that the final reward accuracy on the evaluation set is about 0.47, i.e. the model ranks the chosen response above the rejected one slightly less often than chance, so the learned preference signal appears weak at this scale; the model is not suited for production use without further evaluation.
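
A minimal usage sketch with `transformers`; the repo id below is assumed from this card's name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name on this card.
model_id = "aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# SmolLM-Instruct models ship a chat template; apply it to build the prompt.
messages = [{"role": "user", "content": "What is preference tuning?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```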
## Training and evaluation data
The model was trained and evaluated on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized variant of UltraFeedback in which each prompt is paired with a preferred (`chosen`) and a dispreferred (`rejected`) completion. The metrics above are computed on its evaluation split.
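
A minimal loading sketch with `datasets`; the `train_prefs`/`test_prefs` split names and the chat-format schema of the `chosen`/`rejected` columns are taken from the dataset release:

```python
from datasets import load_dataset

# The binarized UltraFeedback release ships dedicated preference splits.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_ds, eval_ds = ds["train_prefs"], ds["test_prefs"]

row = train_ds[0]
print(row["prompt"])                  # plain-text prompt
print(row["chosen"][-1]["content"])   # chosen/rejected are chat-format message lists
print(row["rejected"][-1]["content"])
```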
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training; a sketch mapping them onto TRL's `ORPOConfig` follows the list:
- learning_rate: 0.0003
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
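
Given the `trl` and `orpo` tags, the run was presumably driven by TRL's `ORPOTrainer`. Below is a minimal sketch of how the hyperparameters above map onto it, assuming a TRL release contemporary with the framework versions listed at the end of this card; `beta=0.1` is inferred from the loss decomposition noted earlier, `output_dir` is illustrative, and the preprocessing recipe is modeled on TRL's ORPO example script, not the author's actual code:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_text(row):
    # Render the chat-format columns into the plain prompt/chosen/rejected
    # strings that ORPOTrainer expects (mirrors TRL's ORPO example script).
    row["prompt"] = tokenizer.apply_chat_template(row["chosen"][:-1], tokenize=False)
    row["chosen"] = tokenizer.apply_chat_template([row["chosen"][-1]], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template([row["rejected"][-1]], tokenize=False)
    return row

train_ds = ds["train_prefs"].map(to_text)
eval_ds = ds["test_prefs"].map(to_text)

# Mirror the hyperparameters reported above; beta (the log-odds-ratio weight)
# is inferred from the loss decomposition, not stated on the card.
args = ORPOConfig(
    output_dir="ft-smollm-135M-instruct-on-hf-ultrafeedback",  # illustrative
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size 8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    beta=0.1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # renamed `processing_class` in newer TRL releases
)
trainer.train()
```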
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 2.2684 | 0.02 | 100 | 1.1258 | -0.1301 | -0.1302 | 0.4680 | 0.0001 | -1.3018 | -1.3007 | 17.8837 | 17.7783 | 1.0514 | -0.7435 | 0.0082 |
| 1.1427 | 0.05 | 200 | 1.1383 | -0.1295 | -0.1295 | 0.4740 | 0.0000 | -1.2954 | -1.2951 | 28.9673 | 28.6104 | 1.0633 | -0.7496 | 0.0117 |
| 1.135 | 0.07 | 300 | 1.1305 | -0.1290 | -0.1288 | 0.4640 | -0.0002 | -1.2876 | -1.2897 | 32.8905 | 32.5299 | 1.0547 | -0.7578 | 0.0117 |
| 1.15 | 0.09 | 400 | 1.1354 | -0.1303 | -0.1297 | 0.4620 | -0.0006 | -1.2969 | -1.3029 | 35.1267 | 34.7456 | 1.0592 | -0.7623 | 0.0073 |
| 1.1138 | 0.11 | 500 | 1.1345 | -0.1311 | -0.1309 | 0.4550 | -0.0002 | -1.3089 | -1.3110 | 36.9308 | 36.5745 | 1.0588 | -0.7571 | 0.0148 |
| 1.1617 | 0.14 | 600 | 1.1364 | -0.1312 | -0.1309 | 0.4660 | -0.0003 | -1.3086 | -1.3117 | 38.4101 | 38.0669 | 1.0602 | -0.7620 | 0.0204 |
| 1.136 | 0.16 | 700 | 1.1341 | -0.1319 | -0.1314 | 0.4610 | -0.0005 | -1.3138 | -1.3185 | 40.1971 | 39.8326 | 1.0581 | -0.7601 | 0.0145 |
| 1.155 | 0.18 | 800 | 1.1349 | -0.1319 | -0.1314 | 0.4620 | -0.0005 | -1.3137 | -1.3188 | 41.2812 | 40.9449 | 1.0588 | -0.7605 | 0.0153 |
| 1.185 | 0.21 | 900 | 1.1533 | -0.1339 | -0.1331 | 0.4570 | -0.0008 | -1.3305 | -1.3387 | 42.5938 | 42.3067 | 1.0766 | -0.7669 | 0.0171 |
| 1.1612 | 0.23 | 1000 | 1.1245 | -0.1310 | -0.1301 | 0.4550 | -0.0009 | -1.3010 | -1.3097 | 43.6187 | 43.3038 | 1.0480 | -0.7649 | 0.0111 |
| 1.2078 | 0.25 | 1100 | 1.1320 | -0.1319 | -0.1311 | 0.4680 | -0.0007 | -1.3115 | -1.3189 | 44.8567 | 44.5401 | 1.0556 | -0.7642 | 0.0173 |
| 1.1671 | 0.27 | 1200 | 1.1365 | -0.1325 | -0.1318 | 0.4600 | -0.0007 | -1.3179 | -1.3250 | 46.2434 | 45.9399 | 1.0605 | -0.7604 | 0.0102 |
| 1.1141 | 0.3 | 1300 | 1.1205 | -0.1306 | -0.1302 | 0.4560 | -0.0004 | -1.3017 | -1.3062 | 46.5845 | 46.2657 | 1.0443 | -0.7615 | 0.0167 |
| 1.1555 | 0.32 | 1400 | 1.1184 | -0.1301 | -0.1298 | 0.4660 | -0.0003 | -1.2978 | -1.3012 | 47.1046 | 46.8050 | 1.0421 | -0.7636 | 0.0205 |
| 1.1108 | 0.34 | 1500 | 1.1203 | -0.1302 | -0.1296 | 0.4640 | -0.0006 | -1.2961 | -1.3016 | 47.1987 | 46.9721 | 1.0438 | -0.7648 | 0.0184 |
| 1.1335 | 0.37 | 1600 | 1.1162 | -0.1302 | -0.1296 | 0.4620 | -0.0006 | -1.2963 | -1.3024 | 48.5285 | 48.2242 | 1.0399 | -0.7628 | 0.0162 |
| 1.1315 | 0.39 | 1700 | 1.1083 | -0.1299 | -0.1299 | 0.4620 | 0.0000 | -1.2987 | -1.2987 | 48.3002 | 48.0707 | 1.0327 | -0.7559 | 0.0278 |
| 1.1034 | 0.41 | 1800 | 1.1083 | -0.1298 | -0.1295 | 0.4640 | -0.0002 | -1.2955 | -1.2978 | 49.6016 | 49.3051 | 1.0330 | -0.7531 | 0.0196 |
| 1.0558 | 0.43 | 1900 | 1.1081 | -0.1290 | -0.1284 | 0.4600 | -0.0006 | -1.2845 | -1.2901 | 49.6973 | 49.4804 | 1.0317 | -0.7645 | 0.0224 |
| 1.0987 | 0.46 | 2000 | 1.1043 | -0.1285 | -0.1280 | 0.4680 | -0.0005 | -1.2798 | -1.2850 | 50.0976 | 49.8574 | 1.0279 | -0.7639 | 0.0175 |
| 1.1083 | 0.48 | 2100 | 1.0967 | -0.1274 | -0.1270 | 0.4660 | -0.0004 | -1.2701 | -1.2744 | 50.4175 | 50.1898 | 1.0200 | -0.7677 | 0.0294 |
| 1.1532 | 0.5 | 2200 | 1.0977 | -0.1285 | -0.1285 | 0.4600 | 0.0000 | -1.2851 | -1.2850 | 51.1548 | 50.9146 | 1.0225 | -0.7521 | 0.0215 |
| 1.1204 | 0.53 | 2300 | 1.0918 | -0.1275 | -0.1276 | 0.4690 | 0.0001 | -1.2762 | -1.2750 | 51.6649 | 51.3750 | 1.0162 | -0.7559 | 0.0256 |
| 1.1226 | 0.55 | 2400 | 1.0955 | -0.1285 | -0.1292 | 0.4700 | 0.0007 | -1.2920 | -1.2848 | 52.1800 | 51.9177 | 1.0204 | -0.7503 | 0.0402 |
| 1.1085 | 0.57 | 2500 | 1.0868 | -0.1272 | -0.1276 | 0.4670 | 0.0004 | -1.2765 | -1.2725 | 52.0037 | 51.7965 | 1.0113 | -0.7554 | 0.0400 |
| 1.0762 | 0.59 | 2600 | 1.0876 | -0.1269 | -0.1271 | 0.4670 | 0.0002 | -1.2713 | -1.2691 | 53.3919 | 53.0727 | 1.0117 | -0.7592 | 0.0388 |
| 1.088 | 0.62 | 2700 | 1.0822 | -0.1263 | -0.1264 | 0.4650 | 0.0001 | -1.2640 | -1.2628 | 53.7430 | 53.4174 | 1.0063 | -0.7587 | 0.0342 |
| 1.1111 | 0.64 | 2800 | 1.0821 | -0.1267 | -0.1274 | 0.4700 | 0.0007 | -1.2740 | -1.2667 | 53.9858 | 53.6674 | 1.0069 | -0.7529 | 0.0426 |
| 1.0906 | 0.66 | 2900 | 1.0785 | -0.1262 | -0.1268 | 0.4690 | 0.0006 | -1.2678 | -1.2617 | 53.9251 | 53.6345 | 1.0033 | -0.7527 | 0.0408 |
| 1.1186 | 0.69 | 3000 | 1.0785 | -0.1258 | -0.1262 | 0.4700 | 0.0004 | -1.2625 | -1.2583 | 54.2337 | 53.9554 | 1.0026 | -0.7593 | 0.0361 |
| 1.1648 | 0.71 | 3100 | 1.0783 | -0.1262 | -0.1269 | 0.4630 | 0.0007 | -1.2693 | -1.2621 | 54.2961 | 54.0128 | 1.0031 | -0.7522 | 0.0405 |
| 1.0952 | 0.73 | 3200 | 1.0784 | -0.1263 | -0.1271 | 0.4700 | 0.0009 | -1.2714 | -1.2625 | 54.8142 | 54.5032 | 1.0034 | -0.7506 | 0.0443 |
| 1.0759 | 0.75 | 3300 | 1.0747 | -0.1260 | -0.1269 | 0.4680 | 0.0009 | -1.2686 | -1.2596 | 55.0002 | 54.6848 | 0.9995 | -0.7519 | 0.0432 |
| 1.073 | 0.78 | 3400 | 1.0688 | -0.1252 | -0.1264 | 0.4720 | 0.0011 | -1.2639 | -1.2525 | 54.9206 | 54.5984 | 0.9938 | -0.7500 | 0.0478 |
| 1.0868 | 0.8 | 3500 | 1.0705 | -0.1262 | -0.1277 | 0.4810 | 0.0015 | -1.2772 | -1.2623 | 55.3186 | 54.9809 | 0.9962 | -0.7429 | 0.0469 |
| 1.0633 | 0.82 | 3600 | 1.0692 | -0.1255 | -0.1266 | 0.4750 | 0.0011 | -1.2656 | -1.2547 | 55.3886 | 55.0766 | 0.9944 | -0.7480 | 0.0435 |
| 1.0789 | 0.85 | 3700 | 1.0660 | -0.1248 | -0.1259 | 0.4750 | 0.0011 | -1.2589 | -1.2484 | 55.2801 | 54.9772 | 0.9910 | -0.7496 | 0.0439 |
| 1.0657 | 0.87 | 3800 | 1.0659 | -0.1252 | -0.1264 | 0.4750 | 0.0012 | -1.2641 | -1.2516 | 55.3299 | 55.0358 | 0.9913 | -0.7457 | 0.0439 |
| 1.115 | 0.89 | 3900 | 1.0661 | -0.1253 | -0.1267 | 0.4790 | 0.0014 | -1.2665 | -1.2526 | 55.4077 | 55.1136 | 0.9917 | -0.7439 | 0.0471 |
| 1.1083 | 0.91 | 4000 | 1.0662 | -0.1252 | -0.1266 | 0.4740 | 0.0014 | -1.2663 | -1.2522 | 55.4230 | 55.1339 | 0.9918 | -0.7441 | 0.0479 |
| 1.079 | 0.94 | 4100 | 1.0639 | -0.1248 | -0.1260 | 0.4740 | 0.0013 | -1.2604 | -1.2477 | 55.4248 | 55.1307 | 0.9893 | -0.7466 | 0.0464 |
| 1.1014 | 0.96 | 4200 | 1.0636 | -0.1247 | -0.1259 | 0.4750 | 0.0012 | -1.2594 | -1.2470 | 55.3555 | 55.0644 | 0.9889 | -0.7470 | 0.0455 |
| 1.0669 | 0.98 | 4300 | 1.0637 | -0.1247 | -0.1259 | 0.4730 | 0.0012 | -1.2589 | -1.2469 | 55.4006 | 55.1081 | 0.9890 | -0.7474 | 0.0451 |
### Framework versions
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2