# SmolLM-135M-QTimelines
This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3978
- F1: 0.4759
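
Since the card is auto-generated, no usage example is included. Below is a minimal loading sketch; the sequence-classification head is an assumption inferred from the reported F1 metric (it is not confirmed by the card), and the input text is a placeholder.

```python
# A minimal loading sketch. The sequence-classification head is an
# assumption (the reported F1 suggests a classification objective);
# swap the Auto class if the checkpoint targets a different task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "hugosousa/SmolLM-135M-QTimelines"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

inputs = tokenizer("Example input text.", return_tensors="pt")  # placeholder input
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class id
```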
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 2048
- total_eval_batch_size: 256
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 30
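
These settings map onto the `transformers` `TrainingArguments` roughly as sketched below. This is a reconstruction from the list above, not the original training script; `output_dir` is a placeholder, and the 4-GPU distributed launch is handled outside `TrainingArguments` (e.g. via `torchrun` or `accelerate`), so only per-device batch sizes appear.

```python
# A reconstruction of the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smollm-135m-qtimelines",  # hypothetical path
    learning_rate=1e-3,
    per_device_train_batch_size=64,  # x 4 GPUs x 8 accumulation = 2048 total
    per_device_eval_batch_size=64,   # x 4 GPUs = 256 total
    gradient_accumulation_steps=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.05,
    num_train_epochs=30,
)
```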
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1     |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 3.0298        | 1.0   | 28   | 0.3978          | 0.4759 |
| 2.2371        | 2.0   | 56   | 0.4128          | 0.5102 |
| 1.1086        | 3.0   | 84   | 0.4444          | 0.6253 |
| 0.6046        | 4.0   | 112  | 0.5517          | 0.6217 |
| 0.3204        | 5.0   | 140  | 0.6515          | 0.6285 |
| 0.1769        | 6.0   | 168  | 0.7916          | 0.6156 |
| 0.1709        | 7.0   | 196  | 0.8570          | 0.6154 |
| 0.1253        | 8.0   | 224  | 0.9544          | 0.6176 |
| 0.0999        | 9.0   | 252  | 1.0336          | 0.6203 |
| 0.0664        | 10.0  | 280  | 0.9978          | 0.6104 |
| 0.0922        | 11.0  | 308  | 1.1060          | 0.6179 |
| 0.083         | 12.0  | 336  | 1.0753          | 0.6034 |
| 0.0694        | 13.0  | 364  | 1.1002          | 0.6037 |
| 0.0354        | 14.0  | 392  | 1.1799          | 0.5949 |
| 0.0265        | 15.0  | 420  | 1.1595          | 0.6202 |
| 0.023         | 16.0  | 448  | 1.2321          | 0.5960 |
| 0.0116        | 17.0  | 476  | 1.2091          | 0.6261 |
| 0.0037        | 18.0  | 504  | 1.2787          | 0.6117 |
| 0.0008        | 19.0  | 532  | 1.3259          | 0.6033 |
| 0.0028        | 20.0  | 560  | 1.3242          | 0.6084 |
| 0.0009        | 21.0  | 588  | 1.3287          | 0.6103 |
| 0.0008        | 22.0  | 616  | 1.3390          | 0.6100 |
| 0.0008        | 23.0  | 644  | 1.3445          | 0.6100 |
| 0.0007        | 24.0  | 672  | 1.3490          | 0.6104 |
| 0.0022        | 25.0  | 700  | 1.3509          | 0.6101 |
| 0.0007        | 26.0  | 728  | 1.3539          | 0.6104 |
| 0.0021        | 27.0  | 756  | 1.3562          | 0.6101 |
| 0.0007        | 28.0  | 784  | 1.3565          | 0.6104 |
| 0.0006        | 29.0  | 812  | 1.3567          | 0.6105 |
| 0.0007        | 30.0  | 840  | 1.3568          | 0.6104 |
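
The headline results above (loss 0.3978, F1 0.4759) correspond to the epoch-1 checkpoint, which has the lowest validation loss; from epoch 2 onward validation loss rises while training loss approaches zero, a typical overfitting pattern. For reference, the F1 column could come from a `compute_metrics` hook like the sketch below; this is an assumption about the setup, and the averaging mode in particular is a guess.

```python
# A hedged sketch of how the F1 column could be computed during eval.
# The averaging strategy ("macro" here) is an assumption; the card
# does not say which variant was used.
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, predictions, average="macro")}
```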
### Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu124
- Datasets 3.0.1
- Tokenizers 0.21.0