SmolLM2-360M-TemporalQuestions

This model is a fine-tuned version of HuggingFaceTB/SmolLM2-360M; the fine-tuning dataset is not documented on this card. It achieves the following results on the evaluation set:

  • Loss: 0.0257
  • F1: 0.9846
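
The card does not document the task format or prompt template. As a minimal usage sketch, assuming the checkpoint loads as a causal LM like its base model (the prompt string below is purely illustrative):

```python
# Minimal loading sketch. The task head and prompt format are not
# documented on this card; loading as a causal LM (like the base
# HuggingFaceTB/SmolLM2-360M) is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugosousa/SmolLM2-360M-TemporalQuestions"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Illustrative prompt only; the real input format may differ.
prompt = "Is the following statement temporally consistent? ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```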

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged TrainingArguments sketch follows the list:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 30
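
As a sketch, the list above maps onto a Transformers TrainingArguments object roughly as follows. The output_dir name and the bf16 flag are assumptions, and the 4-GPU distributed launch is configured outside this object (e.g. via torchrun or accelerate):

```python
# Hedged sketch of TrainingArguments matching the hyperparameters above.
# output_dir is illustrative; bf16 is inferred from the BF16 checkpoint.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="smollm2-360m-temporalquestions",  # illustrative name
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=32,
    # Effective train batch: 8 per device x 4 GPUs x 32 accumulation = 1024.
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.05,
    num_train_epochs=30,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, not stated in the hyperparameter list
)
```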

Training results

| Training Loss | Epoch   | Step | Validation Loss | F1     |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 0.086         | 1.0     | 223  | 0.0629          | 0.9514 |
| 0.1263        | 2.0     | 446  | 0.0466          | 0.9647 |
| 0.0172        | 3.0     | 669  | 0.0351          | 0.9745 |
| 0.0729        | 4.0     | 892  | 0.0319          | 0.9770 |
| 0.0254        | 5.0     | 1115 | 0.0320          | 0.9788 |
| 0.0258        | 6.0     | 1338 | 0.0288          | 0.9798 |
| 0.017         | 7.0     | 1561 | 0.0302          | 0.9812 |
| 0.0278        | 8.0     | 1784 | 0.0302          | 0.9807 |
| 0.0105        | 9.0     | 2007 | 0.0338          | 0.9797 |
| 0.0503        | 10.0    | 2230 | 0.0297          | 0.9808 |
| 0.0148        | 11.0    | 2453 | 0.0257          | 0.9846 |
| 0.0005        | 12.0    | 2676 | 0.0305          | 0.9822 |
| 0.0052        | 13.0    | 2899 | 0.0282          | 0.9853 |
| 0.0012        | 14.0    | 3122 | 0.0317          | 0.9837 |
| 0.0095        | 15.0    | 3345 | 0.0338          | 0.9859 |
| 0.0004        | 16.0    | 3568 | 0.0307          | 0.9865 |
| 0.0003        | 17.0    | 3791 | 0.0336          | 0.9856 |
| 0.0074        | 18.0    | 4014 | 0.0338          | 0.9855 |
| 0.0003        | 19.0    | 4237 | 0.0327          | 0.9864 |
| 0.0003        | 20.0    | 4460 | 0.0353          | 0.9858 |
| 0.0001        | 21.0    | 4683 | 0.0377          | 0.9858 |
| 0.0001        | 22.0    | 4906 | 0.0380          | 0.9870 |
| 0.0001        | 23.0    | 5129 | 0.0389          | 0.9866 |
| 0.0001        | 24.0    | 5352 | 0.0399          | 0.9866 |
| 0.0001        | 25.0    | 5575 | 0.0404          | 0.9866 |
| 0.0001        | 26.0    | 5798 | 0.0408          | 0.9867 |
| 0.0001        | 27.0    | 6021 | 0.0409          | 0.9867 |
| 0.0002        | 28.0    | 6244 | 0.0411          | 0.9867 |
| 0.0001        | 29.0    | 6467 | 0.0411          | 0.9867 |
| 0.0023        | 29.8691 | 6660 | 0.0412          | 0.9867 |
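
The card does not say how the F1 column is computed. As a hedged sketch, assuming a classification-style evaluation with per-example logits, a Trainer compute_metrics hook along these lines would produce it (binary averaging is an assumption):

```python
# Sketch of a compute_metrics hook that could yield the F1 column above.
# The label layout and averaging mode are not documented on this card;
# per-example logits and binary F1 are assumptions.
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}
```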

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.21.0