results_llama_1b

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2093

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 3
  • mixed_precision_training: Native AMP
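
The sketch below is a hedged reconstruction of a Trainer setup matching the hyperparameters listed above. The dataset, preprocessing, and the exact PEFT adapter configuration are not documented in this card, so the LoRA settings and the `train_dataset`/`eval_dataset` variables are placeholders.

```python
# Reconstruction sketch of the training configuration listed above (not the exact script).
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Assumed LoRA-style adapter; the card only states that a PEFT adapter was trained.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base, peft_config)

args = TrainingArguments(
    output_dir="results_llama_1b",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size of 2
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=2,
    seed=42,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8 are the defaults
    fp16=True,                       # mixed precision ("Native AMP")
)

# train_dataset / eval_dataset are placeholders: the training data is not specified in this card.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```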

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 1.2917        | 0.1431 | 1000  | 1.2842          |
| 1.2761        | 0.2862 | 2000  | 1.2652          |
| 1.2755        | 0.4292 | 3000  | 1.2545          |
| 1.2340        | 0.5723 | 4000  | 1.2463          |
| 1.1903        | 0.7154 | 5000  | 1.2400          |
| 1.1828        | 0.8585 | 6000  | 1.2353          |
| 1.1757        | 1.0016 | 7000  | 1.2305          |
| 1.2446        | 1.1447 | 8000  | 1.2279          |
| 1.1145        | 1.2877 | 9000  | 1.2250          |
| 1.2765        | 1.4308 | 10000 | 1.2222          |
| 1.2232        | 1.5739 | 11000 | 1.2196          |
| 1.1182        | 1.7170 | 12000 | 1.2176          |
| 1.1981        | 1.8601 | 13000 | 1.2156          |
| 1.2217        | 2.0031 | 14000 | 1.2141          |
| 1.2394        | 2.1462 | 15000 | 1.2134          |
| 1.1538        | 2.2893 | 16000 | 1.2124          |
| 1.1579        | 2.4324 | 17000 | 1.2116          |
| 1.1557        | 2.5755 | 18000 | 1.2107          |
| 1.1528        | 2.7186 | 19000 | 1.2098          |
| 1.1833        | 2.8616 | 20000 | 1.2093          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 2.17.0
  • Tokenizers 0.21.0
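
Since the repository contains a PEFT adapter rather than full model weights, it is loaded on top of the base model. The following is a minimal usage sketch under that assumption; access to the gated meta-llama/Llama-3.2-1B weights and a valid Hugging Face login are required.

```python
# Minimal inference sketch: load the PEFT adapter from this repo onto the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Attach the adapter weights stored in this repository.
model = PeftModel.from_pretrained(base, "gui8600k/results_llama_1b")

inputs = tokenizer("Example prompt:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```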