Qwen2.5-1.5B-Instruct-finetune-ru-news-lora
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. The training dataset was not recorded (the auto-generated card lists it as None). It achieves the following results on the evaluation set:
- Loss: 1.5295
- Perplexity: 4.6645
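Perplexity is conventionally the exponential of the mean cross-entropy loss. As a sanity check, the sketch below compares the two reported numbers under that convention. They do not match exactly (exp(1.5295) is about 4.616), so the two metrics were probably aggregated differently, for example averaged per batch versus per token. That explanation is an assumption; the card does not say how perplexity was computed.

```python
import math

# Values copied from the evaluation summary above.
eval_loss = 1.5295
reported_perplexity = 4.6645

# Perplexity implied by exponentiating the reported mean loss.
implied_perplexity = math.exp(eval_loss)

# Mean loss implied by taking the log of the reported perplexity.
implied_loss = math.log(reported_perplexity)

# A positive gap means the reported perplexity is higher than exp(loss).
gap = reported_perplexity - implied_perplexity

print(f"exp(reported loss)       = {implied_perplexity:.4f}")
print(f"log(reported perplexity) = {implied_loss:.4f}")
print(f"gap in perplexity        = {gap:.4f}")
```

When comparing this model against others, it is safer to compare the same metric computed the same way than to convert between loss and perplexity.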
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 111
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
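The sketch below shows how the listed values relate to each other: the total train batch size is the per-device batch size times the gradient accumulation steps, and a linear schedule decays the learning rate from its initial value to zero. It is an illustration, not the training script. The single-device count and the absence of warmup are assumptions (neither is stated above), the total of 1480 steps is taken from the last row of the training results, and the helper names are hypothetical.

```python
# Hyperparameters copied from the list above.
learning_rate = 2e-05
train_batch_size = 2
gradient_accumulation_steps = 8

# Assumptions not stated in the card.
num_devices = 1     # assumed single device
warmup_steps = 0    # assumed: no warmup is listed
total_steps = 1480  # final step logged in the training results


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp-up during warmup (unused when warmup_steps is 0).
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps, num_devices
)
print("total_train_batch_size:", total_train_batch_size)
for step in (0, 740, 1480):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the computed total matches the total_train_batch_size of 16 listed above.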
Training results
Training Loss | Epoch | Step | Validation Loss | Perplexity |
---|---|---|---|---|
No log | 0 | 0 | 1.7086 | 5.5805 |
1.5638 | 1.0 | 75 | 1.6235 | 5.1242 |
1.6127 | 2.0 | 150 | 1.5856 | 4.9323 |
1.6656 | 3.0 | 225 | 1.5689 | 4.8497 |
1.6207 | 4.0 | 300 | 1.5578 | 4.7967 |
1.5559 | 5.0 | 375 | 1.5510 | 4.7642 |
1.5766 | 6.0 | 450 | 1.5463 | 4.7420 |
1.5744 | 7.0 | 525 | 1.5428 | 4.7257 |
1.5892 | 8.0 | 600 | 1.5401 | 4.7129 |
1.4133 | 9.0 | 675 | 1.5378 | 4.7022 |
1.6007 | 10.0 | 750 | 1.5360 | 4.6939 |
1.6776 | 11.0 | 825 | 1.5345 | 4.6872 |
1.4363 | 12.0 | 900 | 1.5332 | 4.6814 |
1.3633 | 13.0 | 975 | 1.5323 | 4.6771 |
1.4944 | 14.0 | 1050 | 1.5314 | 4.6730 |
1.4514 | 15.0 | 1125 | 1.5308 | 4.6703 |
1.4892 | 16.0 | 1200 | 1.5303 | 4.6681 |
1.3994 | 17.0 | 1275 | 1.5299 | 4.6664 |
1.5070 | 18.0 | 1350 | 1.5296 | 4.6651 |
1.4906 | 19.0 | 1425 | 1.5295 | 4.6645 |
1.4982 | 19.7383 | 1480 | 1.5295 | 4.6645 |
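The validation loss in the table falls quickly at first and then flattens. A rough way to quantify this is to compute the improvement between consecutive evaluations and find where it first drops below a small threshold. The threshold of 0.001 below is an arbitrary illustrative choice, not something used in training.

```python
# Validation loss copied from the table: epochs 0 through 19, then the final step.
val_loss = [
    1.7086, 1.6235, 1.5856, 1.5689, 1.5578, 1.5510, 1.5463,
    1.5428, 1.5401, 1.5378, 1.5360, 1.5345, 1.5332, 1.5323,
    1.5314, 1.5308, 1.5303, 1.5299, 1.5296, 1.5295, 1.5295,
]

# improvements[i] is the loss reduction achieved by evaluation i + 1.
improvements = [prev - curr for prev, curr in zip(val_loss, val_loss[1:])]

# Overall reduction from the untrained adapter (epoch 0) to the final step.
total_drop = val_loss[0] - val_loss[-1]
relative_drop = total_drop / val_loss[0]

# Hypothetical threshold for "negligible" improvement between evaluations.
threshold = 0.001
first_small_epoch = next(
    i + 1 for i, delta in enumerate(improvements) if delta < threshold
)

print(f"total drop in validation loss: {total_drop:.4f} ({relative_drop:.1%})")
print(f"first evaluation with improvement below {threshold}: epoch {first_small_epoch}")
```

By this measure most of the gain arrives in the first few epochs, and later epochs change the validation loss only in the fourth decimal place.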
Framework versions
- PEFT 0.14.0
- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0