Qwen2.5-1.5B-Instruct-finetune-ru-news-lora

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. The training dataset is not specified. At the final logged step (1480), it achieves the following results on the evaluation set:

Loss: 1.5295
Perplexity: 4.6645

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 111
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 20
mixed_precision_training: Native AMP
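
The hyperparameters above map fairly directly onto keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary so that it runs without any dependencies. The `output_dir` value is hypothetical, `fp16=True` is an assumed reading of "Native AMP" (it could equally have been `bf16`), and the single-device assumption behind the effective batch size is not stated in this card.

```python
# Hyperparameters from the list above, named as TrainingArguments keywords.
training_kwargs = {
    "output_dir": "qwen2.5-1.5b-ru-news-lora",  # hypothetical path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 111,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 20,
    "fp16": True,  # assumption: "Native AMP" meant fp16 rather than bf16
}

num_devices = 1  # assumption: single-device training
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)  # 16, matching total_train_batch_size

# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```

Under these assumptions, each optimizer step sees 2 x 8 = 16 examples, which agrees with the reported total_train_batch_size.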

Training Loss	Epoch	Step	Validation Loss	Perplexity
No log	0	0	1.7086	5.5805
1.5638	1.0	75	1.6235	5.1242
1.6127	2.0	150	1.5856	4.9323
1.6656	3.0	225	1.5689	4.8497
1.6207	4.0	300	1.5578	4.7967
1.5559	5.0	375	1.5510	4.7642
1.5766	6.0	450	1.5463	4.7420
1.5744	7.0	525	1.5428	4.7257
1.5892	8.0	600	1.5401	4.7129
1.4133	9.0	675	1.5378	4.7022
1.6007	10.0	750	1.5360	4.6939
1.6776	11.0	825	1.5345	4.6872
1.4363	12.0	900	1.5332	4.6814
1.3633	13.0	975	1.5323	4.6771
1.4944	14.0	1050	1.5314	4.6730
1.4514	15.0	1125	1.5308	4.6703
1.4892	16.0	1200	1.5303	4.6681
1.3994	17.0	1275	1.5299	4.6664
1.5070	18.0	1350	1.5296	4.6651
1.4906	19.0	1425	1.5295	4.6645
1.4982	19.7383	1480	1.5295	4.6645
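
To put the table in perspective, the short sketch below compares three rows copied from it (epoch 0, epoch 10, and the final step) and computes how much the validation metrics improved and how early most of that improvement arrived. The helper `relative_drop` is a hypothetical name introduced only for this illustration.

```python
# (epoch, validation_loss, perplexity) copied from rows of the table above
baseline = (0.0, 1.7086, 5.5805)
midpoint = (10.0, 1.5360, 4.6939)
final = (19.7383, 1.5295, 4.6645)


def relative_drop(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


loss_drop = relative_drop(baseline[1], final[1])
ppl_drop = relative_drop(baseline[2], final[2])

# Share of the total validation-loss improvement already reached by epoch 10.
share_by_epoch_10 = (baseline[1] - midpoint[1]) / (baseline[1] - final[1])

print(f"validation loss reduced by {loss_drop:.1%}")   # about 10.5%
print(f"perplexity reduced by {ppl_drop:.1%}")         # about 16.4%
print(f"share of loss improvement by epoch 10: {share_by_epoch_10:.0%}")  # about 96%
```

On these numbers, nearly all of the validation improvement is in place by epoch 10, and the last two logged evaluations are identical to four decimal places.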