---
license: other
base_model: Qwen/Qwen1.5-4B
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
metrics:
- accuracy
model-index:
- name: lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
      type: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4897142857142857
library_name: peft
---

# lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2

This model is a fine-tuned version of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on the tyzhu/lmind_hotpot_train8000_eval7405_v1_qa dataset.
It achieves the following results on the evaluation set:
- Loss: 4.1528
- Accuracy: 0.4897

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 2.2503        | 1.0   | 250   | 2.3237          | 0.5156   |
| 2.087         | 2.0   | 500   | 2.3309          | 0.5164   |
| 1.849         | 3.0   | 750   | 2.4019          | 0.5145   |
| 1.6193        | 4.0   | 1000  | 2.5039          | 0.5104   |
| 1.3666        | 5.0   | 1250  | 2.6544          | 0.5050   |
| 1.1435        | 6.0   | 1500  | 2.8436          | 0.5011   |
| 0.9171        | 7.0   | 1750  | 3.0320          | 0.4971   |
| 0.7531        | 8.0   | 2000  | 3.2585          | 0.4930   |
| 0.6101        | 9.0   | 2250  | 3.3418          | 0.4925   |
| 0.5392        | 10.0  | 2500  | 3.5373          | 0.4916   |
| 0.4718        | 11.0  | 2750  | 3.6313          | 0.4893   |
| 0.4446        | 12.0  | 3000  | 3.6736          | 0.4906   |
| 0.4204        | 13.0  | 3250  | 3.7342          | 0.4906   |
| 0.4131        | 14.0  | 3500  | 3.7778          | 0.4897   |
| 0.3924        | 15.0  | 3750  | 3.8210          | 0.4897   |
| 0.3913        | 16.0  | 4000  | 3.8833          | 0.4904   |
| 0.376         | 17.0  | 4250  | 3.8936          | 0.4898   |
| 0.3785        | 18.0  | 4500  | 3.8824          | 0.4900   |
| 0.367         | 19.0  | 4750  | 3.9720          | 0.4901   |
| 0.3676        | 20.0  | 5000  | 3.9374          | 0.4909   |
| 0.3602        | 21.0  | 5250  | 3.9380          | 0.4904   |
| 0.3639        | 22.0  | 5500  | 3.9516          | 0.4910   |
| 0.3533        | 23.0  | 5750  | 4.0207          | 0.4916   |
| 0.3587        | 24.0  | 6000  | 3.9905          | 0.4917   |
| 0.3479        | 25.0  | 6250  | 4.0617          | 0.4915   |
| 0.3511        | 26.0  | 6500  | 4.0106          | 0.4903   |
| 0.3442        | 27.0  | 6750  | 4.0401          | 0.4910   |
| 0.3496        | 28.0  | 7000  | 4.0157          | 0.4897   |
| 0.34          | 29.0  | 7250  | 4.0503          | 0.4902   |
| 0.3448        | 30.0  | 7500  | 4.0786          | 0.4908   |
| 0.3406        | 31.0  | 7750  | 4.1239          | 0.4905   |
| 0.3375        | 32.0  | 8000  | 4.1210          | 0.4915   |
| 0.339         | 33.0  | 8250  | 4.1039          | 0.4898   |
| 0.3418        | 34.0  | 8500  | 4.0879          | 0.4902   |
| 0.3364        | 35.0  | 8750  | 4.0782          | 0.4907   |
| 0.3421        | 36.0  | 9000  | 4.0512          | 0.4910   |
| 0.3337        | 37.0  | 9250  | 4.1727          | 0.4895   |
| 0.3375        | 38.0  | 9500  | 4.1615          | 0.4889   |
| 0.3304        | 39.0  | 9750  | 4.1755          | 0.4899   |
| 0.3341        | 40.0  | 10000 | 4.1542          | 0.4903   |
| 0.3311        | 41.0  | 10250 | 4.1479          | 0.4889   |
| 0.3337        | 42.0  | 10500 | 4.1005          | 0.4907   |
| 0.3284        | 43.0  | 10750 | 4.1688          | 0.4909   |
| 0.3343        | 44.0  | 11000 | 4.1412          | 0.4904   |
| 0.3301        | 45.0  | 11250 | 4.0906          | 0.4917   |
| 0.3307        | 46.0  | 11500 | 4.1221          | 0.4895   |
| 0.328         | 47.0  | 11750 | 4.1250          | 0.4892   |
| 0.3293        | 48.0  | 12000 | 4.1082          | 0.4911   |
| 0.3261        | 49.0  | 12250 | 4.1219          | 0.4903   |
| 0.3279        | 50.0  | 12500 | 4.1528          | 0.4897   |

### Framework versions

- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
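### How to use

Since this is a PEFT LoRA adapter rather than a full model, it is loaded on top of the Qwen/Qwen1.5-4B base model. The sketch below shows the standard PEFT loading pattern; the adapter repo id is an assumption based on this card's model name and may need to be adjusted.

```python
def load_model(adapter_id="tyzhu/lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2"):
    """Load the Qwen1.5-4B base model and attach the LoRA adapter.

    NOTE: the default adapter_id is assumed from this card's name, not
    confirmed by the card itself.
    """
    # Imports are deferred so the function can be defined without the
    # (heavy) libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-4B")
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-4B")
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
    # Hypothetical QA-style prompt; the card does not document the
    # exact prompt format used during fine-tuning.
    inputs = tokenizer("Question: Who wrote Hamlet?\nAnswer:", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Downloading the 4B base model requires substantial disk space and memory; for GPU inference, pass the usual `torch_dtype`/`device_map` arguments to `from_pretrained`.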