---
library_name: peft
license: other
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- llama-factory
- lora
- unsloth
- generated_from_trainer
model-index:
- name: llm3br256
  results: []
---

# llm3br256

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the rommel_importgenius_4b8 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0130

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
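For reference, the settings above correspond roughly to the following `transformers.TrainingArguments`. This is a minimal sketch, assuming a single training device (so 4 per-device samples times 8 accumulation steps gives the effective batch size of 32); the run was actually launched through LLaMA-Factory, whose config keys differ, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above. The original
# run used LLaMA-Factory, so the actual config keys differed.
training_args = TrainingArguments(
    output_dir="llm3br256",          # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # 4 * 8 = 32 effective (single device assumed)
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```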
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0672        | 0.0418 | 5    | 0.0755          |
| 0.0476        | 0.0837 | 10   | 0.0452          |
| 0.0337        | 0.1255 | 15   | 0.0356          |
| 0.0333        | 0.1674 | 20   | 0.0308          |
| 0.0258        | 0.2092 | 25   | 0.0272          |
| 0.023         | 0.2510 | 30   | 0.0255          |
| 0.0202        | 0.2929 | 35   | 0.0234          |
| 0.0188        | 0.3347 | 40   | 0.0218          |
| 0.0185        | 0.3766 | 45   | 0.0208          |
| 0.0199        | 0.4184 | 50   | 0.0200          |
| 0.0198        | 0.4603 | 55   | 0.0195          |
| 0.0179        | 0.5021 | 60   | 0.0189          |
| 0.0185        | 0.5439 | 65   | 0.0186          |
| 0.0174        | 0.5858 | 70   | 0.0186          |
| 0.0157        | 0.6276 | 75   | 0.0183          |
| 0.0175        | 0.6695 | 80   | 0.0176          |
| 0.0175        | 0.7113 | 85   | 0.0176          |
| 0.0164        | 0.7531 | 90   | 0.0171          |
| 0.0182        | 0.7950 | 95   | 0.0168          |
| 0.019         | 0.8368 | 100  | 0.0167          |
| 0.0163        | 0.8787 | 105  | 0.0158          |
| 0.0145        | 0.9205 | 110  | 0.0158          |
| 0.0165        | 0.9623 | 115  | 0.0155          |
| 0.0205        | 1.0042 | 120  | 0.0152          |
| 0.0105        | 1.0460 | 125  | 0.0155          |
| 0.0147        | 1.0879 | 130  | 0.0157          |
| 0.0148        | 1.1297 | 135  | 0.0160          |
| 0.0115        | 1.1715 | 140  | 0.0153          |
| 0.0166        | 1.2134 | 145  | 0.0153          |
| 0.015         | 1.2552 | 150  | 0.0156          |
| 0.0148        | 1.2971 | 155  | 0.0157          |
| 0.0112        | 1.3389 | 160  | 0.0159          |
| 0.0128        | 1.3808 | 165  | 0.0153          |
| 0.0125        | 1.4226 | 170  | 0.0151          |
| 0.0137        | 1.4644 | 175  | 0.0150          |
| 0.0131        | 1.5063 | 180  | 0.0145          |
| 0.0105        | 1.5481 | 185  | 0.0145          |
| 0.0126        | 1.5900 | 190  | 0.0144          |
| 0.0119        | 1.6318 | 195  | 0.0145          |
| 0.016         | 1.6736 | 200  | 0.0147          |
| 0.0143        | 1.7155 | 205  | 0.0150          |
| 0.0139        | 1.7573 | 210  | 0.0150          |
| 0.0139        | 1.7992 | 215  | 0.0145          |
| 0.0161        | 1.8410 | 220  | 0.0143          |
| 0.0098        | 1.8828 | 225  | 0.0138          |
| 0.0108        | 1.9247 | 230  | 0.0140          |
| 0.0117        | 1.9665 | 235  | 0.0141          |
| 0.0109        | 2.0084 | 240  | 0.0138          |
| 0.0093        | 2.0502 | 245  | 0.0145          |
| 0.0102        | 2.0921 | 250  | 0.0143          |
| 0.0104        | 2.1339 | 255  | 0.0141          |
| 0.0108        | 2.1757 | 260  | 0.0147          |
| 0.0104        | 2.2176 | 265  | 0.0142          |
| 0.0103        | 2.2594 | 270  | 0.0144          |
| 0.0107        | 2.3013 | 275  | 0.0144          |
| 0.0104        | 2.3431 | 280  | 0.0141          |
| 0.0092        | 2.3849 | 285  | 0.0143          |
| 0.0107        | 2.4268 | 290  | 0.0140          |
| 0.0112        | 2.4686 | 295  | 0.0143          |
| 0.01          | 2.5105 | 300  | 0.0143          |
| 0.0096        | 2.5523 | 305  | 0.0138          |
| 0.0096        | 2.5941 | 310  | 0.0137          |
| 0.0099        | 2.6360 | 315  | 0.0137          |
| 0.009         | 2.6778 | 320  | 0.0138          |
| 0.0097        | 2.7197 | 325  | 0.0137          |
| 0.0097        | 2.7615 | 330  | 0.0136          |
| 0.0108        | 2.8033 | 335  | 0.0136          |
| 0.0092        | 2.8452 | 340  | 0.0132          |
| 0.0092        | 2.8870 | 345  | 0.0132          |
| 0.0095        | 2.9289 | 350  | 0.0130          |
| 0.0094        | 2.9707 | 355  | 0.0127          |
| 0.0088        | 3.0126 | 360  | 0.0127          |
| 0.0086        | 3.0544 | 365  | 0.0131          |
| 0.0094        | 3.0962 | 370  | 0.0134          |
| 0.0075        | 3.1381 | 375  | 0.0137          |
| 0.0068        | 3.1799 | 380  | 0.0136          |
| 0.0096        | 3.2218 | 385  | 0.0136          |
| 0.0088        | 3.2636 | 390  | 0.0137          |
| 0.008         | 3.3054 | 395  | 0.0138          |
| 0.0085        | 3.3473 | 400  | 0.0137          |
| 0.0091        | 3.3891 | 405  | 0.0136          |
| 0.0049        | 3.4310 | 410  | 0.0134          |
| 0.0072        | 3.4728 | 415  | 0.0131          |
| 0.0063        | 3.5146 | 420  | 0.0133          |
| 0.0076        | 3.5565 | 425  | 0.0131          |
| 0.0076        | 3.5983 | 430  | 0.0129          |
| 0.0074        | 3.6402 | 435  | 0.0130          |
| 0.0074        | 3.6820 | 440  | 0.0132          |
| 0.0067        | 3.7238 | 445  | 0.0132          |
| 0.0064        | 3.7657 | 450  | 0.0130          |
| 0.0091        | 3.8075 | 455  | 0.0130          |
| 0.0074        | 3.8494 | 460  | 0.0131          |
| 0.0076        | 3.8912 | 465  | 0.0132          |
| 0.007         | 3.9331 | 470  | 0.0132          |
| 0.0082        | 3.9749 | 475  | 0.0132          |
| 0.0059        | 4.0167 | 480  | 0.0133          |
| 0.0066        | 4.0586 | 485  | 0.0135          |
| 0.0063        | 4.1004 | 490  | 0.0140          |
| 0.0059        | 4.1423 | 495  | 0.0144          |
| 0.0066        | 4.1841 | 500  | 0.0142          |
| 0.0055        | 4.2259 | 505  | 0.0142          |
| 0.0067        | 4.2678 | 510  | 0.0142          |
| 0.0065        | 4.3096 | 515  | 0.0143          |
| 0.0062        | 4.3515 | 520  | 0.0142          |
| 0.0065        | 4.3933 | 525  | 0.0141          |
| 0.007         | 4.4351 | 530  | 0.0139          |
| 0.0058        | 4.4770 | 535  | 0.0139          |
| 0.0056        | 4.5188 | 540  | 0.0139          |
| 0.0062        | 4.5607 | 545  | 0.0139          |
| 0.0061        | 4.6025 | 550  | 0.0139          |
| 0.0061        | 4.6444 | 555  | 0.0139          |
| 0.0068        | 4.6862 | 560  | 0.0138          |
| 0.0069        | 4.7280 | 565  | 0.0139          |
| 0.0063        | 4.7699 | 570  | 0.0139          |
| 0.0065        | 4.8117 | 575  | 0.0139          |
| 0.0064        | 4.8536 | 580  | 0.0139          |
| 0.0062        | 4.8954 | 585  | 0.0139          |
| 0.0065        | 4.9372 | 590  | 0.0139          |
| 0.0055        | 4.9791 | 595  | 0.0139          |

### Framework versions

- PEFT 0.12.0
- Transformers 4.46.1
- PyTorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
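The LoRA adapter can be loaded on top of the base model with the PEFT and Transformers versions listed above. A minimal sketch follows; `your-username/llm3br256` is a hypothetical repo id, so substitute wherever these adapter weights are actually hosted.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model from the card's metadata; the adapter id below is a
# placeholder, not a confirmed repo location.
base_id = "unsloth/Llama-3.2-3B-Instruct"
adapter_id = "your-username/llm3br256"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```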