llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the asianpaints dataset. It achieves the following results on the evaluation set:

Loss: 0.0114

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
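
The batch-size arithmetic and the learning-rate schedule implied by these settings can be sketched in plain Python. This is an illustration, not the training code. It assumes a single device, roughly 544 optimizer steps in total (extrapolated from the results log, where step 435 corresponds to epoch 4.0), warmup steps rounded up from the ratio, and the common linear-warmup-then-half-cosine shape. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

# Assumptions, not stated in the card.
NUM_DEVICES = 1    # single device
TOTAL_STEPS = 544  # 435 steps / 4.0 epochs * 5 epochs = 543.75, rounded up

# Samples contributing to each optimizer update.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Number of steps spent ramping the learning rate up from zero.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the product 4 × 8 × 1 reproduces the reported total_train_batch_size of 32, and the learning rate peaks at 0.0001 once warmup ends.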

Training results

Training Loss	Epoch	Step	Validation Loss
0.0567	0.0460	5	0.0584
0.0378	0.0920	10	0.0384
0.0301	0.1379	15	0.0318
0.0248	0.1839	20	0.0281
0.0241	0.2299	25	0.0256
0.021	0.2759	30	0.0234
0.0213	0.3218	35	0.0225
0.0211	0.3678	40	0.0214
0.0185	0.4138	45	0.0200
0.0162	0.4598	50	0.0196
0.0177	0.5057	55	0.0189
0.0168	0.5517	60	0.0184
0.017	0.5977	65	0.0182
0.0143	0.6437	70	0.0177
0.0143	0.6897	75	0.0176
0.0155	0.7356	80	0.0176
0.0162	0.7816	85	0.0169
0.0164	0.8276	90	0.0164
0.0154	0.8736	95	0.0162
0.0164	0.9195	100	0.0159
0.0156	0.9655	105	0.0160
0.0145	1.0115	110	0.0159
0.0133	1.0575	115	0.0156
0.0126	1.1034	120	0.0155
0.0145	1.1494	125	0.0154
0.0125	1.1954	130	0.0150
0.0122	1.2414	135	0.0148
0.0127	1.2874	140	0.0147
0.0139	1.3333	145	0.0144
0.0122	1.3793	150	0.0144
0.0138	1.4253	155	0.0139
0.0143	1.4713	160	0.0139
0.0124	1.5172	165	0.0138
0.0124	1.5632	170	0.0135
0.0138	1.6092	175	0.0132
0.0112	1.6552	180	0.0136
0.0102	1.7011	185	0.0135
0.0135	1.7471	190	0.0133
0.01	1.7931	195	0.0135
0.0115	1.8391	200	0.0131
0.0113	1.8851	205	0.0127
0.0107	1.9310	210	0.0128
0.0122	1.9770	215	0.0128
0.0099	2.0230	220	0.0128
0.0121	2.0690	225	0.0129
0.0103	2.1149	230	0.0128
0.01	2.1609	235	0.0127
0.0089	2.2069	240	0.0127
0.0089	2.2529	245	0.0127
0.0105	2.2989	250	0.0125
0.0093	2.3448	255	0.0124
0.0097	2.3908	260	0.0126
0.0091	2.4368	265	0.0126
0.0095	2.4828	270	0.0124
0.0094	2.5287	275	0.0123
0.0092	2.5747	280	0.0119
0.0084	2.6207	285	0.0121
0.0098	2.6667	290	0.0120
0.0097	2.7126	295	0.0122
0.0093	2.7586	300	0.0121
0.0096	2.8046	305	0.0119
0.0097	2.8506	310	0.0117
0.0101	2.8966	315	0.0118
0.0088	2.9425	320	0.0118
0.0096	2.9885	325	0.0118
0.0078	3.0345	330	0.0119
0.0064	3.0805	335	0.0119
0.0073	3.1264	340	0.0121
0.0066	3.1724	345	0.0121
0.0067	3.2184	350	0.0117
0.007	3.2644	355	0.0118
0.0072	3.3103	360	0.0116
0.0074	3.3563	365	0.0117
0.0067	3.4023	370	0.0117
0.0072	3.4483	375	0.0117
0.0069	3.4943	380	0.0117
0.0076	3.5402	385	0.0116
0.0068	3.5862	390	0.0114
0.0074	3.6322	395	0.0115
0.0065	3.6782	400	0.0114
0.007	3.7241	405	0.0112
0.0064	3.7701	410	0.0112
0.0073	3.8161	415	0.0111
0.0065	3.8621	420	0.0113
0.0069	3.9080	425	0.0111
0.0065	3.9540	430	0.0111
0.0076	4.0	435	0.0111
0.0047	4.0460	440	0.0115
0.0053	4.0920	445	0.0119
0.0053	4.1379	450	0.0120
0.0055	4.1839	455	0.0119
0.0053	4.2299	460	0.0117
0.0053	4.2759	465	0.0117
0.0053	4.3218	470	0.0117
0.0058	4.3678	475	0.0116
0.0053	4.4138	480	0.0116
0.0053	4.4598	485	0.0118
0.0051	4.5057	490	0.0117
0.0053	4.5517	495	0.0117
0.0059	4.5977	500	0.0117
0.0055	4.6437	505	0.0117
0.0054	4.6897	510	0.0116
0.0055	4.7356	515	0.0117
0.0056	4.7816	520	0.0116
0.0048	4.8276	525	0.0116
0.0049	4.8736	530	0.0116
0.0043	4.9195	535	0.0116
0.0046	4.9655	540	0.0116
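
One way to read the log is to find the step with the lowest validation loss and to compare the gap between training and validation loss over time. The sketch below does this for a handful of rows copied from the table, roughly one per epoch. The names `ROWS`, `best_row`, and `gap` are illustrative and not part of any training script.

```python
# (step, epoch, training_loss, validation_loss), copied from the table above.
ROWS = [
    (5, 0.0460, 0.0567, 0.0584),
    (110, 1.0115, 0.0145, 0.0159),
    (220, 2.0230, 0.0099, 0.0128),
    (330, 3.0345, 0.0078, 0.0119),
    (435, 4.0, 0.0076, 0.0111),
    (540, 4.9655, 0.0046, 0.0116),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def gap(row):
    """Validation loss minus training loss, rounded to the table's precision."""
    return round(row[3] - row[2], 4)


if __name__ == "__main__":
    step, epoch, _, val_loss = best_row(ROWS)
    print(f"lowest validation loss in sample: {val_loss} at step {step} (epoch {epoch})")
    for row in ROWS:
        print(f"step {row[0]:>3}: gap = {gap(row):.4f}")
```

In this sample the validation loss is lowest at step 435 (0.0111) and slightly higher at step 540 (0.0116) while the training loss keeps falling, which may indicate mild overfitting during the final epoch.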

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.4.0+cu121
Datasets 3.1.0
Tokenizers 0.20.3
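
Because PEFT appears among the framework versions, the checkpoint is presumably a PEFT adapter rather than a full set of weights. A minimal loading sketch under that assumption follows. The adapter repository id `sizhkhy/asianpaints` is inferred from the model page and the function name `load_model` is hypothetical; neither is confirmed by this card. The base model may also require accepting its license before it can be downloaded.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"  # base model named in this card
ADAPTER_ID = "sizhkhy/asianpaints"  # assumption: adapter repository id


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper).

    Downloads weights on first use, so it needs network access and enough memory.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
    print(type(model).__name__)
```

Pinning the listed library versions when installing should reduce the risk of incompatibilities between the adapter and the loading code.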
