llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the centime dataset. It achieves the following results on the evaluation set:
- Loss: 0.0123
Model description
More information needed
Intended uses & limitations
More information needed
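The Framework versions listed below include PEFT, so this release is presumably a LoRA-style adapter on top of the base model rather than full fine-tuned weights. A minimal inference sketch follows; the adapter repo id shown is hypothetical, so substitute the actual one:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/llm3br256"  # hypothetical repo id; substitute the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```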
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a reconstruction sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
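For reference, these settings map onto the following TrainingArguments. This is a hedged reconstruction from the list above, not the actual training script; note the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 8 = 32, matching total_train_batch_size.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameters above; treat as a sketch, since the
# original training script is not part of this card.
args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,  # 4 x 8 = effective batch size of 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```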
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0612 | 0.0449 | 5 | 0.0560 |
0.0411 | 0.0898 | 10 | 0.0357 |
0.0353 | 0.1347 | 15 | 0.0301 |
0.0286 | 0.1796 | 20 | 0.0264 |
0.0282 | 0.2245 | 25 | 0.0239 |
0.0223 | 0.2694 | 30 | 0.0224 |
0.0242 | 0.3143 | 35 | 0.0209 |
0.0211 | 0.3591 | 40 | 0.0203 |
0.0178 | 0.4040 | 45 | 0.0201 |
0.0206 | 0.4489 | 50 | 0.0196 |
0.0196 | 0.4938 | 55 | 0.0193 |
0.0173 | 0.5387 | 60 | 0.0193 |
0.0184 | 0.5836 | 65 | 0.0193 |
0.0194 | 0.6285 | 70 | 0.0191 |
0.0182 | 0.6734 | 75 | 0.0185 |
0.0169 | 0.7183 | 80 | 0.0183 |
0.0176 | 0.7632 | 85 | 0.0178 |
0.0158 | 0.8081 | 90 | 0.0176 |
0.02 | 0.8530 | 95 | 0.0172 |
0.0165 | 0.8979 | 100 | 0.0173 |
0.0181 | 0.9428 | 105 | 0.0168 |
0.0176 | 0.9877 | 110 | 0.0168 |
0.0184 | 1.0348 | 115 | 0.0183 |
0.0162 | 1.0797 | 120 | 0.0179 |
0.017 | 1.1246 | 125 | 0.0168 |
0.0143 | 1.1695 | 130 | 0.0167 |
0.0177 | 1.2144 | 135 | 0.0166 |
0.0138 | 1.2593 | 140 | 0.0161 |
0.0149 | 1.3042 | 145 | 0.0157 |
0.0162 | 1.3490 | 150 | 0.0160 |
0.0148 | 1.3939 | 155 | 0.0156 |
0.0168 | 1.4388 | 160 | 0.0154 |
0.0148 | 1.4837 | 165 | 0.0153 |
0.0146 | 1.5286 | 170 | 0.0154 |
0.0137 | 1.5735 | 175 | 0.0150 |
0.0144 | 1.6184 | 180 | 0.0150 |
0.0129 | 1.6633 | 185 | 0.0148 |
0.0139 | 1.7082 | 190 | 0.0145 |
0.013 | 1.7531 | 195 | 0.0145 |
0.013 | 1.7980 | 200 | 0.0144 |
0.0124 | 1.8429 | 205 | 0.0144 |
0.0135 | 1.8878 | 210 | 0.0143 |
0.0128 | 1.9327 | 215 | 0.0147 |
0.0149 | 1.9776 | 220 | 0.0143 |
0.0138 | 2.0247 | 225 | 0.0144 |
0.0127 | 2.0696 | 230 | 0.0143 |
0.0116 | 2.1145 | 235 | 0.0142 |
0.0128 | 2.1594 | 240 | 0.0143 |
0.0145 | 2.2043 | 245 | 0.0141 |
0.0147 | 2.2492 | 250 | 0.0139 |
0.0114 | 2.2941 | 255 | 0.0139 |
0.0114 | 2.3389 | 260 | 0.0139 |
0.0112 | 2.3838 | 265 | 0.0137 |
0.0105 | 2.4287 | 270 | 0.0138 |
0.0129 | 2.4736 | 275 | 0.0136 |
0.014 | 2.5185 | 280 | 0.0135 |
0.0124 | 2.5634 | 285 | 0.0136 |
0.0128 | 2.6083 | 290 | 0.0133 |
0.0106 | 2.6532 | 295 | 0.0129 |
0.0099 | 2.6981 | 300 | 0.0129 |
0.0111 | 2.7430 | 305 | 0.0129 |
0.0129 | 2.7879 | 310 | 0.0129 |
0.0088 | 2.8328 | 315 | 0.0129 |
0.0092 | 2.8777 | 320 | 0.0130 |
0.0086 | 2.9226 | 325 | 0.0129 |
0.0132 | 2.9675 | 330 | 0.0126 |
0.0126 | 3.0146 | 335 | 0.0130 |
0.0117 | 3.0595 | 340 | 0.0133 |
0.0102 | 3.1044 | 345 | 0.0132 |
0.0074 | 3.1493 | 350 | 0.0132 |
0.0105 | 3.1942 | 355 | 0.0129 |
0.0117 | 3.2391 | 360 | 0.0129 |
0.0107 | 3.2840 | 365 | 0.0127 |
0.0098 | 3.3288 | 370 | 0.0128 |
0.0092 | 3.3737 | 375 | 0.0127 |
0.0114 | 3.4186 | 380 | 0.0126 |
0.0118 | 3.4635 | 385 | 0.0125 |
0.0108 | 3.5084 | 390 | 0.0123 |
0.0092 | 3.5533 | 395 | 0.0123 |
0.0085 | 3.5982 | 400 | 0.0123 |
0.0088 | 3.6431 | 405 | 0.0126 |
0.0095 | 3.6880 | 410 | 0.0124 |
0.0072 | 3.7329 | 415 | 0.0124 |
0.0105 | 3.7778 | 420 | 0.0123 |
0.0115 | 3.8227 | 425 | 0.0122 |
0.007 | 3.8676 | 430 | 0.0121 |
0.0112 | 3.9125 | 435 | 0.0121 |
0.0103 | 3.9574 | 440 | 0.0121 |
0.0162 | 4.0045 | 445 | 0.0122 |
0.0079 | 4.0494 | 450 | 0.0125 |
0.0102 | 4.0943 | 455 | 0.0126 |
0.0087 | 4.1392 | 460 | 0.0126 |
0.0107 | 4.1841 | 465 | 0.0126 |
0.0105 | 4.2290 | 470 | 0.0125 |
0.0089 | 4.2738 | 475 | 0.0124 |
0.0061 | 4.3187 | 480 | 0.0125 |
0.0074 | 4.3636 | 485 | 0.0126 |
0.008 | 4.4085 | 490 | 0.0126 |
0.0092 | 4.4534 | 495 | 0.0125 |
0.0092 | 4.4983 | 500 | 0.0125 |
0.0061 | 4.5432 | 505 | 0.0124 |
0.0089 | 4.5881 | 510 | 0.0124 |
0.01 | 4.6330 | 515 | 0.0124 |
0.0081 | 4.6779 | 520 | 0.0124 |
0.0072 | 4.7228 | 525 | 0.0124 |
0.0078 | 4.7677 | 530 | 0.0124 |
0.009 | 4.8126 | 535 | 0.0124 |
0.0106 | 4.8575 | 540 | 0.0124 |
0.0079 | 4.9024 | 545 | 0.0124 |
0.0082 | 4.9473 | 550 | 0.0124 |
0.0082 | 4.9921 | 555 | 0.0124 |
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
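A quick sanity check that a local environment matches the versions above (a sketch; relax the pins as needed):

```python
import datasets, peft, tokenizers, torch, transformers

# Versions this adapter was trained with, per the list above.
expected = {
    peft: "0.12.0",
    transformers: "4.46.1",
    torch: "2.4.0+cu121",
    datasets: "3.1.0",
    tokenizers: "0.20.3",
}
for module, version in expected.items():
    assert module.__version__ == version, (
        f"{module.__name__}: {module.__version__} != {version}"
    )
```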