llama_3b_step2_batch_v1

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5060
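
For intuition, if the reported loss is mean per-token cross-entropy in nats (the Transformers Trainer default for causal language models, which is an assumption here), it corresponds to a perplexity of about 1.66:

```python
import math

# Perplexity from the reported evaluation loss.
# Assumes mean per-token cross-entropy in nats (Transformers Trainer default).
eval_loss = 0.5060
print(f"perplexity = exp({eval_loss}) ≈ {math.exp(eval_loss):.2f}")  # ≈ 1.66
```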

Model description

More information needed

Intended uses & limitations

More information needed
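
No usage guidance is published yet. Assuming the checkpoint is hosted under the repo name in this card's title (the full user/org namespace is not shown here), it should load like any Transformers causal LM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title; prepend the actual user/org namespace.
model_id = "llama_3b_step2_batch_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored as BF16 per this card
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```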

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 2
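
As an illustrative reconstruction only (the training script is not published), these settings map onto Transformers TrainingArguments roughly as follows; note that the total train batch size of 4 is the per-device batch size of 1 times the 4 gradient-accumulation steps. The output_dir and bf16 values are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="llama_3b_step2_batch_v1",  # assumed; not stated in the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=40,   # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,   # total train batch size = 1 * 4 = 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    bf16=True,                       # assumed from the BF16 tensor type below
)
```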

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0531 | 0.0170 | 50 | 1.2007 |
| 1.0336 | 0.0341 | 100 | 1.1242 |
| 0.9428 | 0.0511 | 150 | 1.0800 |
| 1.4386 | 0.0682 | 200 | 1.0408 |
| 0.8375 | 0.0852 | 250 | 1.0127 |
| 0.9193 | 0.1023 | 300 | 0.9817 |
| 1.0368 | 0.1193 | 350 | 0.9573 |
| 1.2018 | 0.1364 | 400 | 0.9319 |
| 1.2749 | 0.1534 | 450 | 0.9072 |
| 0.9881 | 0.1704 | 500 | 0.8820 |
| 0.9707 | 0.1875 | 550 | 0.8599 |
| 1.2377 | 0.2045 | 600 | 0.8412 |
| 0.9024 | 0.2216 | 650 | 0.8180 |
| 0.5889 | 0.2386 | 700 | 0.8024 |
| 0.8046 | 0.2557 | 750 | 0.7899 |
| 0.83 | 0.2727 | 800 | 0.7710 |
| 0.6852 | 0.2898 | 850 | 0.7548 |
| 0.8512 | 0.3068 | 900 | 0.7422 |
| 0.8377 | 0.3238 | 950 | 0.7345 |
| 0.5361 | 0.3409 | 1000 | 0.7220 |
| 0.7696 | 0.3579 | 1050 | 0.7105 |
| 0.8175 | 0.3750 | 1100 | 0.7013 |
| 0.6144 | 0.3920 | 1150 | 0.6886 |
| 0.3598 | 0.4091 | 1200 | 0.6809 |
| 0.7176 | 0.4261 | 1250 | 0.6692 |
| 0.5281 | 0.4432 | 1300 | 0.6644 |
| 0.3555 | 0.4602 | 1350 | 0.6547 |
| 0.9024 | 0.4772 | 1400 | 0.6471 |
| 0.7713 | 0.4943 | 1450 | 0.6386 |
| 0.6172 | 0.5113 | 1500 | 0.6322 |
| 0.6325 | 0.5284 | 1550 | 0.6266 |
| 0.7503 | 0.5454 | 1600 | 0.6206 |
| 0.349 | 0.5625 | 1650 | 0.6136 |
| 0.7 | 0.5795 | 1700 | 0.6085 |
| 0.5014 | 0.5966 | 1750 | 0.6023 |
| 0.6441 | 0.6136 | 1800 | 0.5975 |
| 0.5066 | 0.6306 | 1850 | 0.5921 |
| 0.6036 | 0.6477 | 1900 | 0.5883 |
| 0.6549 | 0.6647 | 1950 | 0.5840 |
| 0.3903 | 0.6818 | 2000 | 0.5789 |
| 0.8864 | 0.6988 | 2050 | 0.5754 |
| 0.7164 | 0.7159 | 2100 | 0.5709 |
| 0.5504 | 0.7329 | 2150 | 0.5687 |
| 0.4216 | 0.7500 | 2200 | 0.5646 |
| 0.4241 | 0.7670 | 2250 | 0.5618 |
| 0.6452 | 0.7840 | 2300 | 0.5590 |
| 0.7067 | 0.8011 | 2350 | 0.5558 |
| 0.4536 | 0.8181 | 2400 | 0.5537 |
| 0.8657 | 0.8352 | 2450 | 0.5508 |
| 0.7452 | 0.8522 | 2500 | 0.5483 |
| 0.3444 | 0.8693 | 2550 | 0.5458 |
| 0.2889 | 0.8863 | 2600 | 0.5437 |
| 0.2415 | 0.9034 | 2650 | 0.5401 |
| 0.5393 | 0.9204 | 2700 | 0.5385 |
| 0.4866 | 0.9374 | 2750 | 0.5372 |
| 0.9233 | 0.9545 | 2800 | 0.5347 |
| 0.4623 | 0.9715 | 2850 | 0.5318 |
| 0.4211 | 0.9886 | 2900 | 0.5299 |
| 0.4308 | 1.0056 | 2950 | 0.5283 |
| 0.618 | 1.0227 | 3000 | 0.5285 |
| 0.7693 | 1.0397 | 3050 | 0.5262 |
| 0.2893 | 1.0568 | 3100 | 0.5266 |
| 0.461 | 1.0738 | 3150 | 0.5273 |
| 0.3648 | 1.0908 | 3200 | 0.5230 |
| 0.4981 | 1.1079 | 3250 | 0.5253 |
| 0.5005 | 1.1249 | 3300 | 0.5222 |
| 0.4117 | 1.1420 | 3350 | 0.5217 |
| 0.3319 | 1.1590 | 3400 | 0.5188 |
| 0.2549 | 1.1761 | 3450 | 0.5190 |
| 0.3758 | 1.1931 | 3500 | 0.5186 |
| 0.2889 | 1.2102 | 3550 | 0.5173 |
| 0.6341 | 1.2272 | 3600 | 0.5167 |
| 0.3217 | 1.2442 | 3650 | 0.5155 |
| 0.4406 | 1.2613 | 3700 | 0.5150 |
| 0.7445 | 1.2783 | 3750 | 0.5148 |
| 0.5511 | 1.2954 | 3800 | 0.5133 |
| 0.3933 | 1.3124 | 3850 | 0.5125 |
| 0.39 | 1.3295 | 3900 | 0.5134 |
| 0.3015 | 1.3465 | 3950 | 0.5126 |
| 0.8124 | 1.3636 | 4000 | 0.5118 |
| 0.6512 | 1.3806 | 4050 | 0.5111 |
| 0.7011 | 1.3976 | 4100 | 0.5106 |
| 0.4556 | 1.4147 | 4150 | 0.5103 |
| 0.4563 | 1.4317 | 4200 | 0.5100 |
| 0.2651 | 1.4488 | 4250 | 0.5100 |
| 0.5674 | 1.4658 | 4300 | 0.5090 |
| 0.2869 | 1.4829 | 4350 | 0.5093 |
| 0.5327 | 1.4999 | 4400 | 0.5088 |
| 0.726 | 1.5170 | 4450 | 0.5086 |
| 0.2619 | 1.5340 | 4500 | 0.5084 |
| 0.6597 | 1.5510 | 4550 | 0.5081 |
| 0.4848 | 1.5681 | 4600 | 0.5083 |
| 0.412 | 1.5851 | 4650 | 0.5080 |
| 0.6712 | 1.6022 | 4700 | 0.5077 |
| 0.5523 | 1.6192 | 4750 | 0.5076 |
| 0.5105 | 1.6363 | 4800 | 0.5077 |
| 0.5315 | 1.6533 | 4850 | 0.5071 |
| 0.4166 | 1.6704 | 4900 | 0.5069 |
| 0.4081 | 1.6874 | 4950 | 0.5065 |
| 0.3154 | 1.7044 | 5000 | 0.5063 |
| 0.396 | 1.7215 | 5050 | 0.5063 |
| 0.6121 | 1.7385 | 5100 | 0.5064 |
| 0.379 | 1.7556 | 5150 | 0.5063 |
| 0.4534 | 1.7726 | 5200 | 0.5061 |
| 0.5572 | 1.7897 | 5250 | 0.5060 |
| 0.3847 | 1.8067 | 5300 | 0.5059 |
| 0.3751 | 1.8238 | 5350 | 0.5060 |
| 0.4346 | 1.8408 | 5400 | 0.5061 |
| 0.4928 | 1.8578 | 5450 | 0.5061 |
| 0.5215 | 1.8749 | 5500 | 0.5060 |
| 0.6156 | 1.8919 | 5550 | 0.5060 |
| 0.4041 | 1.9090 | 5600 | 0.5060 |
| 0.5604 | 1.9260 | 5650 | 0.5059 |
| 0.424 | 1.9431 | 5700 | 0.5060 |
| 0.1856 | 1.9601 | 5750 | 0.5060 |
| 0.3701 | 1.9772 | 5800 | 0.5061 |
| 0.4201 | 1.9942 | 5850 | 0.5060 |

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.1.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1
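
To sanity-check an environment against these versions, here is a small convenience sketch (exact matches are not strictly required, but large version gaps can change behavior):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported in this card vs. the ones currently installed.
expected = {
    "transformers": "4.46.1",
    "torch": "2.1.0+cu118",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"differs (card used {want})"
    print(f"{name}: {have} -> {status}")
```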

Model details

  • Format: Safetensors
  • Model size: 3.21B params
  • Tensor type: BF16