# llama_3b_step2_batch_v4
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3157
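A minimal loading sketch using the Transformers auto classes; the repo id/path `llama_3b_step2_batch_v4` is an assumption here, so substitute the actual checkpoint location:

```python
# Sketch only: load this checkpoint and run a short generation.
# "llama_3b_step2_batch_v4" is an assumed local path or Hub repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama_3b_step2_batch_v4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```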
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 3e-05
- train_batch_size: 6
- eval_batch_size: 40
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 24
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 2
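A minimal sketch of how these values map onto the Transformers `TrainingArguments` API; the output directory, model, and dataset objects are hypothetical placeholders, and the 50-step evaluation interval is inferred from the results table below. Note that the total train batch size of 24 is simply train_batch_size (6) × gradient_accumulation_steps (4).

```python
# Sketch only: reproduces the hyperparameters above with the Trainer API.
# output_dir, model, train_ds, and eval_ds are hypothetical placeholders.
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="llama_3b_step2_batch_v4",  # hypothetical output directory
    learning_rate=3e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=40,
    gradient_accumulation_steps=4,         # effective train batch = 6 * 4 = 24
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=2,
    eval_strategy="steps",                 # inferred: the table evaluates every 50 steps
    eval_steps=50,
    logging_steps=50,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```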
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.897         | 0.1022 | 50   | 0.8837          |
| 0.6522        | 0.2045 | 100  | 0.6790          |
| 0.4811        | 0.3067 | 150  | 0.5709          |
| 0.4821        | 0.4090 | 200  | 0.4997          |
| 0.4577        | 0.5112 | 250  | 0.4469          |
| 0.5176        | 0.6135 | 300  | 0.4128          |
| 0.3274        | 0.7157 | 350  | 0.3879          |
| 0.3363        | 0.8180 | 400  | 0.3672          |
| 0.3491        | 0.9202 | 450  | 0.3495          |
| 0.2589        | 1.0225 | 500  | 0.3429          |
| 0.2783        | 1.1247 | 550  | 0.3360          |
| 0.1976        | 1.2270 | 600  | 0.3314          |
| 0.2332        | 1.3292 | 650  | 0.3274          |
| 0.2571        | 1.4315 | 700  | 0.3237          |
| 0.2596        | 1.5337 | 750  | 0.3210          |
| 0.2918        | 1.6360 | 800  | 0.3169          |
| 0.1956        | 1.7382 | 850  | 0.3167          |
| 0.2186        | 1.8405 | 900  | 0.3159          |
| 0.2477        | 1.9427 | 950  | 0.3157          |
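As a back-of-envelope consistency check derived from the table (not a figure reported in the card): step 950 at epoch 1.9427 implies roughly 489 optimizer steps per epoch, which with the effective batch of 24 suggests a training set on the order of 11,700 examples.

```python
# Derived from the table above; not stated explicitly in the card.
steps, epoch = 950, 1.9427
steps_per_epoch = steps / epoch        # ~489 optimizer steps per epoch
effective_batch = 6 * 4                # train_batch_size * gradient_accumulation_steps
approx_train_examples = steps_per_epoch * effective_batch
print(f"{steps_per_epoch:.0f} steps/epoch, ~{approx_train_examples:.0f} examples")
```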
### Framework versions
- Transformers 4.46.1
- Pytorch 2.1.0+cu118
- Datasets 3.0.2
- Tokenizers 0.20.1
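When reproducing these results, a quick runtime check against the versions listed above may help (a sketch only; install commands depend on your platform and CUDA setup):

```python
# Sketch: confirm the runtime matches the versions this model was trained with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.46.1",
    "torch": "2.1.0+cu118",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}
found = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "ok" if found[name] == want else f"got {found[name]}"
    print(f"{name}: expected {want} -> {status}")
```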