llama_1b_step2_batch_grad_v4

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3535

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch follows the list:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 96 (train_batch_size × gradient_accumulation_steps = 8 × 12)
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 2
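
These settings map onto the standard Hugging Face Trainer configuration. A minimal sketch, assuming the run used transformers.TrainingArguments; the output_dir value is a placeholder, and bf16=True is an assumption based on the BF16 tensor type of the published checkpoint:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="llama_1b_step2_batch_grad_v4",  # placeholder output path
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=40,
    seed=42,
    gradient_accumulation_steps=12,  # 8 * 12 = 96 effective train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    bf16=True,  # assumption: matches the BF16 tensor type of the published weights
)
```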

Training results

  Training Loss   Epoch    Step   Validation Loss
  0.6188          0.4090     50   0.6047
  0.418           0.8180    100   0.4198
  0.2486          1.2270    150   0.3708
  0.244           1.6360    200   0.3535

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.1.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1

Checkpoint

  • Format: Safetensors
  • Model size: 1.24B params
  • Tensor type: BF16
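
A minimal sketch of loading the checkpoint locally for evaluation; the model path is a placeholder for wherever the Safetensors weights live, and the dtype follows the BF16 tensor type listed above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama_1b_step2_batch_grad_v4"  # placeholder path or hub id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type of the checkpoint
)
model.eval()
```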