Built with Axolotl

0f5d021e-a052-4299-81ee-2bb9522213bc

This model is a fine-tuned version of EleutherAI/pythia-14m on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 6.7863

Model description

More information needed

Intended uses & limitations

More information needed
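
Since this repository ships a PEFT adapter trained on top of EleutherAI/pythia-14m, a minimal inference sketch is shown below. This is an illustrative assumption rather than part of the original card: the adapter id is this repository's id, and the prompt and generation settings are placeholders.

```python
# Minimal inference sketch (illustrative; not from the original card).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "lesso17/0f5d021e-a052-4299-81ee-2bb9522213bc"  # this repository

# AutoPeftModelForCausalLM reads the adapter config, loads the base model
# (EleutherAI/pythia-14m) and attaches the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")

inputs = tokenizer("Hello, world!", return_tensors="pt")  # placeholder prompt
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```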

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 0.000217
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (8-bit, via bitsandbytes; OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
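
For reference, here is a sketch of the equivalent Hugging Face TrainingArguments; the run itself was driven by Axolotl, so this is an assumed mapping, and fields like output_dir are placeholders rather than values from the actual config.

```python
# Equivalent TrainingArguments sketch (assumed mapping; the actual run used Axolotl).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",              # hypothetical path, not from the card
    learning_rate=0.000217,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,     # 4 per device x 2 steps = total batch size 8
    optim="adamw_bnb_8bit",            # OptimizerNames.ADAMW_BNB (8-bit AdamW, bitsandbytes)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```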

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0001 | 1    | 9.0324          |
| 15.6705       | 0.0048 | 50   | 8.4389          |
| 15.5777       | 0.0097 | 100  | 7.8442          |
| 16.1316       | 0.0145 | 150  | 8.3373          |
| 19.6493       | 0.0194 | 200  | 10.1151         |
| 14.9912       | 0.0242 | 250  | 7.2463          |
| 16.2995       | 0.0291 | 300  | 8.8719          |
| 14.4535       | 0.0339 | 350  | 6.8123          |
| 14.8802       | 0.0388 | 400  | 8.5803          |
| 14.6041       | 0.0436 | 450  | 6.8007          |
| 14.6507       | 0.0485 | 500  | 6.7863          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1