# palige_original_lora_256_epo_12_projectyFalse_stop_addAnswer_RandomHoriz
This model is a LoRA fine-tune of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224); the training dataset is not specified in this card. It achieves the following results on the evaluation set:
- Loss: 1.1370
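Because this repository holds a PEFT (LoRA) adapter rather than a full checkpoint, inference requires loading the base model first and attaching the adapter. A minimal sketch follows; the prompt text, image path, and generation settings are illustrative assumptions, not taken from this card:

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Load the frozen base model, then attach this repo's LoRA adapter on top.
base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(
    base,
    "RoyRoyRpy/palige_original_lora_256_epo_12_projectyFalse_stop_addAnswer_RandomHoriz",
)
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")

# Single VQA-style query; prompt prefix and image are placeholders.
image = Image.open("example.jpg")
inputs = processor(
    text="answer en What is in the image?", images=image, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(output[0], skip_special_tokens=True))
```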
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 10
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 40
- optimizer: adamw_hf with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- num_epochs: 12
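For reference, a sketch of how these values map onto transformers' `TrainingArguments`; `output_dir` is a placeholder, and the eval cadence flags are an assumption inferred from the results table below, which logs validation loss every 100 steps:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="palige_original_lora_256_epo_12",  # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 10 * 4 = 40
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=12,
    eval_strategy="steps",  # assumed: the table below reports eval every 100 steps
    eval_steps=100,
)
```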
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 5.6085 | 0.3125 | 100 | 3.2640 |
| 2.8826 | 0.625 | 200 | 2.2101 |
| 2.2473 | 0.9375 | 300 | 1.8418 |
| 2.0435 | 1.25 | 400 | 1.6720 |
| 1.9624 | 1.5625 | 500 | 1.5790 |
| 1.8529 | 1.875 | 600 | 1.5086 |
| 1.7503 | 2.1875 | 700 | 1.4470 |
| 1.7355 | 2.5 | 800 | 1.3995 |
| 1.6527 | 2.8125 | 900 | 1.3512 |
| 1.4952 | 3.125 | 1000 | 1.3288 |
| 1.5409 | 3.4375 | 1100 | 1.3100 |
| 1.5392 | 3.75 | 1200 | 1.3230 |
| 1.4226 | 4.0625 | 1300 | 1.2632 |
| 1.4012 | 4.375 | 1400 | 1.2446 |
| 1.3743 | 4.6875 | 1500 | 1.2588 |
| 1.3847 | 5.0 | 1600 | 1.2155 |
| 1.2789 | 5.3125 | 1700 | 1.2231 |
| 1.2922 | 5.625 | 1800 | 1.2446 |
| 1.287 | 5.9375 | 1900 | 1.2017 |
| 1.2198 | 6.25 | 2000 | 1.2427 |
| 1.2159 | 6.5625 | 2100 | 1.1734 |
| 1.2539 | 6.875 | 2200 | 1.1791 |
| 1.1409 | 7.1875 | 2300 | 1.1934 |
| 1.1515 | 7.5 | 2400 | 1.2092 |
| 1.1247 | 7.8125 | 2500 | 1.1974 |
| 1.105 | 8.125 | 2600 | 1.2207 |
| 1.0672 | 8.4375 | 2700 | 1.2250 |
| 1.0948 | 8.75 | 2800 | 1.1644 |
| 1.0782 | 9.0625 | 2900 | 1.1800 |
| 1.0119 | 9.375 | 3000 | 1.1500 |
| 1.0223 | 9.6875 | 3100 | 1.2175 |
| 1.0268 | 10.0 | 3200 | 1.1652 |
| 0.9807 | 10.3125 | 3300 | 1.1436 |
| 0.9774 | 10.625 | 3400 | 1.1823 |
| 0.9952 | 10.9375 | 3500 | 1.1792 |
| 0.9386 | 11.25 | 3600 | 1.2162 |
| 0.9539 | 11.5625 | 3700 | 1.1952 |
| 0.9249 | 11.875 | 3800 | 1.1370 |
### Framework versions
- PEFT 0.13.0
- Transformers 4.46.0.dev0
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0