paligemma_vqav2_warnup

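This model is a fine-tuned version of google/paligemma-3b-pt-224 on an unspecified dataset (the training configuration recorded the dataset as None). On the evaluation set, the validation loss at the last logged evaluation (step 3800) is 0.8534.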

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 10
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 40
optimizer: OptimizerNames.ADAMW_HF with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 12
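
As a rough check on the values above, the following sketch recomputes the effective batch size and evaluates a linear warmup-then-decay learning-rate curve. It assumes a single training device and a total of 3840 optimizer steps (320 steps per epoch, inferred from the results table, where step 100 corresponds to epoch 0.3125). Neither figure is stated in this card, and the function names are illustrative rather than part of any library.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 10
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 1000
NUM_EPOCHS = 12

# Assumption: inferred from the results table (step 100 <-> epoch 0.3125).
STEPS_PER_EPOCH = round(100 / 0.3125)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 500, 1000, 2000, 3000, TOTAL_STEPS):
        print(f"step {s:>4}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the learning rate reaches its peak of 2e-05 at step 1000 and decays to zero at step 3840, so roughly the first quarter of training is spent in warmup.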

Training Loss	Epoch	Step	Validation Loss
7.0325	0.3125	100	7.1622
6.5146	0.625	200	5.6221
4.4912	0.9375	300	3.4696
3.2824	1.25	400	2.6386
2.6828	1.5625	500	2.1185
2.2276	1.875	600	1.7020
1.9293	2.1875	700	1.4613
1.7957	2.5	800	1.3059
1.6281	2.8125	900	1.1815
1.4253	3.125	1000	1.1008
1.4404	3.4375	1100	1.0513
1.4239	3.75	1200	1.0156
1.2985	4.0625	1300	0.9709
1.243	4.375	1400	0.9393
1.2105	4.6875	1500	0.9237
1.2035	5.0	1600	0.9088
1.0704	5.3125	1700	0.8927
1.0891	5.625	1800	0.8735
1.0861	5.9375	1900	0.8598
0.9853	6.25	2000	0.8530
0.9866	6.5625	2100	0.8392
1.0206	6.875	2200	0.8399
0.8914	7.1875	2300	0.8293
0.9062	7.5	2400	0.8325
0.8579	7.8125	2500	0.8147
0.828	8.125	2600	0.8267
0.7969	8.4375	2700	0.8321
0.8175	8.75	2800	0.8179
0.7948	9.0625	2900	0.8356
0.7221	9.375	3000	0.8104
0.7124	9.6875	3100	0.8266
0.7199	10.0	3200	0.8143
0.6601	10.3125	3300	0.8399
0.6517	10.625	3400	0.8415
0.6509	10.9375	3500	0.8221
0.5981	11.25	3600	0.8582
0.6011	11.5625	3700	0.8481
0.5756	11.875	3800	0.8534
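
The table above can be scanned for the checkpoint with the lowest validation loss. The snippet below is a minimal sketch that does this using the validation losses copied from the table; the variable names are illustrative, and it assumes evaluations were logged every 100 steps, as the Step column shows.

```python
# Validation losses from the table, one per evaluation (every 100 steps).
val_loss = [
    7.1622, 5.6221, 3.4696, 2.6386, 2.1185, 1.7020, 1.4613, 1.3059,
    1.1815, 1.1008, 1.0513, 1.0156, 0.9709, 0.9393, 0.9237, 0.9088,
    0.8927, 0.8735, 0.8598, 0.8530, 0.8392, 0.8399, 0.8293, 0.8325,
    0.8147, 0.8267, 0.8321, 0.8179, 0.8356, 0.8104, 0.8266, 0.8143,
    0.8399, 0.8415, 0.8221, 0.8582, 0.8481, 0.8534,
]
steps = [100 * (i + 1) for i in range(len(val_loss))]

# Index of the evaluation with the smallest validation loss.
best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
best_step, best_loss = steps[best_idx], val_loss[best_idx]

# How much worse the last logged evaluation is than the best one.
final_gap = val_loss[-1] - best_loss

if __name__ == "__main__":
    print(f"best validation loss {best_loss:.4f} at step {best_step}")
    print(f"final validation loss is {final_gap:.4f} above the best")
```

The lowest validation loss in the table is 0.8104 at step 3000. After that point the training loss keeps falling while the validation loss drifts upward, which is consistent with the model beginning to overfit in the last few epochs.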