# finetuned_paligemma_beans_vqa_final
This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.0687
## Model description
More information needed
## Intended uses & limitations
More information needed
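As a starting point, the sketch below shows one way to load the adapter for VQA-style inference, assuming this repository hosts a PEFT (LoRA) adapter on top of google/paligemma-3b-pt-224. The image path and the prompt format are placeholders; the exact prompt used during fine-tuning is not documented here.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

BASE_ID = "google/paligemma-3b-pt-224"
ADAPTER_ID = "KettaP/finetuned_paligemma_beans_vqa_final"

# Load the base model and attach the fine-tuned PEFT adapter.
processor = AutoProcessor.from_pretrained(BASE_ID)
model = PaliGemmaForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# Placeholder inputs: any image and question; the prompt style is an assumption.
image = Image.open("leaf.jpg").convert("RGB")
prompt = "answer en Is this leaf healthy?"

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)

# Strip the prompt tokens and decode only the newly generated answer.
answer = processor.decode(
    generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```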
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the TrainingArguments sketch after the list):
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_hf with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- num_epochs: 5
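For reference, here is a hedged sketch of how these settings map onto `transformers.TrainingArguments`. The output directory, precision, and logging/evaluation cadence are assumptions not stated above (the 500-step cadence is inferred from the results table).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="finetuned_paligemma_beans_vqa_final",  # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 4 * 4 = 16
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=2,
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="steps",           # assumption: evaluate every 500 steps
    eval_steps=500,
    logging_steps=500,               # assumption
    bf16=True,                       # assumption
)
```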
### Training results
| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.2394        | 0.1530 | 500   | 0.2471          |
| 0.1979        | 0.3060 | 1000  | 0.1885          |
| 0.1526        | 0.4590 | 1500  | 0.1759          |
| 0.1458        | 0.6119 | 2000  | 0.1574          |
| 0.1337        | 0.7649 | 2500  | 0.1320          |
| 0.1381        | 0.9179 | 3000  | 0.1273          |
| 0.1119        | 1.0707 | 3500  | 0.1276          |
| 0.0939        | 1.2237 | 4000  | 0.1121          |
| 0.1133        | 1.3767 | 4500  | 0.0998          |
| 0.0974        | 1.5299 | 5000  | 0.0964          |
| 0.0889        | 1.6829 | 5500  | 0.0930          |
| 0.0808        | 1.8359 | 6000  | 0.0889          |
| 0.0743        | 1.9889 | 6500  | 0.0822          |
| 0.0746        | 2.1420 | 7000  | 0.0818          |
| 0.0686        | 2.2950 | 7500  | 0.0783          |
| 0.0734        | 2.4479 | 8000  | 0.0772          |
| 0.0702        | 2.6009 | 8500  | 0.0772          |
| 0.0661        | 2.7539 | 9000  | 0.0722          |
| 0.0603        | 2.9069 | 9500  | 0.0706          |
| 0.0505        | 3.0597 | 10000 | 0.0718          |
| 0.0467        | 3.2127 | 10500 | 0.0718          |
| 0.0469        | 3.3656 | 11000 | 0.0703          |
| 0.0473        | 3.5186 | 11500 | 0.0689          |
| 0.0389        | 3.6716 | 12000 | 0.0676          |
| 0.0452        | 3.8246 | 12500 | 0.0680          |
| 0.0487        | 3.9776 | 13000 | 0.0632          |
| 0.0324        | 4.1303 | 13500 | 0.0690          |
| 0.0259        | 4.2833 | 14000 | 0.0724          |
| 0.0386        | 4.4363 | 14500 | 0.0687          |
### Framework versions
- PEFT 0.14.0
- Transformers 4.48.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0