metadata

base_model: google/paligemma-3b-pt-224
library_name: peft
license: gemma
tags:
  - generated_from_trainer
model-index:
  - name: palige_original_lora_4_epo_12
    results: []

palige_original_lora_4_epo_12

This model is a fine-tuned version of google/paligemma-3b-pt-224. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.8228

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 10
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 40
optimizer: AdamW (OptimizerNames.ADAMW_HF) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
num_epochs: 10
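
The relationship between these values can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code: it assumes a single device (so total_train_batch_size is just train_batch_size times gradient_accumulation_steps), takes the total step count of 3200 from the training results table, and models the linear scheduler as a linear warmup followed by a linear decay to zero. The names `effective_batch_size` and `linear_lr` are hypothetical helpers introduced here.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 10
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2
TOTAL_STEPS = 3200  # last step reported in the training results table

# Assumption: one device, so there is no extra device-count multiplier.
NUM_DEVICES = 1


def effective_batch_size(per_device, accum, devices=NUM_DEVICES):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 1, 2, 1601, 3200):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the learning rate reaches its peak of 2e-05 almost immediately and then decays for the rest of the run.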

Training results

Training Loss	Epoch	Step	Validation Loss
4.1549	0.3125	100	2.3889
2.1833	0.625	200	1.5255
1.7497	0.9375	300	1.2796
1.5795	1.25	400	1.1571
1.5347	1.5625	500	1.0845
1.4316	1.875	600	1.0333
1.3262	2.1875	700	0.9880
1.316	2.5	800	0.9567
1.2616	2.8125	900	0.9213
1.1021	3.125	1000	0.9030
1.1571	3.4375	1100	0.8888
1.1552	3.75	1200	0.8808
1.0565	4.0625	1300	0.8556
1.0083	4.375	1400	0.8517
1.0044	4.6875	1500	0.8358
0.9978	5.0	1600	0.8301
0.8846	5.3125	1700	0.8302
0.8989	5.625	1800	0.8113
0.9068	5.9375	1900	0.8169
0.8205	6.25	2000	0.8218
0.8175	6.5625	2100	0.8142
0.854	6.875	2200	0.8109
0.7448	7.1875	2300	0.8105
0.7399	7.5	2400	0.8207
0.7113	7.8125	2500	0.8008
0.688	8.125	2600	0.8174
0.6532	8.4375	2700	0.8210
0.6666	8.75	2800	0.8139
0.6447	9.0625	2900	0.8422
0.5704	9.375	3000	0.8392
0.5682	9.6875	3100	0.8321
0.5814	10.0	3200	0.8228
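
The table above can be inspected programmatically, for example to locate the evaluation step with the lowest validation loss. The sketch below copies the rows verbatim and parses them with plain Python; the variable names are illustrative and not part of any training script.

```python
# Rows copied from the table above: training loss, epoch, step, validation loss.
LOG = """
4.1549 0.3125 100 2.3889
2.1833 0.625 200 1.5255
1.7497 0.9375 300 1.2796
1.5795 1.25 400 1.1571
1.5347 1.5625 500 1.0845
1.4316 1.875 600 1.0333
1.3262 2.1875 700 0.9880
1.316 2.5 800 0.9567
1.2616 2.8125 900 0.9213
1.1021 3.125 1000 0.9030
1.1571 3.4375 1100 0.8888
1.1552 3.75 1200 0.8808
1.0565 4.0625 1300 0.8556
1.0083 4.375 1400 0.8517
1.0044 4.6875 1500 0.8358
0.9978 5.0 1600 0.8301
0.8846 5.3125 1700 0.8302
0.8989 5.625 1800 0.8113
0.9068 5.9375 1900 0.8169
0.8205 6.25 2000 0.8218
0.8175 6.5625 2100 0.8142
0.854 6.875 2200 0.8109
0.7448 7.1875 2300 0.8105
0.7399 7.5 2400 0.8207
0.7113 7.8125 2500 0.8008
0.688 8.125 2600 0.8174
0.6532 8.4375 2700 0.8210
0.6666 8.75 2800 0.8139
0.6447 9.0625 2900 0.8422
0.5704 9.375 3000 0.8392
0.5682 9.6875 3100 0.8321
0.5814 10.0 3200 0.8228
"""

# Each row becomes (train_loss, epoch, step, val_loss).
rows = [tuple(float(x) for x in line.split()) for line in LOG.strip().splitlines()]

best = min(rows, key=lambda r: r[3])   # row with the lowest validation loss
final = rows[-1]                       # last evaluation of the run

# Optimizer steps per epoch, derived from the first row (step / epoch).
steps_per_epoch = round(rows[0][2] / rows[0][1])

print(f"best:  step {int(best[2])}, epoch {best[1]}, validation loss {best[3]}")
print(f"final: step {int(final[2])}, epoch {final[1]}, validation loss {final[3]}")
print(f"steps per epoch: {steps_per_epoch}")
```

The lowest validation loss in the table is 0.8008 at step 2500 (epoch 7.8125), while the loss reported for the evaluation set, 0.8228, comes from the final step. Validation loss stays roughly flat from about epoch 5.6 onward while training loss keeps falling, which may indicate that the later epochs add little; this is a reading of the logged numbers only, not a conclusion stated by the model card.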

Framework versions

PEFT 0.13.0
Transformers 4.46.0.dev0
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0