metadata

tags:
  - generated_from_trainer
datasets:
  - coco
metrics:
  - rouge
  - bleu
model-index:
  - name: vit-swin-base-224-gpt2-image-captioning
    results: []

vit-swin-base-224-gpt2-image-captioning

This model was fine-tuned on the coco dataset; the base checkpoint it started from is not named in this card. It achieves the following results on the evaluation set:

Loss: 0.7923
Rouge1: 41.8451
Rouge2: 16.3493
Rougel: 38.0288
Rougelsum: 38.049
Bleu: 10.2776
Gen Len: 11.2946
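
For readers unfamiliar with the metric, the following is a minimal sketch of what a Rouge1 score measures: unigram overlap between a generated caption and a reference caption, reported as an F-measure. It assumes plain whitespace tokenization and a 0 to 100 scale (which the magnitudes above suggest). It is a simplified illustration, not the scoring code used to produce the numbers in this card, and the function name and example captions are made up for the demonstration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure on a 0-100 scale (no stemming)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each token counts at most as often
    # as it appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical captions, not taken from the evaluation set.
score = rouge1_f1("a dog runs on the grass", "a dog is running on the grass")
print(f"{score:.2f}")  # prints 76.92
```

Five of the six candidate tokens appear in the seven-token reference, giving precision 5/6, recall 5/7, and an F-measure of 10/13.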

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2
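
To make the `linear` scheduler setting concrete, here is a minimal pure-Python sketch of how the learning rate would decay from 5e-05 to zero over training. The total step count and the warmup length are assumptions: the card lists neither, so the sketch uses zero warmup and roughly 10,600 total steps, a figure inferred from the training results table (step 10000 is logged at epoch 1.89 of 2). The names `HPARAMS`, `ASSUMED_TOTAL_STEPS`, `ASSUMED_WARMUP_STEPS`, and `linear_lr` are illustrative only.

```python
# Hyperparameters copied from the list above.
HPARAMS = {
    "learning_rate": 5e-05,
    "train_batch_size": 64,
    "eval_batch_size": 64,
    "seed": 42,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 2,
}

# ASSUMPTIONS: neither value is stated in the card.
ASSUMED_TOTAL_STEPS = 10_600   # rough estimate from the results table
ASSUMED_WARMUP_STEPS = 0       # no warmup is listed, so assume none


def linear_lr(step: int,
              base_lr: float = HPARAMS["learning_rate"],
              total_steps: int = ASSUMED_TOTAL_STEPS,
              warmup_steps: int = ASSUMED_WARMUP_STEPS) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 2000, 6000, 10000):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved at the midpoint of training and is close to zero by the last logged checkpoint.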

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Bleu	Gen Len
1.0018	0.38	2000	0.8860	38.6537	13.8145	35.3932	35.3935	8.2448	11.2946
0.8827	0.75	4000	0.8395	40.0458	14.8829	36.5321	36.5366	9.1169	11.2946
0.8378	1.13	6000	0.8140	41.2736	15.9576	37.5504	37.5512	9.871	11.2946
0.7913	1.51	8000	0.8012	41.6642	16.1987	37.8786	37.8891	10.0786	11.2946
0.7794	1.89	10000	0.7933	41.9119	16.3738	38.1062	38.1292	10.288	11.2946
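
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, and from that a rough training-set size. The sketch below does this arithmetic from the last row of the table. It assumes the reported batch size of 64 is the effective batch size (single device, no gradient accumulation), which the card does not confirm, so the sample count is only an approximation. The names `CHECKPOINTS`, `TRAIN_BATCH_SIZE`, and `estimate_steps_per_epoch` are illustrative.

```python
# (epoch, step) pairs copied from the table above.
CHECKPOINTS = [
    (0.38, 2000),
    (0.75, 4000),
    (1.13, 6000),
    (1.51, 8000),
    (1.89, 10000),
]

# From the hyperparameter list. ASSUMPTION: this is the effective batch
# size, i.e. one device and no gradient accumulation.
TRAIN_BATCH_SIZE = 64


def estimate_steps_per_epoch(checkpoints):
    """Estimate steps per epoch from the last logged (epoch, step) pair.

    The epoch column is rounded to two decimals, so the latest row
    has the smallest relative rounding error.
    """
    epoch, step = checkpoints[-1]
    return step / epoch


steps = estimate_steps_per_epoch(CHECKPOINTS)
approx_samples = round(steps) * TRAIN_BATCH_SIZE
print(f"about {steps:.0f} steps per epoch")
print(f"about {approx_samples} training samples per epoch (under the stated assumption)")
```

Because the epoch values are rounded, the estimate is only accurate to within a few tens of steps.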

Framework versions

Transformers 4.26.0
PyTorch 1.13.1+cu116
Datasets 2.9.0
Tokenizers 0.13.2