laicsiifes/swin-gportuguese-2

vision-encoder-decoder

image-text-to-text


gabrielmotablima committed on Sep 2, 2024

Commit 4ca27c2 · verified · 1 Parent(s): 83a35d1

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -23,7 +23,7 @@ at resolution 224x224 and max sequence length of 1024 tokens.
 ## 🤖 Model Description
 The Swin-GPorTuguese-2 is a type of Vision Encoder Decoder which leverage the checkpoints of the [Swin Transformer](https://huggingface.co/microsoft/swin-base-patch4-window7-224)
-as encoder and the checkpoints of the [GPorTuguese-2](pierreguillou/gpt2-small-portuguese) as decoder.
+as encoder and the checkpoints of the [GPorTuguese-2](https://huggingface.co/pierreguillou/gpt2-small-portuguese) as decoder.
 The encoder checkpoints come from Swin Trasnformer version pre-trained on ImageNet-1k at resolution 224x224.
 The code used for training and evaluation is available at: https://github.com/laicsiifes/ved-transformer-caption-ptbr. In this work, Swin-GPorTuguese-2
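
The one-line change swaps a scheme-less link target for an absolute URL. A plausible reason is that a target without a scheme is treated as a relative reference and resolved against the page it is rendered on. The sketch below illustrates this with standard RFC 3986 resolution via Python's `urllib.parse.urljoin`. The base URL is an assumption (the model card address inferred from the repository name), and how the Hub actually rewrites links is not shown in this commit, so this is an illustration only.

```python
from urllib.parse import urljoin

# Assumption for this sketch: the README is rendered at the model card URL.
BASE_URL = "https://huggingface.co/laicsiifes/swin-gportuguese-2"

# Link targets taken from the removed (-) and added (+) lines of the diff.
OLD_TARGET = "pierreguillou/gpt2-small-portuguese"
NEW_TARGET = "https://huggingface.co/pierreguillou/gpt2-small-portuguese"

# A relative reference replaces the last path segment of the base URL.
resolved_old = urljoin(BASE_URL, OLD_TARGET)

# An absolute URL ignores the base entirely.
resolved_new = urljoin(BASE_URL, NEW_TARGET)

print("old:", resolved_old)
print("new:", resolved_new)
```

Under that assumption, the old target resolves to a path nested under `laicsiifes/` rather than to the decoder's own repository, while the new target is unaffected by where the README is displayed.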