---
language:
- fa
library_name: hezar
tags:
- image-to-text
- hezar
metrics:
- wer
pipeline_tag: image-to-text
datasets:
- hezarai/flickr30k-fa
---

A Persian image captioning model built on a ViT + GPT2 architecture and trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi). The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (GPT2) was initialized from https://huggingface.co/HooshvareLab/gpt2-fa.

## Usage

```
pip install hezar
```

```python
from hezar.models import Model

# Load the pretrained captioning model by its repository ID
model = Model.load("hezarai/vit-gpt2-fa-image-captioning-flickr30k")

# Generate captions for a local image file
captions = model.predict("example_image.jpg")
print(captions)
```
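
The metadata lists WER (word error rate) as the evaluation metric. As a rough illustration of what that number measures, the sketch below computes the word-level edit distance between a reference caption and a predicted caption, divided by the reference length. This is a minimal standalone version, not the scoring code used by Hezar; the function name `word_error_rate` and the sample captions are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name, not part of Hezar).
    Words are obtained by splitting on whitespace, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up captions for illustration; they differ in one word.
    reference = "مردی در خیابان ایستاده است"
    predicted = "مردی در پارک ایستاده است"
    print(word_error_rate(reference, predicted))
```

A lower value means the predicted caption is closer to the reference; a value of 0 means the two match word for word. Because insertions are counted, the score can exceed 1 when the prediction is much longer than the reference.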