---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- TheFusion21/PokemonCards
language:
- en
pipeline_tag: image-to-text
---

## Model Details

### Model Description

- **Developed by:** [Mit1208](https://huggingface.co/Mit1208)
- **Finetuned from model:** [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)

## Training Details

Fine-tuning notebook: https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

## Inference Details

Inference notebook: https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

### How to Use

```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

# The processor comes from the base model; the weights come from the fine-tuned (merged) checkpoint
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged", device_map="auto", low_cpu_mem_usage=True
)

# Load the card image from the response content
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
response.raise_for_status()
image = Image.open(BytesIO(response.content))

prompt = "Pokemon name is"
# Move the inputs to whichever device device_map="auto" placed the model on
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)

with torch.no_grad():
    # Autoregressively generate the completion
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)

# Convert the generated token IDs back to a string
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Keep only the text after the image tokens, then cut at the first " and" to drop hallucinated continuation
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")
# Output: Pokemon name is Wartortle.
```

### Limitations

This model was fine-tuned on the free tier of Google Colab, so training used only 300 samples (for **85** epochs). The model hallucinates frequently, so its output needs post-processing; the example above, for instance, keeps only the text before the first " and". Other ways to address this are to update the training data (for example, use conversation-style data) *and/or* to set the tokenizer's padding token to its EOS token.
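
The post-processing mentioned above can be packaged as a small helper so it is applied consistently. This is a hypothetical sketch: the function name `extract_pokemon_name` and its `stops` parameter are illustrative and not part of this repository or of `transformers`. It assumes the decoded Kosmos-2 text contains the image-placeholder tokens ahead of a closing `</image>` token, keeps only what follows, cuts at the first ` and` (the same heuristic as the usage example), and strips the prompt so only the name remains.

```python
from typing import Sequence


def extract_pokemon_name(
    generated_text: str,
    prompt: str = "Pokemon name is",
    stops: Sequence[str] = (" and",),
) -> str:
    """Hypothetical helper: trim a decoded Kosmos-2 generation down to the Pokemon name."""
    # Drop the decoded image-placeholder tokens that precede "</image>" (if present)
    caption = generated_text.split("</image>")[-1]
    # Cut at each stop phrase; the model tends to keep generating after the name
    for stop in stops:
        caption = caption.split(stop)[0]
    answer = caption.strip()
    # Remove the prompt prefix so only the generated answer remains
    if answer.startswith(prompt):
        answer = answer[len(prompt):]
    # Drop surrounding whitespace and trailing periods (names like "Mr. Mime" keep inner periods)
    return answer.strip().rstrip(".")
```

In the How to Use snippet, this would replace the final `print(...)` line with `print(extract_pokemon_name(generated_text))`. Extra stop phrases can be passed through `stops` if other hallucination patterns show up. Recent `transformers` versions also provide `processor.post_process_generation(generated_text, cleanup_and_extract=False)`, which returns the generated caption without the image-token prefix; the prompt and stop-phrase trimming would still be needed on top of it.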