---
license: apache-2.0
datasets:
  - AIDC-AI/Parrot-dataset
library_name: transformers
tags:
  - MLLM
pipeline_tag: image-text-to-text
language:
  - en
---

# Model Card

Parrot is a multilingual, multimodal large language model built with multilingual visual instruction tuning. For a comprehensive introduction, please refer to the Parrot paper and the Parrot GitHub repository.

## Model Details

### Performance

## Usage

Below is a code snippet for running Parrot with multimodal inputs. For additional usage instructions, including the inference wrapper and Gradio UI, please refer to the Parrot GitHub repository.

```shell
pip install torch==2.1.2 transformers==4.43.2 pillow==10.3.0
```

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM
```
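The snippet above only shows the imports. As a hedged sketch of how they fit together: the repo id `AIDC-AI/Parrot-7B` and the `<image>` prompt placeholder below are assumptions modeled on similar MLLM cards, not confirmed details — check the Parrot GitHub inference wrapper for the exact interface.

```python
from PIL import Image

# Assumed Hugging Face repo id -- verify against the Parrot GitHub.
MODEL_ID = "AIDC-AI/Parrot-7B"

def build_inputs(image_path, question):
    """Pair an RGB image with a text prompt.

    The '<image>' placeholder is an assumption; Parrot's actual prompt
    template may differ (see the inference wrapper on GitHub).
    """
    image = Image.open(image_path).convert("RGB")
    prompt = f"<image>\n{question}"
    return image, prompt

def load_model():
    """Load Parrot weights (heavy: downloads the checkpoint and needs a GPU)."""
    import torch  # imported lazily so input prep works without a GPU stack
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half precision for a 7B model
        trust_remote_code=True,      # assumes Parrot ships custom modeling code
        device_map="auto",
    )
```

The generation call itself is model-specific; the inference wrapper on the Parrot GitHub shows the actual chat template and decoding arguments to use with the loaded model.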

## Citation

If you find Parrot useful, please cite the paper:

```bibtex
@article{sun2024parrot,
  title={Parrot: Multilingual Visual Instruction Tuning},
  author={Sun, Hai-Long and Zhou, Da-Wei and Li, Yang and Lu, Shiyin and Yi, Chao and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Zhan, De-Chuan and others},
  journal={arXiv preprint arXiv:2406.02539},
  year={2024}
}
```

## License

This project is licensed under the Apache License, Version 2.0, and is restricted to uses that comply with the license agreements of Qwen and CLIP.