ASM-FT Model Card

Model details

Model type: ASM is a unified vision-language foundation model for open-world panoptic visual recognition and understanding. Because it is aligned with LLMs, it supports versatile generation tasks and demonstrates strong region comprehension capability.

Model date: ASM was trained in July 2023.

Paper or resources for more information: https://github.com/OpenGVLab/all-seeing

License

ASM is open-sourced under the Apache License 2.0.

Where to send questions or comments about the model: https://github.com/OpenGVLab/all-seeing/issues

Intended use

Primary intended uses: The primary use of ASM is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

The pretraining phase uses AS-1B and Laion-COCO.

The finetuning phase uses AS-Core, RefCOCOg, VG, LLaVA-150K, COCO Caption, TextCaps, VQAv2, and GQA.

Evaluation dataset

The model is evaluated on a collection of four benchmarks: two image captioning benchmarks and two region captioning benchmarks.
