Visor - Natural language Anime Tagging

Visor is an image tagging model that describes images in natural language. It is built on the BLIP model architecture.

A potential use case is captioning anime images to prepare training data for diffusion models.
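The captioning use case can be sketched as follows. This is a minimal sketch, not a documented workflow for this model. It assumes the `shadowlilac/visor` checkpoint loads with the standard `BlipProcessor` and `BlipForConditionalGeneration` classes from `transformers` (plausible given the BLIP architecture, but not confirmed here), and it assumes a one-`.txt`-file-per-image caption layout, which is a common convention for diffusion training sets rather than a requirement. The helper names `make_blip_captioner` and `write_sidecar_captions` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

# File types treated as images (an assumption; adjust for your dataset).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def make_blip_captioner(model_id: str = "shadowlilac/visor",
                        max_new_tokens: int = 64) -> Callable[[Path], str]:
    """Build a function that captions one image file.

    Assumption: the checkpoint is compatible with the stock BLIP classes.
    Heavy imports live here so the file-handling code runs without them.
    """
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    def caption(path: Path) -> str:
        image = Image.open(path).convert("RGB")
        # Match the pixel tensor to the model's device and dtype.
        inputs = processor(images=image, return_tensors="pt").to(
            model.device, model.dtype
        )
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return processor.decode(output_ids[0], skip_special_tokens=True).strip()

    return caption


def write_sidecar_captions(image_dir: Path,
                           captioner: Callable[[Path], str]) -> Dict[str, str]:
    """Write `<image stem>.txt` next to every image; return name -> caption."""
    written: Dict[str, str] = {}
    # Materialize the listing first, since new .txt files are created below.
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = captioner(path)
        path.with_suffix(".txt").write_text(text + "\n", encoding="utf-8")
        written[path.name] = text
    return written
```

A typical call would be `write_sidecar_captions(Path("dataset"), make_blip_captioner())`, where `dataset` is a placeholder directory name. Passing the captioner in as a plain function keeps the file handling independent of the model, so another captioning function can be swapped in.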

Weights format: Safetensors
Model size: 470M params
Tensor type: BF16
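The figures above (470M parameters stored as BF16) give a rough lower bound on the memory needed just to hold the weights. The short calculation below is an estimate under stated assumptions: it treats "470M" as exactly 470,000,000 parameters and ignores activations, generation buffers, and framework overhead. The helper name `weight_bytes` is hypothetical.

```python
# Bytes per parameter for common tensor types (BF16 uses 2 bytes).
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_bytes(num_params: int, tensor_type: str) -> int:
    """Approximate size of the raw weights in bytes."""
    return num_params * BYTES_PER_PARAM[tensor_type.lower()]


size = weight_bytes(470_000_000, "BF16")
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")  # 0.94 GB (0.88 GiB)
```

Loading the same weights as FP32 would roughly double this figure.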

Task: Image-to-Text

Model repository: shadowlilac/visor
