Visor is a natural-language image tagging model built on the BLIP model architecture.
One potential use case is captioning anime images to produce training data for diffusion models.