Category Search from External Databases (CaSED)

Disclaimer: This model card is adapted from the one in the official repository. The accompanying paper, "Vocabulary-free Image Classification", is listed in the Citation section.

Intended uses & limitations

You can use the model for vocabulary-free image classification, i.e. classification with CLIP-like models without a pre-defined list of class names.

How to use

Here is how to use this model:

import requests
from PIL import Image
from transformers import AutoModel, CLIPProcessor

# download an image from the internet
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load the model and the processor
model = AutoModel.from_pretrained("altndrr/cased", trust_remote_code=True)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# get the model outputs
images = processor(images=[image], return_tensors="pt", padding=True)
outputs = model(images, alpha=0.7)
labels, scores = outputs["vocabularies"][0], outputs["scores"][0]

# print the top 3 most likely labels for the image
values, indices = scores.topk(3)
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{labels[index]:>16s}: {100 * value.item():.2f}%")
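
The snippet above asks for a fixed number of predictions, and `topk` raises an error when fewer candidates are available than requested. As a minimal sketch, assuming the number of candidate labels can vary from image to image, the ranking step could be wrapped in a small helper that clamps `k`. The function name `top_k_labels` and the example values are hypothetical and are not part of the model's API; with real outputs you would pass `outputs["vocabularies"][0]` and `outputs["scores"][0].tolist()`.

```python
from typing import List, Sequence, Tuple


def top_k_labels(
    labels: Sequence[str], scores: Sequence[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to k (label, score) pairs, highest score first."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Clamp k so that a short candidate list does not raise an error.
    k = max(0, min(k, len(labels)))
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up values standing in for the model outputs; not real predictions.
example_labels = ["cat", "remote control", "sofa"]
example_scores = [0.62, 0.27, 0.11]

# Asking for 5 labels when only 3 exist simply returns all 3.
for label, score in top_k_labels(example_labels, example_scores, k=5):
    print(f"{label:>16s}: {100 * score:.2f}%")
```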

The model depends on several libraries that must be installed manually before running the code above:

pip install torch faiss-cpu flair inflect nltk pyarrow transformers
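
To confirm that the dependencies are in place before loading the model, a quick check along the following lines may help. This is only a sketch: the helper `missing_modules` is a hypothetical name, and the list assumes the usual import names for the packages in the command above (in particular, `faiss-cpu` is imported as `faiss`).

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the packages in the pip command above.
# Note that the "faiss-cpu" package is imported as "faiss".
REQUIRED_MODULES = [
    "torch",
    "faiss",
    "flair",
    "inflect",
    "nltk",
    "pyarrow",
    "transformers",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```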

Citation

@article{conti2023vocabularyfree,
      title={Vocabulary-free Image Classification},
      author={Alessandro Conti and Enrico Fini and Massimiliano Mancini and Paolo Rota and Yiming Wang and Elisa Ricci},
      year={2023},
      journal={NeurIPS},
}