How to run a text-image pair inference demo?

#2
by WinstonDeng

Using the official OpenAI text model, the text embedding dimension is 768, which does not match the LLM2CLIP image embedding dimension of 1280.

import torch
from transformers import AutoTokenizer, CLIPModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14-336")
text_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336").to(device)
inputs = tokenizer(text=["a photo of a cat"], padding=True, return_tensors="pt").to(device)
text_features = text_model.get_text_features(**inputs)  # shape: [1, 768]
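
The mismatch is expected: the 1280-d image embeddings from LLM2CLIP are meant to be paired with text features from an LLM-based encoder wrapped by the llm2vec library and projected through the model's text adapter, not with the original OpenAI CLIP text tower. Below is a rough sketch of that pipeline; the checkpoint names, the LLM2Vec settings, and the get_text_features call on the custom model are assumptions based on the LLM2CLIP collection, so verify them against the model card before relying on this:

import torch
from llm2vec import LLM2Vec
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Assumed checkpoint names from the LLM2CLIP collection -- verify on the model card.
clip_name = "microsoft/LLM2CLIP-Openai-L-14-336"
llm_name = "microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned"

# The CLIP weights ship with custom modeling code (the repo is tagged custom_code),
# so trust_remote_code is required.
model = AutoModel.from_pretrained(clip_name, torch_dtype=torch.bfloat16,
                                  trust_remote_code=True).to("cuda").eval()

# Wrap the fine-tuned LLM as a text encoder with LLM2Vec (mean pooling assumed).
config = AutoConfig.from_pretrained(llm_name, trust_remote_code=True)
llm = AutoModel.from_pretrained(llm_name, config=config, torch_dtype=torch.bfloat16,
                                trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(llm_name)
l2v = LLM2Vec(llm, tokenizer, pooling_mode="mean", max_length=512)

with torch.no_grad():
    # Raw LLM sentence embeddings...
    llm_emb = l2v.encode(["a photo of a cat"], convert_to_tensor=True).to("cuda")
    # ...projected by the custom model's text adapter into the shared space,
    # which should match the 1280-d image embeddings.
    text_features = model.get_text_features(llm_emb)  # expected: [1, 1280]

As a quick sanity check, compare text_features.shape[-1] against the image features' last dimension before computing cosine similarity; doing so surfaces this kind of mismatch immediately instead of as a cryptic matmul shape error.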
