Model Card

Model Details

Architecture: ViT-Large with patch size 14
Training Data: Stanford Cars dataset
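
A minimal loading sketch, assuming the checkpoint `tanganke/clip-vit-large-patch14_stanford-cars` can be loaded as a `CLIPVisionModel` from the `transformers` library and that its weights are meant to replace the vision encoder of `openai/clip-vit-large-patch14`. The helper name `load_finetuned_clip` is hypothetical, and the imports are deferred so the constants can be used without `transformers` installed.

```python
# Repository ids as they appear on this model page.
FINETUNED_ID = "tanganke/clip-vit-large-patch14_stanford-cars"
BASE_ID = "openai/clip-vit-large-patch14"


def load_finetuned_clip():
    """Return the base CLIP model with the fine-tuned vision encoder swapped in.

    Hypothetical helper: assumes the fine-tuned repo holds CLIPVisionModel weights.
    """
    # Imported lazily so that importing this snippet does not require transformers.
    from transformers import CLIPModel, CLIPVisionModel

    clip_model = CLIPModel.from_pretrained(BASE_ID)
    vision_model = CLIPVisionModel.from_pretrained(FINETUNED_ID)

    # Copy the fine-tuned vision weights into the full CLIP model; the text
    # encoder and projection layers keep their pre-trained values.
    clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
    return clip_model
```

Calling `load_finetuned_clip()` downloads both checkpoints on first use, so it needs network access and enough disk space for two ViT-L/14 models.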

Training Details

Trained with the Adam optimizer at a constant learning rate of 1e-5 for 4000 steps (batch_size=32). Only the vision encoder is fine-tuned.
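
The setup above could be sketched roughly as follows. This is an illustration, not the authors' training script: `FinetuneConfig` and `build_optimizer` are hypothetical names, and treating "vision encoder" as the `vision_model` submodule of a `transformers` `CLIPModel` (leaving the projection layers frozen) is an assumption the card does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters stated in the card (hypothetical container)."""
    learning_rate: float = 1e-5  # constant, no schedule
    num_steps: int = 4000
    batch_size: int = 32

    @property
    def samples_seen(self) -> int:
        # Total images processed over the whole run.
        return self.num_steps * self.batch_size


def build_optimizer(clip_model, cfg: FinetuneConfig):
    """Freeze everything except the vision encoder and return an Adam optimizer."""
    import torch  # lazy import: the config above works without PyTorch

    for param in clip_model.parameters():
        param.requires_grad = False
    # Assumption: "vision encoder" maps to the `vision_model` submodule.
    for param in clip_model.vision_model.parameters():
        param.requires_grad = True

    trainable = [p for p in clip_model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=cfg.learning_rate)
```

With the defaults, `FinetuneConfig().samples_seen` is 4000 × 32 = 128,000 images over the full run.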

Evaluation Results

pre-trained: 0.7770098447799683
fine-tuned: 0.92734694480896
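
To put the two numbers in perspective, the short calculation below derives the absolute gain and the relative reduction in error. The card does not name the metric; the second quantity assumes it is a higher-is-better score in [0, 1], such as accuracy.

```python
# Scores copied from the card; the metric itself is not named there.
pretrained = 0.7770098447799683
finetuned = 0.92734694480896

# Absolute improvement from fine-tuning.
absolute_gain = finetuned - pretrained

# Assumption: the score is accuracy-like, so (1 - score) is the error rate.
relative_error_reduction = absolute_gain / (1.0 - pretrained)

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

Under that assumption, fine-tuning adds about 0.15 to the score and removes roughly two thirds of the remaining error.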

Safetensors

Model size: 303M params
Tensor type: F32

Model tree for tanganke/clip-vit-large-patch14_stanford-cars

Base model: openai/clip-vit-large-patch14
Fine-tuned: this model

Collection including tanganke/clip-vit-large-patch14_stanford-cars

CLIP-ViT-L/14 on the eight image classification tasks (collection of 9 items, updated Aug 27, 2024). If you find these models helpful, consider citing [our paper](https://arxiv.org/abs/2406.03280).