nielsr HF staff committed on
Commit 90bfc73 · 1 Parent(s): 212b600
Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -1,5 +1,8 @@
 ---
 license: apache-2.0
+tags:
+- image-classification
+- timm
 datasets:
 - cifar10
 - cifar100
@@ -10,7 +13,7 @@ datasets:
 - vtab
 ---
 
-# Vision Transformer base model
+# Vision Transformer (base-sized model)
 
 Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 384x384. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). However, the weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.
 
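The card says the model was pre-trained at 224x224 and fine-tuned at 384x384. The sketch below shows what that change means for the length of the input sequence. It makes two assumptions: a patch size of 16, taken from the paper title because the card itself does not state the patch size, and a single [CLS] token prepended to the patch tokens, as described in the paper. `vit_sequence_length` is a hypothetical helper written for illustration, not part of any library.

```python
# Sketch: token count of a ViT at the two resolutions named in the model card.
# PATCH_SIZE = 16 is an ASSUMPTION (from the paper title); the card does not state it.
PATCH_SIZE = 16


def vit_sequence_length(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """Hypothetical helper: number of patch tokens plus one [CLS] token
    for a square input image of side `resolution` pixels."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    grid = resolution // patch_size  # patches along each side
    return grid * grid + 1           # all patches, plus the [CLS] token


for res in (224, 384):
    grid = res // PATCH_SIZE
    print(f"{res}x{res}: {grid}x{grid} patch grid, {vit_sequence_length(res)} tokens")
```

Under these assumptions the patch grid grows from 14x14 (197 tokens) to 24x24 (577 tokens). As a result, the pre-trained position embeddings no longer match the fine-tuning sequence length. The ViT paper handles this by applying 2D interpolation to the pre-trained position embeddings according to their location in the image.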