---
datasets:
- imagenet-1k
library_name: transformers
pipeline_tag: image-classification
---
# SwiftFormer (swiftformer-s)
## Model description
The SwiftFormer model was proposed in [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.
The SwiftFormer paper introduces a novel efficient additive attention mechanism that replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. Based on this mechanism, a series of models called 'SwiftFormer' is built, achieving state-of-the-art performance in terms of both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on an iPhone 14, which is both more accurate and 2× faster than MobileViT-v2.
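The core idea can be illustrated with a minimal PyTorch sketch (a hypothetical, single-head module for illustration only, not the exact layer shipped in `transformers`; details such as the normalization of the scores and head splitting differ in the released code): a single learned vector produces one score per token in linear time, the scores pool the queries into a global query, and that global query interacts with the keys through element-wise multiplication instead of an N×N attention matrix.

```python
# Illustrative sketch of efficient additive attention (simplified, single head).
# Names and details here are assumptions, not the transformers implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAdditiveAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_query = nn.Linear(dim, dim)
        self.to_key = nn.Linear(dim, dim)
        self.w_g = nn.Parameter(torch.randn(dim, 1))  # learnable global-query vector
        self.scale = dim ** -0.5
        self.proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                             # x: (batch, tokens, dim)
        q = F.normalize(self.to_query(x), dim=-1)
        k = F.normalize(self.to_key(x), dim=-1)
        # One score per token from a single learned vector: O(N·d), not O(N²·d)
        alpha = torch.softmax((q @ self.w_g) * self.scale, dim=1)   # (batch, tokens, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)             # (batch, 1, dim)
        # Global context interacts with the keys via element-wise multiplication
        x = self.proj(global_q * k) + q
        return self.out(x)

attn = EfficientAdditiveAttention(dim=64)
print(attn(torch.randn(2, 196, 64)).shape)   # torch.Size([2, 196, 64])
```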
## Intended uses & limitations
## How to use
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

```python
import requests
from PIL import Image
from transformers import ViTImageProcessor
from transformers.models.swiftformer import SwiftFormerForImageClassification

# Load a test image from the COCO validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and load the model
processor = ViTImageProcessor.from_pretrained('shehan97/swiftformer-s')
inputs = processor(images=image, return_tensors="pt")
model = SwiftFormerForImageClassification.from_pretrained('shehan97/swiftformer-s')

# Run inference and map the top logit to an ImageNet-1K label
output = model(inputs['pixel_values'], output_hidden_states=True)
logits = output.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
## Limitations and bias
## Training data
The classification model is trained on the ImageNet-1K dataset.
## Training procedure
## Evaluation results