Collection shoaib6174/video_swin_transformer/1
Collection of Video Swin Transformer feature extractor models.
Overview
This collection contains several Video Swin Transformer [1] models. The original model weights come from [2]. They were ported to Keras models (`tf.keras.Model`) and then serialized as TensorFlow SavedModels. The porting steps are documented in [3].
About the models
These models can be used directly to extract features from videos. They are accompanied by Colab notebooks that show how to fine-tune them for action recognition and video classification.
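As a rough illustration of how a fine-tuning head can sit on top of the extracted features, the sketch below pools a feature map and applies a linear classifier with NumPy. It assumes the extractor returns a channels-first map of shape [batch, channels, frames', height', width'], as the original backbone in [2] does; the exact output shape of these SavedModels is not stated here, so check it on a real model first. The function names and the 768-channel / 400-class sizes in the usage lines are illustrative only.

```python
import numpy as np


def classification_head(features, weights, bias):
    """Hypothetical linear head over extracted video features.

    Assumes `features` is channels-first: [batch, channels, frames', height', width'].
    """
    # Global average pooling over the temporal and spatial axes -> [batch, channels]
    pooled = features.mean(axis=(2, 3, 4))
    # Linear projection to class logits -> [batch, num_classes]
    return pooled @ weights + bias


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Illustrative sizes: 768 channels and a 16 x 7 x 7 feature grid are assumptions,
# not values read from these SavedModels.
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 768, 16, 7, 7)).astype(np.float32)
weights = (rng.standard_normal((768, 400)) * 0.01).astype(np.float32)
bias = np.zeros(400, dtype=np.float32)
probs = softmax(classification_head(features, weights, bias))
print(probs.shape)  # (2, 400)
```

In practice the head would be trained in Keras on top of the frozen or partially unfrozen extractor; the NumPy version only makes the shape bookkeeping explicit.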
The table below provides a performance summary:
model_name | pre-train dataset | fine-tune dataset | acc@1(%) | acc@5(%) |
---|---|---|---|---|
swin_tiny_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 78.8 | 93.6 |
swin_small_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 80.6 | 94.5 |
swin_base_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 80.6 | 94.6 |
swin_base_patch244_window877_kinetics400_22k | ImageNet-22K | Kinetics 400(1k) | 82.7 | 95.5 |
swin_base_patch244_window877_kinetics600_22k | ImageNet-22K | Kinetics 600(1k) | 84.0 | 96.5 |
swin_base_patch244_window1677_sthv2 | Kinetics 400 | Something-Something V2 | 69.6 | 92.7 |
All scores are taken from [2].
Video Swin Transformer feature extractor models
- swin_tiny_patch244_window877_kinetics400_1k
- swin_small_patch244_window877_kinetics400_1k
- swin_base_patch244_window877_kinetics400_1k
- swin_base_patch244_window877_kinetics400_22k
- swin_base_patch244_window877_kinetics600_22k
- swin_base_patch244_window1677_sthv2
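The model names encode part of the architecture: following the convention of [2], `patch244` denotes a 3D patch of 2×4×4 (frames × height × width) and `window877` a 3D attention window of 8×7×7. The hypothetical helpers below parse those sizes from a name and work out, for the 32 × 224 × 224 (frames × height × width) input described in the Notes, how many patch tokens and first-stage attention windows that implies. Ceiling division reflects the padding the original implementation applies; treat the numbers as a sketch, not as values read from these SavedModels.

```python
import re


def parse_model_name(name):
    """Hypothetical helper: return (patch_size, window_size) as (frames, height, width) tuples."""
    match = re.search(r"patch(\d)(\d)(\d)_window(\d+)(\d)(\d)", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    values = [int(v) for v in match.groups()]
    return tuple(values[:3]), tuple(values[3:])


def ceil_div(shape, block):
    """Per-axis ceiling division, i.e. how many blocks cover a (padded) shape."""
    return tuple(-(-size // b) for size, b in zip(shape, block))


INPUT_THW = (32, 224, 224)  # frames, height, width from the Notes section

for name in [
    "swin_tiny_patch244_window877_kinetics400_1k",
    "swin_base_patch244_window1677_sthv2",
]:
    patch, window = parse_model_name(name)
    grid = ceil_div(INPUT_THW, patch)   # patch tokens per axis
    windows = ceil_div(grid, window)    # attention windows per axis in stage 1
    print(name, patch, window, grid, windows)
# tiny: grid (16, 56, 56), windows (2, 8, 8)
# sthv2: grid (16, 56, 56), windows (1, 8, 8)
```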
Notes
The input shape for these models is [None, 3, 32, 224, 224], representing [batch_size, channels, frames, height, width]. To create models with a different input shape, use this notebook.
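A minimal NumPy sketch of preparing a decoded video for this channels-first layout, assuming the clip arrives as a uint8 RGB array of shape [frames, height, width, channels] with frames at least 224×224. Uniform frame sampling, center cropping, and scaling to [0, 1] are assumptions for illustration; the normalization the ported weights actually expect (the original configurations normalize with ImageNet mean and standard deviation) should be checked against the notebooks in [3]. All function names are hypothetical.

```python
import numpy as np

NUM_FRAMES = 32
CROP_SIZE = 224


def sample_frame_indices(total_frames, num_frames=NUM_FRAMES):
    """Uniformly spaced frame indices; short clips repeat frames."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)


def center_crop(video, size=CROP_SIZE):
    """Center-crop a [frames, height, width, channels] array to size x size."""
    _, h, w, _ = video.shape
    if h < size or w < size:
        raise ValueError("frames must be at least size x size; resize them first")
    top = (h - size) // 2
    left = (w - size) // 2
    return video[:, top:top + size, left:left + size, :]


def to_model_input(video):
    """Convert [frames, height, width, channels] uint8 to [1, 3, 32, 224, 224] float32."""
    clip = video[sample_frame_indices(video.shape[0])]
    clip = center_crop(clip).astype(np.float32) / 255.0  # [0, 1] scaling is an assumption
    clip = np.transpose(clip, (3, 0, 1, 2))  # -> [channels, frames, height, width]
    return clip[np.newaxis]  # -> [batch, channels, frames, height, width]


video = np.random.randint(0, 256, size=(90, 256, 340, 3), dtype=np.uint8)
x = to_model_input(video)
print(x.shape)  # (1, 3, 32, 224, 224)
```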
References
[1] Ze Liu et al., Video Swin Transformer.
[2] Video Swin Transformer GitHub repository.
[3] GSOC-22-Video-Swin-Transformers GitHub repository.
Acknowledgements