---
library_name: transformers
metrics:
- accuracy
base_model:
- microsoft/swin-large-patch4-window12-384-in22k
license: apache-2.0
tags:
- vision
- image-classification
- vt
model-index:
- name: cub-200-bird-classifier-swin
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: cub-200-2011
      type: cub-200-2011
      args: default
    metrics:
    - type: accuracy
      value: 0.8653
      name: validation_accuracy
    - type: accuracy
      value: 0.8795
      name: test_accuracy
---
# Model Card for cub-200-bird-classifier-swin

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/X7cXpayiKgUCUycIen22S.png)
### Model Description

This model was finetuned for the "Feather in Focus!" Kaggle competition, part of the Applied Machine Learning course in the Information Studies Master's programme at the University of Amsterdam.
The goal of the competition was to apply novel approaches to reach the highest possible accuracy on a bird classification task with 200 classes.
We were given a labeled dataset of 3926 images and an unlabeled dataset of 4000 test images.
Out of 32 groups and 1083 submissions, we achieved the #1 accuracy on the test set, with a score of 0.87950.
### Training Details

The base model, [microsoft/swin-large-patch4-window12-384-in22k](https://huggingface.co/microsoft/swin-large-patch4-window12-384-in22k), was pretrained on ImageNet-21k.
#### Preprocessing

Data augmentation was applied to the training data in a custom PyTorch dataset class. Because the dataset is small, the original images were not replaced by augmented versions. Instead, each image was duplicated and the copy was augmented.
The only augmentations applied were horizontal flips and rotations (10 degrees), in keeping with the relatively homogeneous dataset.
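
The duplicate-and-augment scheme described above can be sketched as a small map-style dataset wrapper. This is a minimal illustration under stated assumptions, not the authors' class: the name `DuplicatedAugmentedDataset`, the `(image, label)` sample format and the injected `augment` callable are all hypothetical. Indices below `n` return the original images and indices from `n` to `2n` return augmented copies, so 3533 training images become 7066.

```python
from typing import Any, Callable, Sequence, Tuple

Sample = Tuple[Any, int]  # (image, label)


class DuplicatedAugmentedDataset:
    """Hypothetical wrapper (not the authors' original class).

    Keeps every original sample and adds one augmented copy of it,
    doubling the dataset size:
      indices [0, n)  -> original samples
      indices [n, 2n) -> the same samples passed through `augment`
    """

    def __init__(self, samples: Sequence[Sample], augment: Callable[[Any], Any]):
        self.samples = samples
        self.augment = augment

    def __len__(self) -> int:
        return 2 * len(self.samples)

    def __getitem__(self, idx: int) -> Sample:
        n = len(self.samples)
        if not 0 <= idx < 2 * n:
            raise IndexError(idx)
        image, label = self.samples[idx % n]
        if idx >= n:
            # Second half: augmented duplicate; the label is unchanged.
            image = self.augment(image)
        return image, label
```

In a PyTorch setup, `augment` could be `transforms.Compose([transforms.RandomHorizontalFlip(), transforms.RandomRotation(10)])` from `torchvision`, and the wrapper can be passed straight to `torch.utils.data.DataLoader`, which accepts any object implementing `__len__` and `__getitem__`. Because random transforms are re-sampled on every access, the augmented copies differ from epoch to epoch.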
### Finetuning

We finetuned some 50 different models, including various vision transformers (VTs) and CNNs. All models were trained for 10 epochs. After every epoch, the model with the best evaluation accuracy so far was saved.
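
Keeping the best model across epochs comes down to tracking the highest evaluation accuracy seen so far and snapshotting the weights whenever it improves. The sketch below assumes three hypothetical callbacks (`train_one_epoch`, `evaluate`, `get_state`). In PyTorch, `get_state` would typically be `model.state_dict`.

```python
import copy
from typing import Any, Callable, Dict, Tuple


def train_keep_best(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    get_state: Callable[[], Dict[str, Any]],
    epochs: int = 10,
) -> Tuple[int, float, Dict[str, Any]]:
    """Run `epochs` training epochs and return (best_epoch, best_accuracy,
    best_state). best_state is a snapshot of the weights from the epoch
    with the highest evaluation accuracy (hypothetical helper)."""
    best_epoch, best_acc, best_state = -1, float("-inf"), {}
    for epoch in range(epochs):
        train_one_epoch()
        acc = evaluate()
        if acc > best_acc:
            # Deep-copy so later epochs do not overwrite the saved snapshot.
            best_epoch, best_acc, best_state = epoch, acc, copy.deepcopy(get_state())
    return best_epoch, best_acc, best_state
```

Writing the snapshot to disk whenever it improves (for example with `torch.save(model.state_dict(), path)`) would match "saved every epoch" more literally. The in-memory copy only keeps the sketch dependency-free.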
#### Finetuning Data

The finetuning data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We finetuned the model on 3533 samples of the provided labeled dataset, split stratified by label (7066 samples including the augmented images).
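
The split sizes are consistent with holding out 10% of the 3926 labeled images, rounded up (ceil(392.6) = 393), and training on the remaining 3533. The card does not say how the split was computed, so what follows is an assumed, dependency-free stratified split. Each class contributes evaluation samples in proportion to its size, and leftover slots go to the classes with the largest remainders. scikit-learn's `train_test_split(..., test_size=0.1, stratify=labels)` is a common off-the-shelf alternative.

```python
import math
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, eval) so that each class appears in
    roughly the same proportion in both parts (illustrative, not the authors' code)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    n_eval = math.ceil(total * eval_fraction)  # round the holdout size up

    # Ideal (fractional) number of eval samples per class, floored...
    quotas = {c: len(ix) * n_eval / total for c, ix in by_class.items()}
    counts = {c: math.floor(q) for c, q in quotas.items()}
    # ...then give the leftover slots to the classes with the largest remainders.
    leftover = n_eval - sum(counts.values())
    by_remainder = sorted(by_class, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    rng = random.Random(seed)
    train: List[int] = []
    evaluation: List[int] = []
    for c, indices in by_class.items():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        evaluation.extend(shuffled[: counts[c]])
        train.extend(shuffled[counts[c]:])
    return sorted(train), sorted(evaluation)
```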
#### Finetuning Hyperparameters

| Hyperparameter  | Value     |
|-----------------|-----------|
| Optimizer       | AdamW     |
| Learning Rate   | 1e-4      |
| Batch Size      | 32        |
| Epochs          | 2         |
| Weight Decay    | \*        |
| Class Weight    | \*        |
| Label Smoothing | \*        |
| Scheduler       | \*        |
| Mixed Precision | Torch AMP |

\*Intentionally left unset, because using these parameters gave poor results.
### Evaluation Data

The evaluation data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We evaluated the model on the remaining 393 samples of the provided labeled dataset, stratified by label.
### Testing Data

The testing data is an unlabeled subset of 4000 images from the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
After finetuning, we loaded the best model (based on evaluation accuracy) and used it to predict the labels of the unlabeled test set.
These predicted labels were submitted to the Kaggle competition as a CSV file, which returned the test accuracy.
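
Producing the submission file means taking the highest-scoring class for each test image and writing one row per image. The column names (`id`, `label`), the `id2label` mapping and the function names below are hypothetical, because the competition's CSV format is not documented here. With a `transformers` model, `model.config.id2label` holds the index-to-label mapping.

```python
import csv
import io
from typing import Hashable, Mapping, Sequence, TextIO


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score; the first index wins on ties."""
    return max(range(len(scores)), key=scores.__getitem__)


def write_submission(
    image_ids: Sequence[Hashable],
    logits: Sequence[Sequence[float]],
    id2label: Mapping[int, Hashable],
    out: TextIO,
) -> None:
    """Write one row per test image (hypothetical `id,label` columns)."""
    writer = csv.writer(out)
    writer.writerow(["id", "label"])
    for image_id, scores in zip(image_ids, logits):
        writer.writerow([image_id, id2label[argmax(scores)]])


if __name__ == "__main__":
    # Toy example: two images, three classes.
    buffer = io.StringIO()
    write_submission([1, 2], [[0.1, 2.0, 0.3], [5.0, 1.0, 1.0]], {0: 1, 1: 2, 2: 3}, buffer)
    print(buffer.getvalue())
```

To write a real file, open it with `open("submission.csv", "w", newline="")`, as the `csv` module documentation recommends.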
### Poster

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/XbH4M6aL8iE4Hy75xfaHc.png)

\*The novel approaches were not applied when finetuning the final model, as they did not improve accuracy.