---
library_name: transformers
metrics:
- accuracy
base_model:
- microsoft/swin-large-patch4-window12-384-in22k
license: apache-2.0
tags:
- vision
- image-classification
- vt
model-index:
- name: cub-200-bird-classifier-swin
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: cub-200-2011
      type: cub-200-2011
      args: default
    metrics:
    - type: accuracy
      value: 0.8653
      name: validation_accuracy
    - type: accuracy
      value: 0.8795
      name: test_accuracy
---

# Model Card for cub-200-bird-classifier-swin

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/X7cXpayiKgUCUycIen22S.png)

### Model Description

This model was finetuned for the "Feather in Focus!" Kaggle competition of the Information Studies Master's Applied Machine Learning course at the University of Amsterdam. The goal of the competition was to apply novel approaches to achieve the highest possible accuracy on a bird classification task with 200 classes. We were given a labeled dataset of 3926 images and an unlabeled dataset of 4000 test images. Out of 32 groups and 1083 submissions, we achieved the highest accuracy on the test set, with a score of 0.87950.

### Training Details

The base model, microsoft/swin-large-patch4-window12-384-in22k, was pre-trained on ImageNet-21k; see https://huggingface.co/microsoft/swin-large-patch4-window12-384-in22k.

#### Preprocessing

Data augmentation was applied to the training data in a custom Torch dataset class. Because the dataset is small, the original images were not replaced by their augmented versions; instead, each image was duplicated and the copy was augmented. The only augmentations applied were horizontal flips and rotations (10 degrees), chosen to match the relatively homogeneous dataset.

### Finetuning

Some 50 different models were finetuned, including various VTs and CNNs. All models were trained for 10 epochs, and after each epoch the model was saved if it had the best evaluation accuracy so far.
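The duplicate-and-augment scheme described under Preprocessing can be sketched as follows. This is a minimal illustration and not the dataset class used for the competition: the class name `DuplicateAndAugmentDataset` and the `augment` callable are hypothetical, and the sketch only follows the `__len__`/`__getitem__` protocol of a Torch dataset without importing Torch. In practice `augment` might be a torchvision pipeline of `RandomHorizontalFlip` and `RandomRotation(10)`, but that is an assumption.

```python
class DuplicateAndAugmentDataset:
    """Serve every sample twice: once unchanged, once augmented.

    Hypothetical sketch. It mirrors the __len__/__getitem__ protocol of
    torch.utils.data.Dataset so the indexing logic can be checked alone.
    """

    def __init__(self, samples, augment):
        # samples: list of (image, label) pairs
        # augment: callable image -> image (assumed, e.g. flip + rotation)
        self.samples = samples
        self.augment = augment

    def __len__(self):
        # Originals plus one augmented copy of each.
        return 2 * len(self.samples)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        image, label = self.samples[index % len(self.samples)]
        # The second half of the index range returns the augmented copies.
        if index >= len(self.samples):
            image = self.augment(image)
        return image, label


def toy_flip(image):
    # Stand-in augmentation for the demo: reverse a sequence.
    return image[::-1]


demo = DuplicateAndAugmentDataset([("abc", 0), ("de", 1)], toy_flip)
print(len(demo), demo[0], demo[2])
```

With this layout, a training split of 3533 images yields 7066 samples per epoch, and the labels of the augmented copies are unchanged.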
### Finetuning Data

The finetuning data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/. We finetuned the model on 3533 samples of the labeled dataset we were given, stratified on the label (7066 including augmented images).

#### Finetuning Hyperparameters

| Hyperparameter  | Value     |
|-----------------|-----------|
| Optimizer       | AdamW     |
| Learning Rate   | 1e-4      |
| Batch Size      | 32        |
| Epochs          | 2         |
| Weight Decay    | *         |
| Class Weight    | *         |
| Label Smoothing | *         |
| Scheduler       | *         |
| Mixed Precision | Torch AMP |

*parameters were intentionally not set because of poor results

### Evaluation Data

The evaluation data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/. We evaluated the model on 393 samples of the labeled dataset we were given, stratified on the label.

#### Testing Data

The testing data is an unlabeled subset of 4000 images from the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/. After finetuning, the best model, as measured on the evaluation data, was loaded and used to predict the labels of the unlabeled test set. These predicted labels were submitted to the Kaggle competition as a CSV file, and Kaggle returned the test accuracy.

### Poster

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/XbH4M6aL8iE4Hy75xfaHc.png)

*novel approaches were not applied when finetuning the final model as they did not improve accuracy.
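### Split Sketch

The label-stratified split described under Finetuning Data and Evaluation Data can be sketched with the standard library alone. This is an illustration under assumptions, not the code used for the competition: the function name `stratified_split` is hypothetical, and the 10% evaluation fraction is inferred from 393 of 3926 samples rather than stated in this card. Because the sketch rounds per class, its totals may differ slightly from the 3533/393 split reported above.

```python
import random
from collections import defaultdict


def stratified_split(labels, eval_fraction=0.1, seed=0):
    """Return (train_indices, eval_indices), holding out eval_fraction per class."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_indices, eval_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one sample per class, but never the whole class
        # (a class with a single sample stays entirely in training).
        n_eval = max(1, round(len(indices) * eval_fraction))
        n_eval = min(n_eval, len(indices) - 1)
        eval_indices.extend(indices[:n_eval])
        train_indices.extend(indices[n_eval:])
    return sorted(train_indices), sorted(eval_indices)


# Toy labels: 200 classes with 20 samples each (not the real dataset).
toy_labels = [i % 200 for i in range(4000)]
train_idx, eval_idx = stratified_split(toy_labels)
print(len(train_idx), len(eval_idx))
```

Splitting per class keeps every one of the 200 species represented in both subsets, which matters when each class has only about 20 labeled images.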