---
library_name: transformers
metrics:
- accuracy
base_model:
- microsoft/swin-large-patch4-window12-384-in22k
license: apache-2.0
tags:
- vision
- image-classification
- vt
model-index:
- name: cub-200-bird-classifier-swin
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: cub-200-2011
      type: cub-200-2011
      args: default
    metrics:
    - type: accuracy
      value: 0.8653
      name: validation_accuracy
    - type: accuracy
      value: 0.8795
      name: test_accuracy
---

# Model Card for cub-200-bird-classifier-swin

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/X7cXpayiKgUCUycIen22S.png)


### Model Description


This model was finetuned for the "Feather in Focus!" Kaggle competition of the Information Studies Master's Applied Machine Learning course at the University of Amsterdam.
The goal of the competition was to apply novel approaches to achieve the highest possible accuracy on a bird classification task with 200 classes.
We were given a labeled dataset of 3926 images and an unlabeled dataset of 4000 test images.
Out of 32 groups and 1083 submissions, we achieved the highest accuracy on the test set, with a score of 0.87950.

### Training Details
The base model, microsoft/swin-large-patch4-window12-384-in22k, was pre-trained on ImageNet-21k; see https://huggingface.co/microsoft/swin-large-patch4-window12-384-in22k.

#### Preprocessing

Data augmentation was applied to the training data in a custom Torch dataset class. Because the dataset is small, augmented images did not replace the originals; each image was duplicated and the copy was augmented.
The only augmentations applied were horizontal flips and rotations (10 degrees), chosen to match the relatively homogeneous dataset.
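
The duplicate-and-augment scheme described above could be sketched roughly as follows. This is a minimal illustration, not the class used in training: the name `DuplicatingAugmentedDataset` and the `augment` parameter are hypothetical, and the class only implements the `__len__`/`__getitem__` protocol that map-style PyTorch datasets follow.

```python
from typing import Any, Callable, Sequence, Tuple


class DuplicatingAugmentedDataset:
    """Map-style dataset exposing every sample twice: once as-is, once augmented.

    Hypothetical sketch; the class name and the `augment` parameter are
    assumptions, not the implementation used for this model.
    """

    def __init__(self, samples: Sequence[Tuple[Any, int]], augment: Callable[[Any], Any]):
        self.samples = samples
        self.augment = augment

    def __len__(self) -> int:
        # Originals plus one augmented copy of each.
        return 2 * len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        n = len(self.samples)
        image, label = self.samples[index % n]
        if index >= n:
            # The second half of the index range holds the augmented copies.
            image = self.augment(image)
        return image, label
```

With torchvision, `augment` might be `transforms.Compose([transforms.RandomHorizontalFlip(), transforms.RandomRotation(10)])`. Wrapping 3533 training images this way would yield 7066 items.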

#### Finetuning
Some 50 different models were finetuned, including various vision transformers (VTs) and CNNs. All models were trained for 10 epochs, and the best model so far, based on the evaluation accuracy, was saved after every epoch.

### Finetuning Data

The finetuning data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We finetuned the model on 3533 samples of the labeled dataset we were given, stratified on the label (7066 including augmented images).
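
A label-stratified split like the one described here could be sketched as below. The function name `stratified_split`, the `eval_fraction` of 0.1 (inferred from 393 of 3926 samples being held out), and the seed are assumptions made for illustration only.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, eval_indices), splitting each class in the same ratio."""
    # Group sample indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, evaluation = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Hold out roughly `eval_fraction` of every class.
        n_eval = round(len(indices) * eval_fraction)
        evaluation.extend(indices[:n_eval])
        train.extend(indices[n_eval:])
    return sorted(train), sorted(evaluation)
```

Because the held-out count is rounded per class, the totals depend on the class sizes, so this sketch is not guaranteed to reproduce the exact 3533/393 split. scikit-learn's `train_test_split` with its `stratify` argument is a common alternative.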


#### Finetuning Hyperparameters

| Hyperparameter  | Value     |
|-----------------|-----------|
| Optimizer       | AdamW     |
| Learning Rate   | 1e-4      |
| Batch Size      | 32        |
| Epochs          | 2         |
| Weight Decay    | *         |
| Class Weight    | *         |
| Label Smoothing | *         |
| Scheduler       | *         |
| Mixed Precision | Torch AMP |


*These parameters were intentionally left unset because setting them gave poor results.
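
For illustration, the settings in the table might translate into a PyTorch training step along these lines. This is a sketch under assumptions: a tiny `nn.Linear` stands in for the Swin model, `train_step` is a hypothetical helper, and "not set" is read as "left at the library default" (so AdamW's default weight decay applies and no scheduler, class weights, or label smoothing are configured).

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision is only enabled on GPU in this sketch

# Stand-in model with 200 output classes; the real model is the finetuned Swin.
model = nn.Linear(16, 200).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # weight decay left at default
criterion = nn.CrossEntropyLoss()  # no class weights, no label smoothing
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one optimizer step with Torch AMP and return the batch loss."""
    model.train()
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = criterion(model(inputs), labels)
    # With AMP disabled the scaler passes the loss through unchanged.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

A batch of 32 samples, matching the table, would be passed as `train_step(batch_inputs, batch_labels)`.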

### Evaluation Data
The evaluation data is a subset of the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We evaluated the model on 393 samples of the labeled dataset we were given, stratified on the label.

#### Testing Data

The testing data is an unlabeled subset of 4000 images from the cub-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/. After finetuning,
the best model, based on the evaluation data, was loaded and used to predict the labels of the unlabeled test set.
These predicted labels were submitted to the Kaggle competition as a CSV file, and Kaggle returned the test accuracy.
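
Writing the predictions to a submission file could look roughly like this. The helper name `write_submission` and the `id`/`label` column headers are assumptions; the competition's actual CSV layout is not stated here.

```python
import csv
from typing import Iterable, Tuple


def write_submission(path: str, predictions: Iterable[Tuple[int, int]]) -> None:
    """Write (image_id, predicted_label) pairs to a two-column CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "label"])  # assumed header names
        writer.writerows(predictions)
```

For example, `write_submission("submission.csv", [(1, 17), (2, 142)])` would produce a header row followed by one row per test image.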

### Poster

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/XbH4M6aL8iE4Hy75xfaHc.png)

*Novel approaches were not applied when finetuning the final model, as they did not improve accuracy.