Emiel/cub-200-bird-classifier-swin


---
library_name: transformers
metrics:
- accuracy
base_model:
- microsoft/swin-large-patch4-window12-384-in22k
license: apache-2.0
tags:
- vision
- image-classification
- vt
model-index:
- name: cub-200-bird-classifier-swin
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: cub-200-2011
      type: cub-200-2011
      args: default
    metrics:
    - type: accuracy
      value: 0.8653
      name: validation_accuracy
    - type: accuracy
      value: 0.8795
      name: test_accuracy
---

# Model Card for cub-200-bird-classifier-swin

![image/png](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/X7cXpayiKgUCUycIen22S.png)


### Model Description

This model was finetuned for the "Feather in Focus!" Kaggle competition, part of the Applied Machine Learning course in the Information Studies Master's programme at the University of Amsterdam.
The goal of the competition was to apply novel approaches to achieve the highest possible accuracy on a 200-class bird classification task.
We were given a labeled dataset of 3926 images and an unlabeled set of 4000 test images.
Out of 32 groups and 1083 submissions, we ranked first on the test set with an accuracy of 0.87950.

### Training Details
The base model, microsoft/swin-large-patch4-window12-384-in22k, is a large Swin Transformer pre-trained on ImageNet-21k at 384x384 resolution; see https://huggingface.co/microsoft/swin-large-patch4-window12-384-in22k.
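A minimal sketch of how this checkpoint could be loaded with a new 200-class head using the `transformers` Auto classes. The helper names `build_label_maps` and `load_base_model` and the `class_names` argument are hypothetical, not taken from the original training code; `ignore_mismatched_sizes=True` lets a pre-trained classifier of a different size be replaced by a freshly initialized one.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

BASE_MODEL = "microsoft/swin-large-patch4-window12-384-in22k"


def build_label_maps(class_names):
    """Map class indices to names and back, as stored in the model config."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_base_model(class_names):
    """Load the ImageNet-21k Swin checkpoint with a classifier sized for `class_names`."""
    id2label, label2id = build_label_maps(class_names)
    processor = AutoImageProcessor.from_pretrained(BASE_MODEL)
    model = AutoModelForImageClassification.from_pretrained(
        BASE_MODEL,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        # A classifier with a different number of outputs is discarded
        # and re-initialized for the bird classes.
        ignore_mismatched_sizes=True,
    )
    return processor, model
```

For example, `processor, model = load_base_model(bird_names)` with a list of the 200 class names; the checkpoint is downloaded on first use.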

#### Preprocessing

Data augmentation was applied to the training data in a custom PyTorch dataset class. Because the dataset is small, augmented images did not replace the originals; instead, each image was duplicated and the copy was augmented.
The only augmentations applied were horizontal flips and rotations (10 degrees), kept mild because the dataset is relatively homogeneous.

### Finetuning
We finetuned some 50 different models, including several vision transformers (VTs) and CNNs. Every model was trained for 10 epochs, and after each epoch the model was saved if it had the best evaluation accuracy so far.
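The per-epoch checkpointing described above can be sketched as follows: after each epoch the model is evaluated on the held-out set, and its weights are written to disk only when accuracy improves. `fit`, `evaluate`, the `run_epoch` callable, and the `best_model.pt` path are hypothetical names; the `getattr(..., "logits", ...)` line is an assumption that lets the same code accept a `transformers` model (which returns an output object) or a plain module returning logits.

```python
import torch


@torch.no_grad()
def evaluate(model, loader, device):
    """Top-1 accuracy of `model` on a loader of (inputs, labels) batches."""
    model.eval()
    correct, total = 0, 0
    for inputs, labels in loader:
        outputs = model(inputs.to(device))
        logits = getattr(outputs, "logits", outputs)
        correct += (logits.argmax(dim=-1).cpu() == labels).sum().item()
        total += labels.numel()
    return correct / total


def fit(model, run_epoch, val_loader, device, epochs=10, path="best_model.pt"):
    """Train for `epochs` epochs, saving the weights with the best eval accuracy."""
    best_acc = float("-inf")
    for _ in range(epochs):
        run_epoch(model)  # one pass over the training data (should call model.train())
        acc = evaluate(model, val_loader, device)
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), path)
    return best_acc
```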
### Finetuning Data

The finetuning data is a subset of the CUB-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We finetuned the model on 3533 samples (about 90%) of the labeled dataset we were given, using a split stratified on the label (7066 samples including augmented images).
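The card does not say how the split was made, but a 10% stratified hold-out reproduces its counts exactly: scikit-learn takes ceil(0.1 x 3926) = 393 evaluation samples, leaving 3526 + 7 = 3533 for training, and duplicating those with augmentation gives 7066. The sketch below assumes scikit-learn's `train_test_split`; the helper name and the seed are hypothetical.

```python
from sklearn.model_selection import train_test_split


def stratified_split(labels, eval_fraction=0.1, seed=42):
    """Split sample indices into train/eval sets with matching class proportions."""
    indices = list(range(len(labels)))
    train_idx, eval_idx = train_test_split(
        indices,
        test_size=eval_fraction,
        stratify=labels,  # keep each bird class's share equal in both sets
        random_state=seed,
    )
    return train_idx, eval_idx
```

With 200 classes and roughly 20 images per class, stratification leaves one or two images of every class in the evaluation set.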


#### Finetuning Hyperparameters

| Hyperparameter  | Value       |
|-----------------|-------------|
| Optimizer       | AdamW       |
| Learning Rate   | 1e-4        |
| Batch Size      | 32          |
| Epochs          | 2           |
| Weight Decay    | \*          |
| Class Weight    | \*          |
| Label Smoothing | \*          |
| Scheduler       | \*          |
| Mixed Precision | PyTorch AMP |

\*These parameters were intentionally left unset because setting them gave poor results.

### Evaluation Data
The evaluation data is also a subset of the CUB-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/.
We evaluated the model on the remaining 393 samples of the labeled dataset, using the same label-stratified split.

#### Testing Data

The testing data is an unlabeled subset of 4000 images from the CUB-200-2011 dataset, http://www.vision.caltech.edu/datasets/cub_200_2011/. After finetuning, the best model according to the evaluation data was loaded and used to predict the labels of the unlabeled test set.
These predicted labels were submitted to the Kaggle competition as a CSV file, and Kaggle returned the test accuracy.
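A minimal sketch of the final prediction step: run the model over the unlabeled test images, take the highest-scoring class, and write a CSV. The card does not give the submission format, so the `id`/`label` column names, the `predict` and `write_submission` helpers, and the use of 0-based class indices are all assumptions to adapt to the competition's sample submission.

```python
import csv

import torch


@torch.no_grad()
def predict(model, loader, device):
    """Return predicted class indices for a loader that yields image batches."""
    model.eval()
    preds = []
    for inputs in loader:
        outputs = model(inputs.to(device))
        # transformers models return an output object; plain modules return logits.
        logits = getattr(outputs, "logits", outputs)
        preds.extend(logits.argmax(dim=-1).tolist())
    return preds


def write_submission(image_ids, preds, path, header=("id", "label")):
    """Write one row per test image; the column names are placeholders."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(zip(image_ids, preds))
```

Before calling `predict`, the best checkpoint saved during training would be restored, for example with `model.load_state_dict(torch.load("best_model.pt"))`, where the path is hypothetical.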

### Poster

![Project poster](https://cdn-uploads.huggingface.co/production/uploads/624d888b0ce29222ad64c3d6/XbH4M6aL8iE4Hy75xfaHc.png)

\*Novel approaches were not applied when finetuning the final model because they did not improve accuracy.