beltrewilton committed · Commit 5fc7741 · Parent(s): 79f2bd8
Update README.md

README.md CHANGED
@@ -37,7 +37,9 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This model was generated as part of the HF Audio course, which I enjoyed; this architecture currently achieves an accuracy of 0.90 on the audio classification task.
+
+The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first converted into an image (a spectrogram), after which a Vision Transformer is applied. The model achieves state-of-the-art results on several audio classification benchmarks.
 
 ## Intended uses & limitations
 
@@ -61,6 +63,13 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 10
 - mixed_precision_training: Native AMP
+- global_step: 2250
+- training_loss: 0.23970948094350752
+- train_runtime: 1982.7909
+- train_samples_per_second: 4.534
+- train_steps_per_second: 1.135
+- total_flos: 6.094112254328832e+17
+- train_loss: 0.23970948094350752
 
 ### Training results
 
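The throughput figures added in this commit are internally consistent; a quick sketch using only plain arithmetic on the logged values (no assumptions beyond the log itself) shows how they relate:

```python
# Cross-check the training throughput values logged above.
global_step = 2250
train_runtime = 1982.7909            # seconds
train_samples_per_second = 4.534
train_steps_per_second = 1.135

# steps/sec should equal total optimizer steps divided by wall-clock runtime.
assert abs(global_step / train_runtime - train_steps_per_second) < 1e-3

# samples/sec divided by steps/sec recovers the effective batch size.
effective_batch_size = train_samples_per_second / train_steps_per_second
print(round(effective_batch_size))  # -> 4
```

With `num_epochs: 10` and an effective batch of ~4, this implies roughly 900 training samples per epoch (2250 steps × 4 samples ÷ 10 epochs), assuming no gradient accumulation.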