beltrewilton committed · Commit 5fc7741 · Parent(s): 79f2bd8
Update README.md

README.md CHANGED
@@ -37,7 +37,9 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This model was generated as part of the HF Audio course, which I enjoyed; this architecture currently achieves an accuracy of 0.90 on the audio classification task.
+
+The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first converted into an image (a spectrogram), after which a Vision Transformer is applied. The model achieves state-of-the-art results on several audio classification benchmarks.
 
 ## Intended uses & limitations
 
@@ -61,6 +63,13 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 10
 - mixed_precision_training: Native AMP
+- global_step: 2250
+- training_loss: 0.23970948094350752
+- train_runtime: 1982.7909
+- train_samples_per_second: 4.534
+- train_steps_per_second: 1.135
+- total_flos: 6.094112254328832e+17
+- train_loss: 0.23970948094350752
 
 ### Training results
 
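The throughput figures added in this commit are internally consistent; a quick sketch using only plain arithmetic on the logged values (no assumptions beyond the log itself) shows how they relate:

```python
# Cross-check the training throughput values logged above.
global_step = 2250
train_runtime = 1982.7909            # seconds
train_samples_per_second = 4.534
train_steps_per_second = 1.135

# steps/sec should equal total optimizer steps divided by wall-clock runtime.
assert abs(global_step / train_runtime - train_steps_per_second) < 1e-3

# samples/sec divided by steps/sec recovers the effective batch size.
effective_batch_size = train_samples_per_second / train_steps_per_second
print(round(effective_batch_size))  # -> 4
```

With `num_epochs: 10` and an effective batch of ~4, this implies roughly 900 training samples per epoch (2250 steps × 4 samples ÷ 10 epochs), assuming no gradient accumulation.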