luisotorres committed • 77a9118
Parent(s): ee669e4
Create README.md

README.md ADDED
---
datasets:
- marsyas/gtzan
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Description

This model is a specialized version of <b>distilhubert</b>, fine-tuned on the <b>gtzan</b> dataset for music genre classification.

## Development
- Kaggle Notebook: [Audio Data: Music Genre Classification](https://www.kaggle.com/code/lusfernandotorres/audio-data-music-genre-classification)

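## Inference Example

A minimal usage sketch with the Hugging Face Transformers `pipeline`, matching the `audio-classification` pipeline tag above. The model id and audio file name below are placeholders, not values taken from this card.

```python
from transformers import pipeline

# Placeholder repo id -- replace with the actual id of this model on the Hub.
classifier = pipeline(
    "audio-classification",
    model="luisotorres/music-genre-classification",
)

# Any audio file readable by soundfile/librosa works; the path is illustrative.
predictions = classifier("example_track.wav")
print(predictions)  # list of {'label': genre, 'score': probability} dicts
```
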
## Training Parameters
```python
evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
```

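These values appear to be keyword arguments for `transformers.TrainingArguments`. The sketch below shows how they would plug into the Hugging Face `Trainer`; the base checkpoint id, output directory, and dataset variables are assumptions for illustration, not taken from this card. (Note that `evaluation_strategy` is the argument name from the Transformers versions current when this card was written; newer releases rename it to `eval_strategy`.)

```python
import numpy as np
import evaluate
from transformers import AutoModelForAudioClassification, Trainer, TrainingArguments

# Assumed base checkpoint for DistilHuBERT; the card only says "distilhubert".
model = AutoModelForAudioClassification.from_pretrained(
    "ntu-spml/distilhubert",
    num_labels=10,  # GTZAN defines 10 genres
)

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-gtzan",  # hypothetical output path
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    learning_rate=5e-5,
    seed=42,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    num_train_epochs=15,
    warmup_ratio=0.1,
    fp16=True,
    save_total_limit=2,
    report_to="none",
)

# Accuracy, matching `metric_for_best_model` and the metric declared in the front matter.
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=preds, references=eval_pred.label_ids)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_split,  # preprocessed GTZAN train split (preparation not shown)
    eval_dataset=eval_split,    # preprocessed GTZAN validation split (preparation not shown)
    compute_metrics=compute_metrics,
)
trainer.train()
```
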
## Training and Validation Results

| Epoch | Training Loss | Validation Loss | Accuracy |
|------:|--------------:|----------------:|---------:|
| 1 | No log | 2.050576 | 0.395000 |
| 2 | No log | 1.387915 | 0.565000 |
| 3 | No log | 1.141497 | 0.665000 |
| 4 | No log | 1.052763 | 0.675000 |
| 5 | 1.354600 | 0.846402 | 0.745000 |
| 6 | 1.354600 | 0.858698 | 0.750000 |
| 7 | 1.354600 | 0.864531 | 0.730000 |
| 8 | 1.354600 | 0.765039 | 0.775000 |
| 9 | 1.354600 | 0.790847 | 0.785000 |
| 10 | 0.250100 | 0.873926 | 0.785000 |
| 11 | 0.250100 | 0.928275 | 0.770000 |
| 12 | 0.250100 | 0.851429 | 0.780000 |
| 13 | 0.250100 | 0.922214 | 0.770000 |
| 14 | 0.250100 | 0.916481 | 0.780000 |
| 15 | 0.028000 | 0.946075 | 0.770000 |

```python
TrainOutput(global_step=1500, training_loss=0.5442592652638754,
metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976,
'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17, 'train_loss': 0.5442592652638754, 'epoch': 15.0})
```

## Reference
This model is based on the original <b>HuBERT</b> architecture, as detailed in:

Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. [arXiv:2106.07447](https://arxiv.org/pdf/2106.07447.pdf)