metadata
datasets:
  - marsyas/gtzan
metrics:
  - accuracy
pipeline_tag: audio-classification
tags:
  - music
  - audio

Description

This model is a fine-tuned version of DistilHuBERT, trained on the GTZAN dataset (marsyas/gtzan) for music genre classification.
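A minimal inference sketch, assuming the model is published on the Hugging Face Hub and that `transformers` is installed. The repository id `luisotorres/hubert_gtzan` and the helper names `top_genre` and `classify` are assumptions made for illustration, not part of this card, and the actual id may differ.

```python
from typing import Dict, List

# Assumed repository id; replace it with the id shown on the model page.
MODEL_ID = "luisotorres/hubert_gtzan"


def top_genre(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output.

    The audio-classification pipeline returns a list of
    {"score": float, "label": str} dictionaries.
    """
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(audio_path: str) -> str:
    """Classify one audio file; the model is downloaded on the first call."""
    # Imported lazily so that top_genre() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return top_genre(classifier(audio_path))
```

Calling `classify("song.wav")` with a local audio file (the path here is hypothetical) would return the predicted genre label. Decoding a file path typically requires `ffmpeg` to be available on the system.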

Development

Training Parameters

evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
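The learning-rate settings above can be illustrated with a small sketch. It assumes the default linear scheduler with warmup (the scheduler type is not stated in this card) and uses the 1500 total optimizer steps reported in the training output. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step, total_steps=1500, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at `step` for a linear warmup followed by linear decay.

    Assumes a linear schedule; the values mirror learning_rate = 5e-5 and
    warmup_ratio = 0.1 from the parameters above.
    """
    warmup_steps = round(total_steps * warmup_ratio)  # 150 steps for this run
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 75, 150, 825, 1500):
    print(step, linear_warmup_lr(step))
```

Under these assumptions, the rate peaks at 5e-5 after 150 steps and reaches zero at step 1500.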

Training and Validation Results

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 2.050576 | 0.395000 |
| 2 | No log | 1.387915 | 0.565000 |
| 3 | No log | 1.141497 | 0.665000 |
| 4 | No log | 1.052763 | 0.675000 |
| 5 | 1.354600 | 0.846402 | 0.745000 |
| 6 | 1.354600 | 0.858698 | 0.750000 |
| 7 | 1.354600 | 0.864531 | 0.730000 |
| 8 | 1.354600 | 0.765039 | 0.775000 |
| 9 | 1.354600 | 0.790847 | 0.785000 |
| 10 | 0.250100 | 0.873926 | 0.785000 |
| 11 | 0.250100 | 0.928275 | 0.770000 |
| 12 | 0.250100 | 0.851429 | 0.780000 |
| 13 | 0.250100 | 0.922214 | 0.770000 |
| 14 | 0.250100 | 0.916481 | 0.780000 |
| 15 | 0.028000 | 0.946075 | 0.770000 |

TrainOutput(global_step=1500, training_loss=0.5442592652638754,
metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976,
'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17,
'train_loss': 0.5442592652638754, 'epoch': 15.0})
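The reported numbers can be cross-checked with a little arithmetic. The sketch below copies its values from the results and training parameters above. It assumes a single training device and a logging interval of 500 steps (the Trainer default, which this card does not state); the variable and function names are illustrative only.

```python
# Values copied from the training parameters and TrainOutput above.
GLOBAL_STEP = 1500
NUM_EPOCHS = 15
TRAIN_RUNTIME_S = 12274.2966
TRAIN_SAMPLES_PER_SECOND = 0.976
BATCH_SIZE = 8        # batch size 8 x accumulation 1, single device assumed
LOGGING_STEPS = 500   # assumption: Trainer default, not stated in this card

ACCURACY = [0.395, 0.565, 0.665, 0.675, 0.745, 0.750, 0.730, 0.775,
            0.785, 0.785, 0.770, 0.780, 0.770, 0.780, 0.770]
VAL_LOSS = [2.050576, 1.387915, 1.141497, 1.052763, 0.846402, 0.858698,
            0.864531, 0.765039, 0.790847, 0.873926, 0.928275, 0.851429,
            0.922214, 0.916481, 0.946075]

steps_per_epoch = GLOBAL_STEP // NUM_EPOCHS
# Upper bound on the training set size (the last batch may be partial).
max_train_examples = steps_per_epoch * BATCH_SIZE
# Estimate of the training set size from the reported throughput.
est_train_examples = TRAIN_SAMPLES_PER_SECOND * TRAIN_RUNTIME_S / NUM_EPOCHS


def last_logged_step(epoch):
    """Most recent logging step reached by the end of `epoch` (0 = "No log")."""
    return (epoch * steps_per_epoch // LOGGING_STEPS) * LOGGING_STEPS


# Earliest epoch with the highest accuracy / lowest validation loss.
best_acc_epoch = ACCURACY.index(max(ACCURACY)) + 1
best_loss_epoch = VAL_LOSS.index(min(VAL_LOSS)) + 1

print(steps_per_epoch, max_train_examples, round(est_train_examples, 1))
print([last_logged_step(e) for e in range(1, NUM_EPOCHS + 1)])
print(best_acc_epoch, best_loss_epoch)
```

Under these assumptions, the first training loss is logged at step 500, which is the end of epoch 5. That is consistent with the "No log" entries for epochs 1 to 4 and with the same loss value repeating until the next logging step. The highest accuracy (0.785) is first reached at epoch 9 and the lowest validation loss at epoch 8. After that, validation loss drifts upward while training loss keeps falling, which may indicate overfitting in the later epochs.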

Reference

This model is based on the original HuBERT architecture, as detailed in:

Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. arXiv:2106.07447