---
datasets:
- marsyas/gtzan
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Description

This model is a specialized version of the distilhubert model fine-tuned on the gtzan dataset for the task of Music Genre Classification.

## Development

- Kaggle Notebook: [Audio Data: Music Genre Classification](https://www.kaggle.com/code/lusfernandotorres/audio-data-music-genre-classification)

## Training Parameters

```python
evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
```

## Training and Validation Results

```python
Epoch    Training Loss    Validation Loss    Accuracy
1        No log           2.050576           0.395000
2        No log           1.387915           0.565000
3        No log           1.141497           0.665000
4        No log           1.052763           0.675000
5        1.354600         0.846402           0.745000
6        1.354600         0.858698           0.750000
7        1.354600         0.864531           0.730000
8        1.354600         0.765039           0.775000
9        1.354600         0.790847           0.785000
10       0.250100         0.873926           0.785000
11       0.250100         0.928275           0.770000
12       0.250100         0.851429           0.780000
13       0.250100         0.922214           0.770000
14       0.250100         0.916481           0.780000
15       0.028000         0.946075           0.770000

TrainOutput(global_step=1500, training_loss=0.5442592652638754, metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976, 'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17, 'train_loss': 0.5442592652638754, 'epoch': 15.0})
```

## Reference

This model is based on the original HuBERT architecture, as detailed in:

Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. [arXiv:2106.07447](https://arxiv.org/pdf/2106.07447.pdf)
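
## Reading the Training Log

As a rough sanity check on the numbers reported under Training and Validation Results, the sketch below recomputes the optimizer steps per epoch from `global_step` and picks the epoch with the highest validation accuracy, which is the criterion set by `metric_for_best_model = 'accuracy'`. The names `RESULTS` and `best_epoch` are illustrative and not part of the original notebook. The tie-breaking rule (keep the earliest epoch) and the single-device setup are assumptions, not facts stated in this card.

```python
# Validation metrics copied from the results table: (epoch, validation_loss, accuracy).
RESULTS = [
    (1, 2.050576, 0.395), (2, 1.387915, 0.565), (3, 1.141497, 0.665),
    (4, 1.052763, 0.675), (5, 0.846402, 0.745), (6, 0.858698, 0.750),
    (7, 0.864531, 0.730), (8, 0.765039, 0.775), (9, 0.790847, 0.785),
    (10, 0.873926, 0.785), (11, 0.928275, 0.770), (12, 0.851429, 0.780),
    (13, 0.922214, 0.770), (14, 0.916481, 0.780), (15, 0.946075, 0.770),
]

GLOBAL_STEP = 1500  # from TrainOutput
NUM_EPOCHS = 15     # num_train_epochs
BATCH_SIZE = 8      # per_device_train_batch_size
GRAD_ACCUM = 1      # gradient_accumulation_steps


def best_epoch(results):
    """Return the row with the highest accuracy.

    Assumption: on a tie, the earliest epoch is kept (strict comparison).
    """
    best = results[0]
    for row in results[1:]:
        if row[2] > best[2]:
            best = row
    return best


steps_per_epoch = GLOBAL_STEP // NUM_EPOCHS

# Assumes one device and a full final batch, so this is an upper bound
# on the number of training samples seen per epoch.
samples_per_epoch = steps_per_epoch * BATCH_SIZE * GRAD_ACCUM

epoch, val_loss, acc = best_epoch(RESULTS)
lowest_loss_epoch = min(RESULTS, key=lambda row: row[1])[0]

print(f"steps per epoch: {steps_per_epoch}")
print(f"samples per epoch (at most): {samples_per_epoch}")
print(f"best accuracy: {acc:.3f} at epoch {epoch} (validation loss {val_loss:.6f})")
print(f"lowest validation loss at epoch {lowest_loss_epoch}")
```

The best accuracy and the lowest validation loss fall on different epochs, and validation loss drifts upward in later epochs while training loss keeps falling. This is why selecting the checkpoint by a validation metric rather than taking the final epoch matters here.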