---
tags:
- transformers
- stable-diffusion
- diffusers
- template:sd-lora
base_model: luisotorres/hubert_gtzan
instance_prompt: DistilHuBERT, Audio Encoder, Transfer Learning
license: mit
---
# HuBERT-Genre-Clf
<Gallery />
## Model description
<!-- <img src="assets/img.webp"></img> -->
<img src="assets/img.jpg"></img>
This model is a fine-tuned version of DistilHuBERT for audio genre classification. DistilHuBERT is a distilled variant of the HuBERT model: smaller and faster, while retaining much of HuBERT's representational quality. The classifier assigns an audio file to a musical genre using the representations learned by DistilHuBERT.
## Model Details:
- **Architecture:** DistilHuBERT
- **Task:** Audio Genre Classification
- **Genres:** The label set is stored in `model.config.id2label`. GTZAN covers ten genres: blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock.
- **Dataset:** [GTZAN test dataset](https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification)
- **Training:** The model was fine-tuned on a diverse set of audio tracks, encompassing various genres to ensure robust classification performance.
**Usage:**
To use this model, you can load it with the `transformers` library as follows:
```python
import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Load the fine-tuned classifier and its matching feature extractor
model_name = "danilotpnta/HuBERT-Genre-Clf"
model = AutoModelForAudioClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# Load the audio, resampled to the rate the feature extractor expects
audio_file = "path_to_your_audio_file.wav"
audio, sr = librosa.load(audio_file, sr=feature_extractor.sampling_rate)
inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to its genre label
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted genre: {model.config.id2label[predicted_class]}")
```
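The snippet above reports only the single highest-scoring genre. When a ranked list with probabilities is more useful, the logits can be passed through a softmax first. The following is a minimal sketch, not part of the model's own API: the `top_k_genres` helper, the example logits, and the four-entry label map are hypothetical and exist only to illustrate the idea with plain Python.

```python
import math


def top_k_genres(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical values for illustration only; real logits come from the model.
example_id2label = {0: "blues", 1: "classical", 2: "jazz", 3: "rock"}
example_logits = [0.2, 2.1, 1.3, -0.5]

for label, prob in top_k_genres(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as above, calling `top_k_genres(logits[0].tolist(), model.config.id2label)` should rank the actual predictions in the same way.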
**Performance:**
The model achieves **80.63%** accuracy on the [GTZAN test dataset](https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification) for genre classification. This makes it a reasonable starting point for applications such as music recommendation systems and audio analysis tools.
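Accuracy here means the fraction of clips whose predicted genre matches the reference label. As a rough sketch of how that figure could be computed from a list of predictions, assuming predictions and references are already collected as label strings (the `accuracy` helper and the sample labels below are hypothetical, not the evaluation script used for this model):

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels for illustration only.
predicted = ["rock", "jazz", "pop", "blues"]
expected = ["rock", "jazz", "rock", "blues"]
print(f"Accuracy: {accuracy(predicted, expected):.2%}")
```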
## Download model
Weights for this model are available in Safetensors and PyTorch formats.
[Download](/danilotpnta/HuBERT-Genre-Clf/tree/main) them in the Files & versions tab.
**License: MIT**