wav2vec2-base-one-shot-hip-hop-drums-clf
This model is a fine-tuned version of facebook/wav2vec2-base on yojul/one-shot-hip-hop-drums. It achieves the following results on the evaluation set:
- Loss: 0.2463
- Accuracy: 0.9243
Model description
This model classifies one-shot drum samples. It was trained on 17k hip-hop drum samples and assigns each sample to one of 7 classes: Kicks, Snares, Cymbals, Open-hats, Hi-hats, 808s, Claps.
Intended uses & limitations
It can be used to automatically sort a large number of drum samples when no metadata is available. The model accepts any audio file as input, but note that it was trained on audio files downsampled to 16 kHz.
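The sorting use case described above could be sketched as follows. This is a minimal illustration, not the author's workflow: `load_classifier` and `sort_samples` are hypothetical helper names, the model's Hub id is left as a parameter because the card does not state it, and the exact label strings returned by the model are assumed to match the class names listed above. When given a file path, the `transformers` audio-classification pipeline decodes the file with ffmpeg and resamples it to the feature extractor's sampling rate, so no manual resampling should be needed in that case.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_classifier(model_id: str) -> Callable[[str], str]:
    """Return a function mapping an audio file path to its top predicted label.

    `model_id` is the Hub id of this model (an assumption: not given in the card).
    """
    # Imported lazily so that sort_samples can be used and tested on its own.
    from transformers import pipeline

    clf = pipeline("audio-classification", model=model_id)

    def predict(path: str) -> str:
        # The pipeline returns a list of {"label", "score"} dicts,
        # ordered from highest to lowest score.
        return clf(path)[0]["label"]

    return predict


def sort_samples(
    paths: Iterable[Path],
    predict: Callable[[str], str],
    dest_root: Path,
) -> Dict[str, str]:
    """Copy each sample into dest_root/<predicted label>/ and return the mapping."""
    dest_root = Path(dest_root)
    assigned = {}
    for path in map(Path, paths):
        label = predict(str(path))
        target_dir = dest_root / label
        target_dir.mkdir(parents=True, exist_ok=True)
        # Copy rather than move, so the original library is left untouched.
        shutil.copy2(path, target_dir / path.name)
        assigned[path.name] = label
    return assigned
```

A typical call would look like `sort_samples(Path("samples").glob("*.wav"), load_classifier("<hub-id-of-this-model>"), Path("sorted"))`, where both directory names are placeholders.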
Training and evaluation data
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
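To make the relationship between these values concrete, here is a small sketch of the effective batch size and of a linear schedule with warmup. It is an illustration written in plain Python, not the training code itself: `linear_schedule_lr` is a hypothetical helper, the total of 1230 optimizer steps is taken from the results table, and rounding the warmup length to 123 steps is an assumption.

```python
# Effective batch size: per-device batch size times gradient accumulation steps.
train_batch_size = 32
gradient_accumulation_steps = 4
effective_batch_size = train_batch_size * gradient_accumulation_steps  # 128


def linear_schedule_lr(
    step: int,
    base_lr: float = 3e-5,
    total_steps: int = 1230,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 123, 615, 1230):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 3e-05 at the end of the first epoch (step 123) and reaches zero at the final step.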
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.8432 | 1.0 | 123 | 0.7449 | 0.8523 |
| 0.4692 | 2.0 | 246 | 0.4199 | 0.8894 |
| 0.3478 | 3.0 | 369 | 0.3122 | 0.9148 |
| 0.3054 | 4.0 | 492 | 0.2771 | 0.9156 |
| 0.2522 | 5.0 | 615 | 0.2676 | 0.9217 |
| 0.2221 | 6.0 | 738 | 0.2495 | 0.9217 |
| 0.2256 | 7.0 | 861 | 0.2588 | 0.9184 |
| 0.1949 | 8.0 | 984 | 0.2525 | 0.9232 |
| 0.1837 | 9.0 | 1107 | 0.2505 | 0.9237 |
| 0.1644 | 10.0 | 1230 | 0.2463 | 0.9243 |
Framework versions
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1