---
license: apache-2.0
base_model: facebook/wav2vec2-base
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: wav2vec2-base-one-shot-hip-hop-drums-clf
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-base-one-shot-hip-hop-drums-clf

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the [yojul/one-shot-hip-hop-drums](https://huggingface.co/datasets/yojul/one-shot-hip-hop-drums) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2463
- Accuracy: 0.9243

## Model description

This model classifies one-shot drum samples. It was trained on 17k hip-hop drum samples and assigns each sample to one of 7 classes: Kicks, Snares, Cymbals, Open-hats, Hi-hats, 808s, and Claps.

## Intended uses & limitations

It can be used to automatically sort large collections of drum samples when no metadata is available. The model accepts any audio file as input, but note that it was trained on audio downsampled to 16 kHz, so inputs at other sampling rates should be resampled to 16 kHz first.
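The sorting use case can be sketched as follows. This is a minimal illustration, not the author's code: `build_classifier` and `group_by_label` are hypothetical helper names, the `min_score` threshold and the `"unsorted"` bucket are assumptions, and the repository id in the usage comment is a placeholder. It relies on the `audio-classification` pipeline from Transformers, which, when given a file path, is expected to decode the file and resample it to the feature extractor's sampling rate (this needs ffmpeg installed).

```python
from collections import defaultdict


def build_classifier(model_id):
    """Create an audio-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the grouping logic below has no hard dependency on transformers.
    from transformers import pipeline
    return pipeline("audio-classification", model=model_id)


def group_by_label(paths, classify, min_score=0.5):
    """Group audio paths by predicted class.

    `classify` is any callable that takes a path string and returns a list of
    {"label": str, "score": float} dicts, which is the shape the pipeline returns.
    Files whose best score is below `min_score` (an assumed threshold) are
    collected under "unsorted".
    """
    groups = defaultdict(list)
    for path in paths:
        predictions = classify(str(path))
        best = max(predictions, key=lambda p: p["score"])
        label = best["label"] if best["score"] >= min_score else "unsorted"
        groups[label].append(str(path))
    return dict(groups)


# Usage (placeholder repository id and folder, replace with real ones):
#   from pathlib import Path
#   classifier = build_classifier("<user>/wav2vec2-base-one-shot-hip-hop-drums-clf")
#   groups = group_by_label(sorted(Path("samples").glob("*.wav")), classifier)
```

Passing the classifier in as a callable keeps the grouping step independent of the model, so the threshold can be tuned or the logic tested without downloading weights.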

## Training and evaluation data

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
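
How these values relate can be sketched in a few lines. This assumes training on a single device (so 32 × 4 gives the listed total batch size of 128) and a linear schedule that ramps the learning rate from 0 to its peak over the warmup steps, then decays it linearly to 0. The total of 1230 optimizer steps comes from the training results table; the function names are illustrative and this is not the Trainer's own implementation.

```python
import math


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps, warmup_ratio):
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step, total_steps, n_warmup, peak_lr):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < n_warmup:
        return peak_lr * step / max(1, n_warmup)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - n_warmup))


total_steps = 1230   # last row of the training results table
peak_lr = 3e-05      # learning_rate from the list above
n_warmup = warmup_steps(total_steps, 0.1)

print(effective_batch_size(32, 4))
for step in (0, n_warmup, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps, n_warmup, peak_lr))
```

Under these assumptions, warmup covers roughly the first epoch, and the learning rate reaches 0 at the final step.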

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.8432        | 1.0   | 123  | 0.7449          | 0.8523   |
| 0.4692        | 2.0   | 246  | 0.4199          | 0.8894   |
| 0.3478        | 3.0   | 369  | 0.3122          | 0.9148   |
| 0.3054        | 4.0   | 492  | 0.2771          | 0.9156   |
| 0.2522        | 5.0   | 615  | 0.2676          | 0.9217   |
| 0.2221        | 6.0   | 738  | 0.2495          | 0.9217   |
| 0.2256        | 7.0   | 861  | 0.2588          | 0.9184   |
| 0.1949        | 8.0   | 984  | 0.2525          | 0.9232   |
| 0.1837        | 9.0   | 1107 | 0.2505          | 0.9237   |
| 0.1644        | 10.0  | 1230 | 0.2463          | 0.9243   |


### Framework versions

- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1