Create README.md
README.md
---
license: apache-2.0
datasets:
- agkphysics/AudioSet
pipeline_tag: audio-classification
---
# Model Details
This is a CRNN sound event detection model pre-trained on [AudioSet](https://research.google.com/audioset/download.html) and then fine-tuned on [AudioSet-strong](https://research.google.com/audioset/download_strong.html).
It contains 8 convolutional layers and a GRU, with a time resolution of 40 ms and about 6.4 million parameters in total.

# Usage
```python
import torch
from transformers import AutoModel
import torchaudio

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(
    "wsntxxn/cnn8rnn-audioset-sed",
    trust_remote_code=True
).to(device)

# Load each file, resample to the model's sample rate, and downmix to mono.
wav1, sr1 = torchaudio.load("/path/to/file1.wav")
wav1 = torchaudio.functional.resample(wav1, sr1, model.config.sample_rate)
wav1 = wav1.mean(0) if wav1.size(0) > 1 else wav1[0]

wav2, sr2 = torchaudio.load("/path/to/file2.wav")
wav2 = torchaudio.functional.resample(wav2, sr2, model.config.sample_rate)
wav2 = wav2.mean(0) if wav2.size(0) > 1 else wav2[0]

# Zero-pad the shorter waveform so both fit in one batch, then move it to the model's device.
wav_batch = torch.nn.utils.rnn.pad_sequence([wav1, wav2], batch_first=True).to(device)

with torch.no_grad():
    output = model(waveform=wav_batch)
# output: {
#     "framewise_output": (2, 447, n_frames),
#     "clipwise_output": (2, 447)
# }

# class names are in `model.classes`
# for example, the probability sequence of male speech is:
male_speech_idx = model.classes.index("Male speech, man speaking")
male_speech_prob = output["framewise_output"][:, male_speech_idx, :]
```
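
The frame-level probabilities can be turned into time-stamped events using the 40 ms time resolution stated in Model Details. The following is a minimal sketch, not part of the model's own API: the helper name `frames_to_events` and the 0.5 threshold are assumptions chosen for illustration, and the input here is a synthetic tensor standing in for one class of one clip.

```python
import torch

FRAME_SEC = 0.04  # 40 ms time resolution, as stated in Model Details


def frames_to_events(probs, threshold=0.5, frame_sec=FRAME_SEC):
    """Convert a 1-D tensor of frame probabilities into (onset, offset) pairs in seconds.

    Hypothetical helper; the threshold is an illustrative assumption.
    """
    active = (probs > threshold).to(torch.int8)
    # Pad with a zero on each side so events touching either end are still closed.
    zero = active.new_zeros(1)
    padded = torch.cat([zero, active, zero])
    change = padded[1:] - padded[:-1]
    onsets = torch.nonzero(change == 1).flatten().tolist()    # first active frame
    offsets = torch.nonzero(change == -1).flatten().tolist()  # first inactive frame
    return [(on * frame_sec, off * frame_sec) for on, off in zip(onsets, offsets)]


# Synthetic stand-in for the frame probabilities of one class in one clip.
probs = torch.tensor([0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.9])
events = frames_to_events(probs)
print(events)
```

With the real model, and assuming the output layout documented above (batch, classes, frames), the same helper could be applied to `output["framewise_output"][0, male_speech_idx]`. Because clips in a batch are zero-padded to a common length, events detected past the end of a shorter clip should be ignored.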