wsntxxn/cnn8rnn-audioset-sed

Audio Classification


wsntxxn committed on Aug 13, 2024

Commit cc8ed6e · verified · 1 Parent(s): ef9c1d9

Create README.md

Files changed (1)

README.md +44 -0

README.md ADDED

	@@ -0,0 +1,44 @@

+---
+license: apache-2.0
+datasets:
+- agkphysics/AudioSet
+pipeline_tag: audio-classification
+---
+# Model Details
+This is a CRNN sound event detection model pre-trained on [AudioSet](https://research.google.com/audioset/download.html) and then fine-tuned on [AudioSet-strong](https://research.google.com/audioset/download_strong.html).
+It contains 8 convolutional layers and a GRU, has a time resolution of 40 ms, and has about 6.4 million parameters in total.
+# Usage
+```python
+import torch
+from transformers import AutoModel
+import torchaudio
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = AutoModel.from_pretrained(
+    "wsntxxn/cnn8rnn-audioset-sed",
+    trust_remote_code=True
+).to(device)
+wav1, sr1 = torchaudio.load("/path/to/file1.wav")
+wav1 = torchaudio.functional.resample(wav1, sr1, model.config.sample_rate)
+wav1 = wav1.mean(0) if wav1.size(0) > 1 else wav1[0]
+wav2, sr2 = torchaudio.load("/path/to/file2.wav")
+wav2 = torchaudio.functional.resample(wav2, sr2, model.config.sample_rate)
+wav2 = wav2.mean(0) if wav2.size(0) > 1 else wav2[0]
+wav_batch = torch.nn.utils.rnn.pad_sequence([wav1, wav2], batch_first=True)
+with torch.no_grad():
+    output = model(waveform=wav_batch)
+    # output: {
+    #     "framewise_output": (2, 447, n_frames),
+    #     "clipwise_output": (2, 447)
+    # }
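+# class labels are listed in `model.classes`
+# `output` is a dict, so index "framewise_output" (batch, classes, frames) by class;
+# for example, the probability sequence of male speech is:
+male_speech_prob = output["framewise_output"][:, model.classes.index("Male speech, man speaking"), :]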
+```
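
The usage snippet above yields one probability per 40 ms frame for each class. A minimal sketch of how such a sequence might be turned into onset/offset times follows. The helper name `probs_to_segments` and the 0.5 threshold are assumptions made for illustration and are not part of the model or its README. Only the 40 ms frame length comes from the model card.

```python
from typing import List, Tuple

FRAME_SECONDS = 0.04  # 40 ms time resolution, as stated in the model card


def probs_to_segments(
    probs: List[float],
    threshold: float = 0.5,  # assumed decision threshold, not from the model card
    frame_seconds: float = FRAME_SECONDS,
) -> List[Tuple[float, float]]:
    """Return (onset, offset) times in seconds for runs of frames at or above threshold."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open segment
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # a segment opens on this frame
        elif not active and start is not None:
            # the segment closes at the start of the first inactive frame
            segments.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:
        # the sequence ended while a segment was still open
        segments.append((start * frame_seconds, len(probs) * frame_seconds))
    return segments
```

With the variables from the snippet above, this could be called as `probs_to_segments(male_speech_prob[0].tolist())` for the first clip. Because `pad_sequence` zero-pads the shorter waveform, trailing frames of that clip correspond to padding and may need to be trimmed before thresholding.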