---
datasets:
- davidrrobinson/AnimalSpeak
---
# Model card for BioLingual

Model card for BioLingual: Transferable Models for Bioacoustics with Human Language Supervision

An audio-text model for bioacoustics based on contrastive language-audio pretraining (CLAP).
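Contrastive language-audio pretraining trains the audio and text encoders so that matching audio-caption pairs score higher than mismatched pairs in the same batch. The block below is a sketch of the standard symmetric CLAP-style objective. It assumes BioLingual uses this formulation; the card does not state the exact loss or how the temperature is handled. For a batch of $N$ pairs with L2-normalized audio embeddings $a_i$, text embeddings $t_i$, and temperature $\tau$:

$$
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{\exp(a_i^\top t_i/\tau)}{\sum_{j=1}^{N}\exp(a_i^\top t_j/\tau)} + \log\frac{\exp(t_i^\top a_i/\tau)}{\sum_{j=1}^{N}\exp(t_i^\top a_j/\tau)}\right]
$$

The first term classifies the correct caption for each clip, and the second classifies the correct clip for each caption. The same similarity scores drive zero-shot classification at inference time.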
# Usage

You can use this model for zero-shot bioacoustic audio classification or fine-tune it on bioacoustic tasks.
# Uses

## Perform zero-shot audio classification

### Using `pipeline`
```python
from datasets import Audio, load_dataset
from transformers import pipeline

audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")

# Resample ESC-50 (44.1 kHz) to the sampling rate the model's feature extractor expects.
dataset = load_dataset("ashraq/esc50")
dataset = dataset.cast_column("audio", Audio(sampling_rate=audio_classifier.feature_extractor.sampling_rate))
audio = dataset["train"][-1]["audio"]["array"]

output = audio_classifier(audio, candidate_labels=["Sound of a sperm whale", "Sound of a sea lion"])
print(output)
# A list of {"score": ..., "label": ...} dicts, one per candidate label, sorted by descending score.
```
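The pipeline scores each candidate label by comparing the clip's audio embedding with the label's text embedding, then applies a softmax over the labels. Below is a minimal NumPy sketch of that scoring step. It assumes cosine similarity scaled by a temperature, as in CLAP-style models. `zero_shot_scores`, `rank_labels`, the `logit_scale` value, and the toy vectors are hypothetical illustrations, not part of the `transformers` API.

```python
import numpy as np


def zero_shot_scores(audio_embed, text_embeds, logit_scale=1.0):
    """Score one clip against N candidate labels (hypothetical helper).

    audio_embed: shape (D,); text_embeds: shape (N, D).
    logit_scale: assumed temperature multiplier; CLAP learns its own.
    Returns softmax probabilities of shape (N,).
    """
    # L2-normalize so each dot product is a cosine similarity.
    a = audio_embed / np.linalg.norm(audio_embed)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    logits = logit_scale * (t @ a)
    # Numerically stable softmax over the candidate labels.
    logits = logits - logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()


def rank_labels(labels, probs):
    """Pair labels with scores, highest first, mirroring the pipeline's output shape."""
    order = np.argsort(-probs)
    return [{"score": float(probs[i]), "label": labels[i]} for i in order]


# Toy 3-D vectors standing in for real BioLingual embeddings.
labels = ["Sound of a sperm whale", "Sound of a sea lion"]
audio_embed = np.array([1.0, 0.0, 0.0])
text_embeds = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
print(rank_labels(labels, zero_shot_scores(audio_embed, text_embeds, logit_scale=10.0)))
```

To use real embeddings, you could pass `model.get_audio_features(**audio_inputs)[0]` and `model.get_text_features(**text_inputs)`, converted with `.detach().numpy()`. Here `text_inputs = processor(text=labels, return_tensors="pt", padding=True)`. The pipeline uses the model's own learned scale, so this sketch's scores will match it only if you use that same value.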
## Run the model

You can also get the audio and text embeddings using `ClapModel` and `ClapProcessor`.
### Run the model on CPU

```python
import torch
from datasets import Audio, load_dataset
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

# LibriSpeech is 16 kHz; resample to the rate the feature extractor expects.
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
audio_sample = librispeech_dummy[0]

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt")
with torch.no_grad():
    audio_embed = model.get_audio_features(**inputs)
```
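Audio embeddings such as `audio_embed` from the snippet above can also serve a new bioacoustic task without full fine-tuning. One simple option is a nearest-centroid (prototype) classifier on the normalized embeddings. This is a swapped-in technique for illustration, not the BioLingual authors' fine-tuning method. `fit_prototypes`, `predict`, and the toy 4-D data are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    # Row-wise unit length, so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fit_prototypes(embeddings, labels):
    """Average the normalized embeddings of each class into one prototype (hypothetical helper)."""
    embeddings = l2_normalize(np.asarray(embeddings, dtype=float))
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(protos)


def predict(embeddings, classes, protos):
    """Assign each embedding to the class whose prototype is most similar."""
    sims = l2_normalize(np.asarray(embeddings, dtype=float)) @ protos.T
    return [classes[i] for i in sims.argmax(axis=1)]


# Toy 4-D "embeddings" standing in for get_audio_features outputs.
rng = np.random.default_rng(0)
whale = rng.normal([5.0, 0.0, 0.0, 0.0], 0.5, size=(10, 4))
seal = rng.normal([0.0, 5.0, 0.0, 0.0], 0.5, size=(10, 4))
classes, protos = fit_prototypes(np.vstack([whale, seal]), ["whale"] * 10 + ["seal"] * 10)
print(predict(np.array([[4.0, 0.5, 0.0, 0.0], [0.2, 6.0, 0.1, 0.0]]), classes, protos))
# → ['whale', 'seal']
```

With real data, each row would be one clip's embedding from `model.get_audio_features`. A trained linear head or full fine-tuning would usually replace this baseline once enough labeled examples are available.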
### Run the model on GPU

```python
import torch
from datasets import Audio, load_dataset
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual").to(0)
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

# LibriSpeech is 16 kHz; resample to the rate the feature extractor expects.
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
audio_sample = librispeech_dummy[0]

# Move the input tensors to the same device (GPU 0) as the model.
inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt").to(0)
with torch.no_grad():
    audio_embed = model.get_audio_features(**inputs)
```