---
license: mit
language:
- ru
library_name: pyannote-audio
tags:
- code
---
# Segmentation model
This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.
Training time: 5 hours on an RTX 3060.
This model can be used as the segmentation model in the [pyannote/speaker-diarization](https://huggingface.co./pyannote/speaker-diarization) pipeline.
| Benchmark | DER% |
| --------- |------|
| [AMI (*headset mix,*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*)](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 38.8 |
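The DER reported above is the sum of false alarm, missed detection, and speaker confusion time, divided by the total reference speech time. A minimal sketch of the metric (the durations below are hypothetical, chosen only to illustrate a 38.8% score):

```python
def der(false_alarm: float, missed: float, confusion: float, total_speech: float) -> float:
    """Diarization Error Rate: (FA + Miss + Confusion) / total reference speech."""
    return (false_alarm + missed + confusion) / total_speech

# Hypothetical error durations in seconds, for illustration only
print(round(der(12.0, 9.0, 17.8, 100.0), 3))  # 0.388, i.e. 38.8% DER
```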
## Usage example
```python
import yaml
from yaml.loader import SafeLoader
import torch
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization
# Load the fine-tuned segmentation model weights
segm_model = torch.load('model/segm_model.pth', map_location=torch.device('cpu'))

# Pretrained speaker embedding model (requires a Hugging Face access token)
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

# Build the diarization pipeline from the segmentation and embedding models
diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model,
)

# Load the pipeline hyperparameters and instantiate the pipeline with them
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.load(f, Loader=SafeLoader)
diar_pipeline.instantiate(diar_config)

# Run diarization on an audio file
annotation = diar_pipeline('audio.wav')
```
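The returned `annotation` holds labeled speaker segments, which are commonly exported in RTTM format for scoring. A minimal sketch of the RTTM `SPEAKER` line layout (the segment tuples below are hypothetical stand-ins for what `annotation.itertracks(yield_label=True)` yields):

```python
def to_rttm_line(uri: str, start: float, end: float, speaker: str) -> str:
    """Format one diarization segment as an RTTM SPEAKER line:
    type, file id, channel, onset, duration, then <NA> placeholders around the label."""
    return (f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>")

# Hypothetical segments (start, end, label), for illustration only
segments = [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.1, "SPEAKER_01")]
for start, end, label in segments:
    print(to_rttm_line("audio", start, end, label))
```

In practice the pipeline output can also be written directly with `annotation.write_rttm(file)` from `pyannote.core`.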