Update README.md
README.md CHANGED
@@ -11,6 +11,9 @@ tags:
 - speech
 - xlsr-fine-tuning-week
 license: apache-2.0
+widget:
+- example_title: test sound with Russian speech
+  src: https://huggingface.co/bond005/wav2vec2-large-ru-golos/blob/main/test_sound_ru.wav
 model-index:
 - name: XLSR Wav2Vec2 Russian by Ivan Bondarenko
   results:
@@ -50,6 +53,33 @@ The Wav2Vec2 model is based on [facebook/wav2vec2-large-xlsr-53](https://hugging

 When using this model, make sure that your speech input is sampled at 16kHz.

+## Usage
+
+To transcribe audio files, the model can be used as a standalone acoustic model as follows:
+
+```python
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+from datasets import load_dataset
+import torch
+
+# load model and processor
+processor = Wav2Vec2Processor.from_pretrained("bond005/wav2vec2-large-ru-golos")
+model = Wav2Vec2ForCTC.from_pretrained("bond005/wav2vec2-large-ru-golos")
+
+# load the test split of the Golos crowd dataset and read the sound files
+ds = load_dataset("bond005/sberdevices_golos_10h_crowd", split="test")
+
+# preprocess the first sound file (the audio is already sampled at 16 kHz)
+processed = processor(ds[0]["audio"]["array"], sampling_rate=16000, return_tensors="pt", padding="longest")  # Batch size 1
+
+# retrieve logits
+logits = model(processed.input_values, attention_mask=processed.attention_mask).logits
+
+# take argmax and decode
+predicted_ids = torch.argmax(logits, dim=-1)
+transcription = processor.batch_decode(predicted_ids)
+```
+
 ## Citation
 If you want to cite this model you can use this:

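The context line retained in the second hunk stresses that speech input must be sampled at 16 kHz. For recordings captured at another rate, a minimal sketch (not part of this commit) could resample before inference; it assumes torchaudio is installed, and the file name `speech_ru.wav` is a hypothetical placeholder:

```python
# Sketch: resample a local recording to the 16 kHz expected by the model.
# Assumes torchaudio is installed; "speech_ru.wav" is a hypothetical file name.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("bond005/wav2vec2-large-ru-golos")
model = Wav2Vec2ForCTC.from_pretrained("bond005/wav2vec2-large-ru-golos")

waveform, sample_rate = torchaudio.load("speech_ru.wav")  # shape: (channels, time)
waveform = waveform.mean(dim=0)                           # downmix to mono
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

inputs = processor(waveform.numpy(), sampling_rate=16000,
                   return_tensors="pt", padding="longest")
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```

The decoding steps mirror the Usage example added in this commit; only the audio loading and resampling differ.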