bayartsogt committed
Commit dcf1ab6 · 1 Parent(s): b7ec1f5

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -32,6 +32,32 @@ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav

  When using this model, make sure that your speech input is sampled at 16kHz.

+ ## Usage
+ The model can be used directly (without a language model) as follows:
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+ test_dataset = load_dataset("common_voice", "mn", split="test[:2%]")
+ processor = Wav2Vec2Processor.from_pretrained("bayartsogt/wav2vec2-large-xlsr-mongolian")
+ model = Wav2Vec2ForCTC.from_pretrained("bayartsogt/wav2vec2-large-xlsr-mongolian")
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
+ # Preprocessing the datasets.
+ # We need to read the audio files as arrays
+ def speech_file_to_array_fn(batch):
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
+     return batch
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
+ inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)
+ with torch.no_grad():
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+ predicted_ids = torch.argmax(logits, dim=-1)
+ print("Prediction:", processor.batch_decode(predicted_ids))
+ print("Reference:", test_dataset["sentence"][:2])
+ ```
+
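Note on the decoding step in the snippet added by this commit: `torch.argmax` picks the most likely token for every audio frame, and `processor.batch_decode` then turns those per-frame tokens into text. The sketch below illustrates that collapse in plain Python, as an approximation rather than the library's actual implementation. The function name `greedy_ctc_collapse` and the toy frame sequence are hypothetical, and the `<pad>` blank and `|` word delimiter are the usual Wav2Vec2 CTC tokenizer defaults, assumed here rather than read from this model's vocabulary.

```python
from itertools import groupby

# Assumed token conventions (common Wav2Vec2 CTC defaults, not verified for this model).
BLANK = "<pad>"   # the padding token doubles as the CTC blank
WORD_SEP = "|"    # word delimiter token, rendered as a space


def greedy_ctc_collapse(frame_tokens):
    """Collapse per-frame tokens into text: merge repeats, then drop blanks."""
    # 1. Merge each run of identical consecutive tokens into a single token.
    merged = [token for token, _ in groupby(frame_tokens)]
    # 2. Drop blanks; a blank between two equal tokens keeps a genuine double letter.
    kept = [token for token in merged if token != BLANK]
    # 3. Render the word delimiter as a space and join the characters.
    return "".join(" " if token == WORD_SEP else token for token in kept).strip()


# Hypothetical per-frame argmax output, already mapped from ids to tokens.
frames = ["с", "с", "<pad>", "а", "й", "й", "н", "|", "у", "<pad>", "у"]
print(greedy_ctc_collapse(frames))  # prints: сайн уу
```

The blank between the two `у` frames is what lets the double letter survive; without it, the repeat would be merged into a single `у`.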
  ## Evaluation

  The model can be evaluated as follows on the Mongolian test data of Common Voice.