sanchit-gandhi committed
Commit 934c622 · 1 Parent(s): bdeaacd

Update README.md

Files changed (1)
  1. README.md +20 -10
README.md CHANGED
@@ -55,15 +55,22 @@ To transcribe audio files the model can be used as a standalone acoustic model a
 
 ## Evaluation
 
-This code snippet shows how to evaluate **facebook/wav2vec2-large-960h** on LibriSpeech's "clean" and "other" test data.
-
+First, ensure the required Python packages are installed. We'll require `transformers` for running the Wav2Vec2 model,
+`datasets` for loading the LibriSpeech dataset, and `evaluate` plus `jiwer` for computing the word-error rate (WER):
+
+```
+pip install --upgrade pip
+pip install --upgrade transformers datasets evaluate jiwer
+```
+
+The following code snippet shows how to evaluate **facebook/wav2vec2-large-960h** on LibriSpeech's "clean" and "other" test data.
+The batch size can be set according to your device, and is set to `8` by default:
+
 ```python
+import torch
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
-import soundfile as sf
-import torch
-from jiwer import wer
-
+from evaluate import load
 
 librispeech_eval = load_dataset("librispeech_asr", "clean", split="test")
 
@@ -71,18 +78,21 @@ model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h").to("cuda")
 processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")
 
 def map_to_pred(batch):
-    input_values = processor(batch["audio"]["array"], return_tensors="pt", padding="longest").input_values
+    audios = [audio["array"] for audio in batch["audio"]]
+    sampling_rate = batch["audio"][0]["sampling_rate"]
+    input_values = processor(audios, sampling_rate=sampling_rate, return_tensors="pt", padding="longest").input_values
     with torch.no_grad():
         logits = model(input_values.to("cuda")).logits
 
     predicted_ids = torch.argmax(logits, dim=-1)
     transcription = processor.batch_decode(predicted_ids)
-    batch["transcription"] = transcription
+    batch["transcription"] = [t for t in transcription]
     return batch
 
-result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1, remove_columns=["speech"])
+result = librispeech_eval.map(map_to_pred, batched=True, batch_size=8, remove_columns=["audio"])
+wer = load("wer")
 
-print("WER:", wer(result["text"], result["transcription"]))
+print("WER:", wer.compute(references=result["text"], predictions=result["transcription"]))
 ```
 
 *Result (WER)*:
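
For readers unfamiliar with the metric reported here, the sketch below shows roughly what a word-error rate measures: the word-level edit distance (substitutions, deletions, and insertions) between each reference and its prediction, summed over the corpus and divided by the total number of reference words. It is only an illustration in plain Python, not the implementation used by `evaluate` or `jiwer`. The helper names `edit_distance` and `word_error_rate` are hypothetical, the example sentences are made up, and whitespace tokenization is an assumption.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of tokens."""
    # previous[j] is the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER: total word errors over total reference words."""
    errors = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()  # assumption: whitespace tokenization
        errors += edit_distance(ref_words, prediction.split())
        total_words += len(ref_words)
    return errors / total_words


# Made-up example: one deleted word and one substituted word over 8 reference words.
references = ["THE CAT SAT ON THE MAT", "HELLO WORLD"]
predictions = ["THE CAT SAT ON MAT", "HELLO WORD"]
print("WER:", word_error_rate(references, predictions))  # prints "WER: 0.25"
```

A value of `0.25` here means one word error for every four reference words; a lower WER is better.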