Versions:
- CUDA: 12.1
- cuDNN Version: 8.9.2.26_1.0-1_amd64
- tensorflow Version: 2.12.0
- torch Version: 2.1.0.dev20230606+cu12135
- transformers Version: 4.30.2
- accelerate Version: 0.20.3
Model Benchmarks:
RAM: 3 GB (Original_Model: 6GB)
VRAM: 3.7 GB (Original_Model: 11GB)
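The roughly halved RAM figure is consistent with simple arithmetic on the weights: float16 stores each parameter in 2 bytes instead of 4. The sketch below assumes a parameter count of about 1,550 million for whisper-large-v2 (the figure listed in OpenAI's Whisper README, not something measured here) and counts weights only. VRAM usage also includes activations and CUDA overhead, so it will not scale by exactly 2x.

```python
# Approximate parameter count for whisper-large-v2.
# Assumption: taken from OpenAI's Whisper README, not measured in this repo.
NUM_PARAMS = 1_550_000_000

# Storage size of one parameter for each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gb(num_params: int, dtype_name: str) -> float:
    """Return the size of the raw weights in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1e9


for name in ("float32", "float16"):
    print(f"{name}: {weights_gb(NUM_PARAMS, name):.1f} GB")
```

This gives about 6.2 GB for float32 and 3.1 GB for float16, in line with the RAM numbers reported above.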
test.wav: 23 s (Multilingual Speech i.e. English+Hindi)
- Time in seconds for Processing by each device
Device Name | float32 (Original) | float16 | CudaCores | TensorCores |
---|---|---|---|---|
3060 | 2.2 | 1.3 | 3,584 | 112 |
1660 Super | OOM | 6 | 1,408 | N/A |
Colab (Tesla T4) | - | - | 2,560 | 320 |
Colab (CPU) | - | N/A | N/A | N/A |
M1 (CPU) | - | - | N/A | N/A |
M1 (GPU -> 'mps') | - | - | N/A | N/A |

- NOTE: TensorCores are efficient in mixed-precision calculations
- CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
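Because float16 is reported above as unsupported on CPU, it can help to choose the precision from the device string before loading anything. The helper below is a minimal sketch: `pick_dtype_name` is a hypothetical name, not part of this repo, and falling back to float32 for every non-CUDA device (including 'mps', which is untested in the table above) is a conservative assumption.

```python
def pick_dtype_name(device: str) -> str:
    """Return the name of a torch dtype that should be safe on `device`.

    Hypothetical helper, not part of this repo. Only CUDA devices get
    float16; everything else falls back to float32 (an assumption).
    """
    kind = device.split(":")[0].strip().lower()  # "cuda:0" -> "cuda"
    return "float16" if kind == "cuda" else "float32"


print(pick_dtype_name("cuda:0"))
print(pick_dtype_name("cpu"))
```

With PyTorch installed, the name can be turned into a real dtype with `getattr(torch, pick_dtype_name(device))`.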
Punctuation: Sometimes incorrect (I don't know the exact reason why this happens)
Model Error Benchmarks:
- WER: Word Error Rate
- MER: Match Error Rate
- WIL: Word Information Lost
- WIP: Word Information Preserved
- CER: Character Error Rate
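To make these metrics concrete, here is a minimal pure-Python sketch that derives all five from an edit-distance alignment. It is an illustration under the usual definitions (hits H, substitutions S, deletions D, insertions I), not the implementation used to produce the tables, and the function names are made up for this example. The results are fractions between 0 and 1 (WER can exceed 1); the tables appear to report them multiplied by 100.

```python
def edit_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) between two sequences."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = Levenshtein distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the end to count each operation type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            hits, i, j = hits + 1, i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs, i, j = subs + 1, i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels, i = dels + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return hits, subs, dels, ins


def error_rates(reference: str, hypothesis: str) -> dict:
    """Compute WER, MER, WIL, WIP and CER. Assumes a non-empty reference."""
    h, s, d, i = edit_counts(reference.split(), hypothesis.split())
    wer = (s + d + i) / (h + s + d)        # errors per reference word
    mer = (s + d + i) / (h + s + d + i)    # errors per aligned position
    hyp_len = h + s + i
    wip = (h / (h + s + d)) * (h / hyp_len) if hyp_len else 0.0
    wil = 1.0 - wip
    ch, cs, cd, ci = edit_counts(list(reference), list(hypothesis))
    cer = (cs + cd + ci) / (ch + cs + cd)  # same as WER, on characters
    return {"wer": wer, "mer": mer, "wil": wil, "wip": wip, "cer": cer}


print(error_rates("the cat sat on the mat", "the cat sit on mat"))
```

In this example one word is substituted and one is deleted out of six reference words, so WER and MER are both 2/6, while WIP is (4/6) x (4/5).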
Hindi to Hindi (test.tsv) Common Voice 14.0
Test done on RTX 3060 on 1000 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (30 min) | 43.99 | 41.65 | 59.47 | 40.52 | 16.23 |
This_Model (20 min) | 44.64 | 41.69 | 59.53 | 40.46 | 16.80 |
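The trade-off in these Hindi results can be expressed as relative changes. The snippet below is a small worked calculation using only the numbers in the table, and it assumes the times in parentheses are the total evaluation times for the 1000 samples.

```python
def relative_change(baseline: float, value: float) -> float:
    """Return the change from `baseline` to `value` as a percentage of `baseline`."""
    return (value - baseline) / baseline * 100


# Values copied from the table: Original_Model -> This_Model.
wer_change = relative_change(43.99, 44.64)  # WER
time_change = relative_change(30, 20)       # evaluation time in minutes

print(f"WER: {wer_change:+.2f}%  time: {time_change:+.2f}%")
```

In other words, the fp16 model shows roughly a 1.5% relative increase in WER in exchange for about a third less evaluation time.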
Hindi to English (test.csv) Custom Dataset
Test done on RTX 3060 on 1000 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (30 min) | - | - | - | - | - |
This_Model (20 min) | - | - | - | - | - |
English (LibriSpeech -> test-clean)
Test done on RTX 3060 on ___ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
English (LibriSpeech -> test-other)
Test done on RTX 3060 on ___ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
- 'jiwer' library is used for calculations
Code for conversion:
Usage
This repo contains an __init__.py file with all the code needed to use this model.
Firstly, clone this repo and place all the files inside a folder.
Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/devasheeshG/whisper_large_v2_fp16_transformers
Please try this in a Jupyter notebook.
# Import the Model
from whisper_large_v2_fp16_transformers import Model, load_audio, pad_or_trim
# Initialise the model
model = Model(
    model_name_or_path='whisper_large_v2_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load Audio
audio = load_audio('whisper_large_v2_fp16_transformers/test.wav')
audio = pad_or_trim(audio)
# Transcribe (First transcription takes time)
model.transcribe(audio)
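Since the first transcription is slower than later ones, timings like those in the benchmark table are more meaningful when a warm-up call is excluded. The helper below is a minimal sketch of that idea; `time_calls` is a hypothetical name, not part of this repo, and it assumes the timed function only returns once its work is finished (for example, after the text has been decoded).

```python
import time
from statistics import mean


def time_calls(fn, *args, warmup: int = 1, repeats: int = 3) -> float:
    """Return the mean wall-clock seconds of `fn(*args)`, excluding warm-up runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up: absorbs one-off costs such as CUDA initialisation
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Stand-in workload so the sketch runs without the model.
print(f"{time_calls(sum, range(100_000)):.6f} s")
```

With the model loaded as above, the same helper could be called as `time_calls(model.transcribe, audio)`.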
Credits
This is an fp16 version of openai/whisper-large-v2.
Evaluation results
- Test WER on LibriSpeech (clean) test set (self-reported): 0.000
- Test MER on LibriSpeech (clean) test set (self-reported): 0.000
- Test WIL on LibriSpeech (clean) test set (self-reported): 0.000
- Test WIP on LibriSpeech (clean) test set (self-reported): 0.000
- Test CER on LibriSpeech (clean) test set (self-reported): 0.000
- Test WER on LibriSpeech (other) test set (self-reported): 0.000
- Test MER on LibriSpeech (other) test set (self-reported): 0.000
- Test WIL on LibriSpeech (other) test set (self-reported): 0.000
- Test WIP on LibriSpeech (other) test set (self-reported): 0.000
- Test CER on LibriSpeech (other) test set (self-reported): 0.000