asahi417 committed
Commit aa89fea · verified · 1 Parent(s): 0137a8d

Update README.md

Files changed (1):
  1. README.md +6 -5
README.md CHANGED

@@ -20,12 +20,12 @@ Install library and download sample audio.
 pip install faster-whisper
 wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/sample_ja_speech.wav
 ```
-Inference with the kotoba-whisper-v2.0-faster.
+Inference with the kotoba-whisper-bilingual-v1.0-faster.
 
 ```python
 from faster_whisper import WhisperModel
 
-model = WhisperModel("kotoba-tech/kotoba-whisper-v2.0-faster")
+model = WhisperModel("kotoba-tech/kotoba-whisper-bilingual-v1.0-faster")
 
 segments, info = model.transcribe("sample_ja_speech.wav", language="ja", chunk_length=15, condition_on_previous_text=False)
 for segment in segments:
@@ -47,9 +47,10 @@ We measure the inference speed of different kotoba-whisper-v2.0 implementations
 |audio 4 | 5.6 | 35 | 126 | 69 |
 
 Scripts to re-run the experiment can be found below:
-* [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-ggml/blob/main/benchmark.sh)
-* [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-faster/blob/main/benchmark.sh)
-* [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0/blob/main/benchmark.sh)
+* [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/blob/main/benchmark.sh)
+* [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-faster/blob/main/benchmark.sh)
+* [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0/blob/main/benchmark.sh)
+
 Also, currently whisper.cpp and faster-whisper support the [sequential long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form),
 and only the Hugging Face pipeline supports the [chunked long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form), which we empirically
 found better than the sequential long-form decoding.
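For readers following the updated snippet: the first hunk cuts off at the loop header because the loop body lies outside the diff context. A minimal end-to-end sketch of the new README's faster-whisper usage is below; the print format is an illustration based on faster-whisper's standard `Segment` fields (`start`, `end`, `text`), not part of this commit.

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 weights referenced by the updated README.
model = WhisperModel("kotoba-tech/kotoba-whisper-bilingual-v1.0-faster")

# Sequential long-form decoding over 15-second chunks, without
# conditioning on previously decoded text.
segments, info = model.transcribe(
    "sample_ja_speech.wav",
    language="ja",
    chunk_length=15,
    condition_on_previous_text=False,
)
for segment in segments:
    # Each Segment carries start/end timestamps (seconds) and text.
    # The format string here is illustrative, not from the commit.
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```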
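The closing note contrasts sequential long-form decoding (whisper.cpp, faster-whisper) with chunked long-form decoding (Hugging Face pipeline). For comparison, here is a minimal sketch of the chunked approach with the transformers pipeline; the repo id `kotoba-tech/kotoba-whisper-bilingual-v1.0` and the 15-second chunk length are assumptions carried over from the README, not verified against this commit.

```python
from transformers import pipeline

# Chunked long-form decoding: the pipeline splits the audio into
# overlapping chunks and transcribes them in parallel batches.
pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-bilingual-v1.0",  # assumed repo id
    chunk_length_s=15,
    batch_size=16,
)
result = pipe(
    "sample_ja_speech.wav",
    generate_kwargs={"language": "ja", "task": "transcribe"},
)
print(result["text"])
```

Because chunks are decoded independently, `batch_size` lets the pipeline parallelize across chunks, which sequential decoding cannot do.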