---
license: mit
language:
- vi
base_model:
- vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---

# PhoWhisper-large-ct2

This repository contains the PhoWhisper-large model converted to the CTranslate2 format for use with faster-whisper. The conversion enables significantly faster inference, especially on CPU.

## Usage

1. **Installation:**

   Ensure you have the necessary libraries installed:

   ```bash
   pip install transformers ctranslate2 faster-whisper
   ```

2. **Conversion (optional):**

   This repository already hosts the converted weights, so you can skip this step. Run it only if you want to reproduce the conversion from the original Hugging Face model; the resulting `PhoWhisper-large-ct2` directory can then be passed to `WhisperModel` in place of the Hub ID.

   ```bash
   ct2-transformers-converter --model vinai/PhoWhisper-large --output_dir PhoWhisper-large-ct2 --copy_files tokenizer_config.json --quantization float16
   ```

3. **Transcription:**

   ```python
   from faster_whisper import WhisperModel

   model_id = "kiendt/PhoWhisper-large-ct2"

   # Run on GPU with FP16
   # model = WhisperModel(model_id, device="cuda", compute_type="float16")

   # or run on GPU with INT8
   # model = WhisperModel(model_id, device="cuda", compute_type="int8_float16")

   # or run on CPU with INT8
   model = WhisperModel(model_id, device="cpu", compute_type="int8")

   # Replace audio.wav with the path to your audio file
   segments, info = model.transcribe("audio.wav", beam_size=5)

   print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

   for segment in segments:
       print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
   ```

## Model Details

* Based on the `vinai/PhoWhisper-large` model.
* Converted with `ct2-transformers-converter` using `float16` quantization.
* Optimized for faster inference with CTranslate2 via faster-whisper.


## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## License

MIT