---
license: mit
language:
- vi
base_model:
- vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---
# PhoWhisper-large-ct2

This repository contains the `vinai/PhoWhisper-large` model converted to the CTranslate2 format for use with `faster-whisper`, which enables significantly faster inference, especially on CPU.

## Usage

1. **Installation:**
   Ensure you have the necessary libraries installed:
   ```bash
   pip install transformers ctranslate2 faster-whisper
   ```

2. **Conversion (optional, only needed once):**
    This repository already hosts the converted weights; run this step only if you want to convert the original Hugging Face model yourself.

   ```bash
   ct2-transformers-converter --model vinai/PhoWhisper-large --output_dir PhoWhisper-large-ct2 --copy_files tokenizer_config.json --quantization float16
   ```

3. **Transcription:**

    ```python
    from faster_whisper import WhisperModel

    model_id = "kiendt/PhoWhisper-large-ct2"

    # Run on GPU with FP16
    # model = WhisperModel(model_id, device="cuda", compute_type="float16")
    # or run on GPU with INT8
    # model = WhisperModel(model_id, device="cuda", compute_type="int8_float16")
    # or run on CPU with INT8
    model = WhisperModel(model_id, device="cpu", compute_type="int8")

    # Replace audio.wav with the path to your audio file
    segments, info = model.transcribe("audio.wav", beam_size=5)

    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    ```
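
Since PhoWhisper is trained for Vietnamese, you can skip language detection by passing `language="vi"` to `transcribe`; faster-whisper's `word_timestamps=True` additionally returns per-word timings. A minimal sketch (the audio path is a placeholder):

```python
from faster_whisper import WhisperModel

model = WhisperModel("kiendt/PhoWhisper-large-ct2", device="cpu", compute_type="int8")

# Force Vietnamese decoding and request word-level timestamps.
segments, _ = model.transcribe("audio.wav", beam_size=5, language="vi", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```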


## Model Details

* Based on the `vinai/PhoWhisper-large` Vietnamese ASR model.
* Converted with `ct2-transformers-converter`, quantized to `float16`.
* Intended for faster inference with CTranslate2 via `faster-whisper`, especially on CPU.
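
If you run the conversion yourself (step 2 above), `WhisperModel` can also load the resulting local directory instead of this Hub repository. A short sketch, assuming the output directory `PhoWhisper-large-ct2` from step 2:

```python
from faster_whisper import WhisperModel

# Load the locally converted CTranslate2 directory produced by
# ct2-transformers-converter instead of downloading from the Hub.
model = WhisperModel("PhoWhisper-large-ct2", device="cpu", compute_type="int8")
```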


## Contributing

Contributions are welcome! Please open an issue or submit a pull request.


## License

MIT