---
license: mit
language:
- vi
base_model:
- vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---

# PhoWhisper-large-ct2

This repository contains the `vinai/PhoWhisper-large` model converted to the CTranslate2 format. CTranslate2 can significantly speed up inference, especially on CPU.

## Usage

1. **Installation:** Install the required libraries:

   ```bash
   pip install transformers ctranslate2 faster-whisper
   ```

2. **Conversion (only needed once):** This step converts the original Hugging Face model to the CTranslate2 format and stores the weights as FP16.

   ```bash
   ct2-transformers-converter --model vinai/PhoWhisper-large --output_dir PhoWhisper-large-ct2 --copy_files tokenizer_config.json --quantization float16
   ```

3. **Transcription:** Load the model with exactly one of the configurations below; keep the others commented out.

   ```python
   from faster_whisper import WhisperModel

   model_size = "kiendt/PhoWhisper-large-ct2"

   # Run on GPU with FP16
   model = WhisperModel(model_size, device="cuda", compute_type="float16")

   # or run on GPU with INT8
   # model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")

   # or run on CPU with INT8
   # model = WhisperModel(model_size, device="cpu", compute_type="int8")

   # Replace audio.wav with your audio file
   segments, info = model.transcribe("audio.wav", beam_size=5)

   print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

   # segments is a generator: transcription runs as you iterate over it
   for segment in segments:
       print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
   ```

## Model Details

* Based on the `vinai/PhoWhisper-large` model.
* Converted using `ct2-transformers-converter`.
* Optimized for faster inference with CTranslate2.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## License

MIT
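
## Collecting Segments

The transcription step prints each segment as it arrives. Because `segments` is a generator, it can only be iterated once, so it may be convenient to collect the results while iterating. The following is a minimal sketch of one way to do that, not part of faster-whisper itself: `FakeSegment`, `format_segment`, and `collect_transcript` are hypothetical names introduced here for illustration, and the sample data is made up so the snippet runs without a model or audio file.

```python
from collections import namedtuple

# Hypothetical stand-in for the segment objects yielded by model.transcribe();
# it models only the three attributes the transcription example reads.
FakeSegment = namedtuple("FakeSegment", ["start", "end", "text"])


def format_segment(segment):
    """Format one segment the same way the transcription example prints it."""
    return "[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text)


def collect_transcript(segments):
    """Consume the segments iterable once; return (timestamped lines, full text)."""
    lines = []
    texts = []
    for segment in segments:
        lines.append(format_segment(segment))
        texts.append(segment.text.strip())
    return lines, " ".join(texts)


# Made-up sample data; with a real model, pass the `segments` generator instead.
demo_segments = (s for s in [
    FakeSegment(0.0, 2.5, " xin chào"),
    FakeSegment(2.5, 5.0, " cảm ơn"),
])
lines, full_text = collect_transcript(demo_segments)
print("\n".join(lines))
print(full_text)
```

With a real model, `collect_transcript(segments)` would replace the `for` loop in the transcription step; the transcription work still happens inside that call, since the generator is consumed there.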