---
license: mit
language:
- vi
base_model:
- vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---
# PhoWhisper-large-ct2
This repository contains [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large), a Vietnamese automatic speech recognition model, converted to the CTranslate2 format for use with faster-whisper. The converted model offers significantly faster inference and lower memory usage, especially on CPU.
## Usage
1. **Installation:**
Ensure you have the necessary libraries installed:
```bash
pip install transformers ctranslate2 faster-whisper
```
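   If you want to sanity-check the installation, both packages expose a version string (optional):
   ```bash
   python -c "import ctranslate2, faster_whisper; print(ctranslate2.__version__, faster_whisper.__version__)"
   ```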
2. **Conversion (only needed once):**
   This step converts the original Hugging Face model to the CTranslate2 format. It is only required if you want to reproduce the conversion yourself; this repository already hosts the converted weights.
```bash
ct2-transformers-converter --model vinai/PhoWhisper-large --output_dir PhoWhisper-large-ct2 --copy_files tokenizer_config.json --quantization float16
```
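   The `float16` quantization above targets GPU inference. If you only plan to run on CPU, `int8` is a reasonable alternative; this is a variant of the same command, not required if you use the converted weights already hosted in this repository:
   ```bash
   ct2-transformers-converter --model vinai/PhoWhisper-large --output_dir PhoWhisper-large-ct2 --copy_files tokenizer_config.json --quantization int8
   ```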
3. **Transcription:**
```python
from faster_whisper import WhisperModel

model_id = "kiendt/PhoWhisper-large-ct2"

# Run on GPU with FP16:
# model = WhisperModel(model_id, device="cuda", compute_type="float16")
# Or on GPU with INT8:
# model = WhisperModel(model_id, device="cuda", compute_type="int8_float16")
# Or on CPU with INT8:
model = WhisperModel(model_id, device="cpu", compute_type="int8")

# Replace audio.wav with the path to your audio file.
segments, info = model.transcribe("audio.wav", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
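Because PhoWhisper is trained specifically for Vietnamese, you can skip language detection by passing `language="vi"` explicitly. faster-whisper also supports optional voice-activity-detection filtering via `vad_filter`; a minimal variation of the call above:
```python
# Force Vietnamese decoding and drop non-speech segments with the built-in VAD filter.
segments, info = model.transcribe(
    "audio.wav",
    language="vi",
    beam_size=5,
    vad_filter=True,
)
```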
## Model Details
* Based on [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large).
* Converted with `ct2-transformers-converter` using `float16` quantization.
* Runs with CTranslate2 via `faster-whisper` for faster inference.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
MIT