Is it possible to add faster-whisper as a backend?

#18
by FlippFuzz - opened

Faster-whisper is around 4x faster on GPU. https://github.com/guillaumekln/faster-whisper

Is it possible to add this as a backend?
Perhaps a drop-down/CLI flag where users can choose between the default whisper and faster-whisper?

That's quite an optimization - I was able to run the large-v2 model on my RTX 2080 Super 8 GB, which I couldn't do with the default Whisper implementation. The lower memory requirements alone can really help make Whisper easier to deploy. It also appears to be a lot faster; the 4x figure is probably not far off.

So I spent some time adding it as a backend to the WebUI, and it is now done. To run it, it is recommended that you create a new virtual environment, install CUDA and cuDNN, and then install the requirements for faster-whisper:

pip install -r requirements-fasterWhisper.txt
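For example, on Linux the whole setup could look roughly like this (the virtual environment name here is arbitrary, and CUDA/cuDNN have to be installed separately for your system):

# create and activate a fresh virtual environment
python -m venv venv-faster-whisper
source venv-faster-whisper/bin/activate

# install the WebUI requirements for the faster-whisper backend
pip install -r requirements-fasterWhisper.txt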

Then you can switch to faster-whisper in the UI/CLI using a command-line argument:

python app.py --whisper_implementation faster-whisper --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True

You can also use the environment variable WHISPER_IMPLEMENTATION, or change the field whisper_implementation in config.json5.
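For instance, using the environment variable instead of the flag (the remaining arguments are the same as above):

# equivalent to passing --whisper_implementation faster-whisper
WHISPER_IMPLEMENTATION=faster-whisper python app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True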

Finally, I've also published this as a Docker image at registry.gitlab.com/aadnk/whisper-webui:latest-fastest:

sudo docker run -d --gpus all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--mount type=bind,source=/home/administrator/.cache/huggingface,target=/root/.cache/huggingface \
--restart=on-failure:15 registry.gitlab.com/aadnk/whisper-webui:latest-fastest \
app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
--default_vad silero-vad --default_model_name large-v2
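Once the container is up, the UI should be reachable on port 7860 of the host (as mapped above), and the two cache mounts keep downloaded models between restarts. A quick sanity check, assuming curl is available on the host:

# the WebUI should answer on the port mapped by -p 7860:7860
curl -I http://localhost:7860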

EDIT: Changed to faster-whisper

aadnk changed discussion status to closed
aadnk changed discussion status to open

It looks great! Thanks for implementing!

No problem. 😀

I've also made a separate space for Faster Whisper, so people can try it out directly:

The only difference is that I've set "whisper_implementation" to "faster-whisper" in the config, and also updated the README and requirements.txt.

Sorry, is it also possible to add float32 to --compute_type?
Float32 is the default and int8 is the less precise version for CPU.

I just added float32 to the CLI options - try pulling the latest version of the Git repository.

It's set to "auto" by default, however, so it should pick the correct compute type depending on the hardware. But yeah, it will likely downgrade to int8 when running on CPU.
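If you do want to force it, something along these lines should work (same arguments as in the earlier command, plus the new flag):

python app.py --whisper_implementation faster-whisper --compute_type float32 --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True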

python app.py --whisper_implementation fast-whisper --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True

I think it has to be --whisper_implementation faster-whisper, instead of fast-whisper, right?

Trying it out now 😄 Thanks for implementing so quickly

EDIT:
I'm not sure if I did something incorrectly, but I got this:

Repository Not Found for url: https://huggingface.co./api/models/guillaumekln/faster-whisper-large/revision/main.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

EDIT 2:
Using Large-V2, it works. (SUPER FAST 😄)
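For reference, the only change was picking large-v2 instead of large, e.g. something like:

python app.py --whisper_implementation faster-whisper --default_model_name large-v2 --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True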

It's all good. Able to get float32 too.
Thanks again @aadnk .

FlippFuzz changed discussion status to closed

Trying it out now 😄 Thanks for implementing so quickly

No problem. 😀

I think it has to be --whisper_implementation faster-whisper, instead of fast-whisper, right?

Ah, sorry, I initially called it "fast-whisper" by mistake, but I've since renamed it to "faster-whisper". I must have forgotten to update the command line in my comment.
