Conversions to other formats
Hi,
Thank you so much for your work! 🤗
In order to use the model in different settings, I have already started converting it to optimised formats:
- GGML to use it with MacWhisper (by far the best way to use a Whisper model, unfortunately only on Macs)
- ONNX to use it with transformers.js in browsers (like this example with WebGPU)
- CT2 to use it with Speaches and deploy it in an organisation through an API
I think these different formats could be useful for other people so I've started to publish them on HuggingFace. Do you have any guidelines regarding this? Or would you prefer to publish these variants yourself?
I think it would also be great to have quantized or distilled versions (using faster-whisper or distil-whisper) at some point, but I'm not sure how relevant they are and I don't have the expertise to do it myself. Is it something that you are considering, or would you welcome community initiatives?
__Another huge MacWhisper fan here, but I thought about it a bit differently, so I wrote to Jordi:__
I love MacWhisper. I have even convinced the IT department of [redacted] that I should be allowed to run it on my work Mac. That's no small feat ;)
... and on that same Mac, now while travelling, I want to try and convert KB-Whisper Large on Hugging Face to GGML so I can install it in MacWhisper - but I'm not allowed to pip install anything on this machine, so I can't run the conversion :(
But I think it would be fairly easy for a genius like you to add a field on the model page where I can paste https://huggingface.co./KBLab/kb-whisper-large and you'll fix the rest for me (and once it's been done once, every other (Swedish) MacWhisper user can use it too...)
- GGML to use it with MacWhisper (by far the best way to use a Whisper model, unfortunately only on Macs)
Surely this is the whisper.cpp ggml format? That's all platforms, not just macOS :)
https://github.com/ggerganov/whisper.cpp/
(And yes, that's how I use whisper models as well)
@troed
What in our comments made you assume we thought GGML was exclusively for macOS?
MacWhisper, however, is an app that's only available on macOS, and it was used as an example.
@PierreMesure , I can't see the GGML versions posted in your repository? Pretty please, would love to have them.
I think it would also be great to have quantified or distilled versions (using faster-whisper or distil-whisper) at some point but I'm not sure how relevant they are [...]
faster-whisper is used in rhasspy/wyoming-faster-whisper which powers the Home Assistant speech pipeline.
The conversion seems to be rather easy to do. I got it working by running the command below in a Docker container with ctranslate2 installed.
```bash
ct2-transformers-converter --model KBLab/kb-whisper-tiny --output_dir /var/data/kb-whisper-tiny-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
```
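The same conversion can also be done from Python via ctranslate2's converter API. A rough sketch of what I believe is the equivalent call (the model id, output path and quantization are copied from the command above):

```python
# Sketch: Python equivalent of the ct2-transformers-converter command above.
# Requires ctranslate2 and transformers (with torch) to be installed.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "KBLab/kb-whisper-tiny",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("/var/data/kb-whisper-tiny-ct2", quantization="float16")
```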
I'm also interested in hearing how KB wants to handle "offspring" of these models. Is it up to the community to upload the variants?
We prefer to host these formats ourselves in order to better be able to track usage statistics of the models.
I have just updated this repo to include `faster-whisper`, `onnx` and `whisper.cpp` compatible versions of the model. Will look at updating the README tomorrow with usage examples for each format.

`faster-whisper` should now work out of the box by just specifying this repo:
```python
from faster_whisper import WhisperModel

model = WhisperModel(
    "KBLab/kb-whisper-large",
    device="cuda",
    compute_type="float16",
    download_root="cache-faster-whisper",  # cache_dir
)

# Transcribe audio_mono.wav
segments, info = model.transcribe("audio_mono.wav", condition_on_previous_text=False)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
ONNX usage:
```python
import soundfile as sf
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "KBLab/kb-whisper-large", cache_dir="cache", subfolder="onnx"
)
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "KBLab/kb-whisper-large", cache_dir="cache", subfolder="onnx"
)

# Read mono 16 kHz audio; sf.read returns (data, samplerate)
audio = sf.read("audio.wav")
inputs = processor.feature_extractor(audio[0], sampling_rate=16000, return_tensors="pt")

# Workaround: fall back to the legacy cache format for ORT generation
model._supports_cache_class = False
gen_tokens = model.generate(**inputs)
print(processor.decode(gen_tokens[0]))
```
Whisper.cpp requires downloading files with wget, which I guess won't count towards usage statistics. Will update tomorrow with an example.
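In the meantime, downloading the checkpoint through huggingface_hub instead of wget should work as well and I would expect it to show up in the download statistics. A rough sketch (the filename below is only an example, check the repo's file list for the real one):

```python
# Sketch: download a GGML checkpoint via huggingface_hub instead of wget.
# NOTE: "ggml-model-q5_0.bin" is an assumed filename; check the repo for the real one.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="KBLab/kb-whisper-large", filename="ggml-model-q5_0.bin")
print(path)  # local path to pass to whisper.cpp, e.g. ./main -m <path> -f audio.wav
```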
Wow, @Lauler this is great! I'll try to use your ONNX variants instead of my own in whisper.mesu.re. I agree with you, I think it's better to keep all formats under your organisation for consistency and statistics.
I'm not sure if all software will handle that many files gracefully, though, if they are all in the same folder. But I guess that's easy to test.
Regarding ONNX, you might want to use the conversion script in the transformers.js library and include the quantized variants. In addition, you should use the code from this PR or the model won't work with transformers.js.
```bash
python -m scripts.convert --quantize --model_id KBLab/kb-whisper-tiny
```
Was inspired by NbAiLab/nb-whisper-large, which adds multiple formats to the same repo. `transformers` and `faster-whisper` seem clever enough to only download the relevant files depending on the backend you are using.
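For example, here is a rough sketch of using the same repo through the plain `transformers` backend (argument values are just examples), which should only pull the PyTorch weights:

```python
# Sketch: regular transformers usage of the same repo; only the PyTorch/safetensors
# weights should be downloaded, not the CT2/ONNX/GGML files.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="KBLab/kb-whisper-large",
    torch_dtype=torch.float16,
    device="cuda:0",
)
print(pipe("audio.wav", chunk_length_s=30)["text"])
```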
I converted the ONNX version using `optimum` with the following settings:
```python
import os
import subprocess

# model_path points to the local (or hub) model to export
os.makedirs("onnx", exist_ok=True)
subprocess.run(
    [
        "optimum-cli",
        "export",
        "onnx",
        "--model",
        model_path,
        "--task",
        "automatic-speech-recognition-with-past",
        "onnx",
    ]
)
```
It seemed beneficial to be able to run inference with a KV cache (`automatic-speech-recognition-with-past`). However, I guess this does not work with transformers.js.
It seems `transformers.js` should work if there's a subfolder called `onnx` in the repo where the `.onnx` files are located:

Transformers.js supports loading any model hosted on the Hugging Face Hub, provided it has ONNX weights (located in a subfolder called `onnx`). For more information on how to convert your PyTorch, TensorFlow, or JAX model to ONNX, see the conversion section. (source)
I need to test a bit whether the `transformers.js` conversion script also works with regular `transformers` before pushing changes.
I struggled for several hours with optimum-cli and with transformers.js's conversion script. Make sure you use the code in the PR to get working models.
I would like to release whisper.mesu.re in the coming days, so please convert tiny, base and small (or I could do it for you if you want?) so I can point to your repos. 😊
Thanks @Lauler for adding the ggml model! I'd also be happy to have access to the other model sizes as ggml, e.g. for mobile applications or slower computers.
@PierreMesure this website is extremely cool! Do you have the same already for other languages? :O
@mbroedl It’s just a fork of the whisper-web project by @Xenova , you can find the source code at the bottom of the page.
Xenova’s project was downloading OpenAI’s models (ONNX versions); the main thing I did was point it to KB’s models and translate the interface. I made a couple of other improvements which I’ll try to submit as PRs if @Xenova is interested (the project seems a bit stale). I have now enabled GPU support, but it’s disabled by default as it seems to be hit and miss with the quantized variants.
By the way, the app is now fetching the models from KB’s repos so I can confirm the ONNX versions work just as well as mine. I’ll delete mine soon.
I have added usage examples for `faster-whisper`, `WhisperX`, `whisper.cpp` and `onnx` in the README.
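For quick reference, a rough `WhisperX` sketch of what such usage typically looks like (the argument values here are only examples, see the README for the exact recommended snippet):

```python
# Sketch: WhisperX wraps faster-whisper, so the CTranslate2 weights in this repo
# should load directly. batch_size and compute_type are example values.
import whisperx

model = whisperx.load_model("KBLab/kb-whisper-large", device="cuda", compute_type="float16")
audio = whisperx.load_audio("audio.wav")
result = model.transcribe(audio, batch_size=16)
print(result["segments"])
```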
Every model has two GGML checkpoints: one without quantization and one with `q5_0`. I'm not too knowledgeable about which quantized versions are popular and perform well, so if you have specific requests let me know.
Try the different libraries/formats that are supported: https://huggingface.co./KBLab/kb-whisper-large#usage
This is great!
It would be great to add `library_name: ctranslate2`; this is how Speaches lists what models can be used (so this is the way to make the models available in Speaches). Unfortunately, it doesn't seem to be possible to list several entries in `library_name`, so I'm not sure what the solution is. I opened an issue on their repo.
I have added `ctranslate2` as a tag (see https://huggingface.co./docs/hub/en/model-cards#specifying-a-library). Maybe it can help them, if they perform a secondary check on whether `ctranslate2` exists among the tags of a repo.