Whisper Basque (eu) - CTranslate2 Conversion (int8)

This is a CTranslate2 conversion of xezpeleta/whisper-tiny-eu designed for use with faster-whisper.

Model Details

  • Base Model: OpenAI Whisper Tiny (original model card: whisper-tiny)
  • Finetuned for: Basque (eu) speech recognition
  • Dataset: asierhv/composite_corpus_eu_v2.1 (Mozilla Common Voice 18.0 + Basque Parliament + OpenSLR)
  • Conversion Format: CTranslate2 (optimized for inference)
  • Quantization: int8 (optimized for CPU inference)
  • Compatibility: Designed for use with faster-whisper
  • WER: 13.56% on the Basque test split of Mozilla Common Voice 18.0

Usage with faster-whisper

First, install the required package:

pip install faster-whisper

Then use the following code snippet:

from faster_whisper import WhisperModel

# Load the model (int8 quantization, CPU inference)
model = WhisperModel("xezpeleta/whisper-tiny-eu-ct2-int8", device="cpu", compute_type="int8")

# Transcribe audio file
segments, info = model.transcribe("audio.mp3", language="eu")

# Print transcription
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
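The segments can also be post-processed instead of printed. The following is a minimal sketch that renders them as SubRip (SRT) subtitles. The helper names `format_timestamp`, `segments_to_srt` and `transcribe_to_srt` are hypothetical and not part of faster-whisper. The repository ID is the one shown on this page, and the CPU/int8 settings are an assumption based on the quantization listed under Model Details.

```python
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    """Any object exposing the three attributes used when printing segments."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[SegmentLike]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def transcribe_to_srt(
    audio_path: str,
    model_id: str = "xezpeleta/whisper-tiny-eu-ct2-int8",  # assumed repository ID
) -> str:
    """Transcribe Basque audio on CPU with int8 and return SRT text."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_id, device="cpu", compute_type="int8")
    # `segments` is a generator: decoding happens while it is iterated.
    segments, _info = model.transcribe(audio_path, language="eu")
    return segments_to_srt(segments)
```

Calling `transcribe_to_srt("audio.mp3")` would return the subtitle text, which can then be written to a `.srt` file. Because the segments are produced lazily, the transcription itself runs inside `segments_to_srt`.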

Evaluation

The model achieves 13.56% Word Error Rate (WER) on the Basque test split of Mozilla Common Voice 18.0.
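For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is illustrative only: the function name `word_error_rate` is hypothetical, and the card does not state which evaluation script or text normalization (casing, punctuation) produced the reported figure, so this code is not guaranteed to reproduce it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only two rows of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For a whole test set, the usual convention is to sum the edit counts over all utterances and divide by the total number of reference words, rather than averaging per-utterance scores.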

Conversion details

Converted from the original HuggingFace model using:

ct2-transformers-converter --model xezpeleta/whisper-tiny-eu \
                           --output_dir whisper-tiny-eu-ct2 \
                           --copy_files tokenizer.json preprocessor_config.json \
                           --quantization int8
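
The same conversion can be sketched with the CTranslate2 Python API, which may be convenient inside a script or notebook. The wrapper name `convert_whisper_to_ct2` is hypothetical. Its `quantization` argument defaults to `"int8"` to match the quantization listed under Model Details, and `"float16"` can be passed for a half-precision export. It assumes `ctranslate2` and `transformers` (with PyTorch) are installed.

```python
from typing import Optional, Sequence


def convert_whisper_to_ct2(
    model_id: str = "xezpeleta/whisper-tiny-eu",
    output_dir: str = "whisper-tiny-eu-ct2",
    quantization: Optional[str] = "int8",
    copy_files: Sequence[str] = ("tokenizer.json", "preprocessor_config.json"),
) -> str:
    """Convert a Hugging Face Whisper checkpoint to the CTranslate2 format."""
    # Imported lazily: needs `pip install ctranslate2 transformers[torch]`.
    from ctranslate2.converters import TransformersConverter

    # copy_files are copied from the source model into the output directory.
    converter = TransformersConverter(model_id, copy_files=list(copy_files))
    # force=False refuses to overwrite an existing output directory.
    return converter.convert(output_dir, quantization=quantization, force=False)
```

The returned path can be passed directly to `WhisperModel(...)` in place of a repository ID.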