Xabi Ezpeleta: Add int8 model v25.02-r1 (commit 371456c)
metadata
license: apache-2.0
datasets:
  - asierhv/composite_corpus_eu_v2.1
language:
  - eu
metrics:
  - wer
model-index:
  - name: Whisper Base Basque
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 18.0
          type: mozilla-foundation/common_voice_18_0
          config: eu
          split: test
          args:
            language: eu
        metrics:
          - name: Test WER
            type: wer
            value: 10.78
base_model:
  - xezpeleta/whisper-base-eu

Whisper Basque (eu) - CTranslate2 Conversion (int8)

This is a CTranslate2 conversion of xezpeleta/whisper-base-eu designed for use with faster-whisper.

Model Details

  • Base Model: OpenAI Whisper Base (original model card: whisper-base)
  • Fine-tuned for: Basque (eu) speech recognition
  • Dataset: asierhv/composite_corpus_eu_v2.1 (Mozilla Common Voice 18.0 + Basque Parliament + OpenSLR)
  • Conversion Format: CTranslate2 (optimized for inference)
  • Compatibility: Designed for use with faster-whisper
  • Quantization: int8 (ready for CPU inference)
  • WER: 10.78% on the Basque test split of Mozilla Common Voice 18.0
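The int8 weights do not lock you into one compute type. CTranslate2 can convert them when the model is loaded, so the same files can run as "int8" on CPU or, for example, as "int8_float16" on an NVIDIA GPU. The helper below is a hypothetical sketch, not part of faster-whisper, and its device-to-compute-type mapping is an assumption about reasonable defaults rather than a recommendation from the model author.

```python
def pick_compute_type(device: str) -> str:
    """Return a CTranslate2 compute type for this int8 model (assumed defaults)."""
    if device == "cpu":
        # Keep the stored int8 weights and run int8 computation on CPU.
        return "int8"
    if device == "cuda":
        # int8 weights combined with float16 computation on an NVIDIA GPU.
        return "int8_float16"
    raise ValueError(f"unsupported device: {device!r}")
```

The result can be passed straight to the loader, e.g. WhisperModel("xezpeleta/whisper-base-eu-ct2", device="cpu", compute_type=pick_compute_type("cpu")).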

Usage with faster-whisper

First, install the required package:

pip install faster-whisper

Then use the following code snippet:

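from faster_whisper import WhisperModel

# Load the int8 model on CPU
# (on an NVIDIA GPU, device="cuda" with compute_type="int8_float16" or "float16" also works)
model = WhisperModel("xezpeleta/whisper-base-eu-ct2", device="cpu", compute_type="int8")

# Transcribe an audio file in Basque
# (segments is a generator: decoding runs while you iterate over it)
segments, info = model.transcribe("audio.mp3", language="eu")

# Print each segment with its start and end time
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))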
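faster-whisper returns the segments as a generator, so decoding only happens while you iterate, and the iterator can be consumed only once. To keep the full transcript, collect it in a single pass. The sketch below assumes only that each segment exposes start, end and text attributes; the Segment namedtuple and the collect_transcript function are hypothetical stand-ins so the example runs without downloading a model.

```python
from collections import namedtuple
from typing import Iterable, Tuple

# Hypothetical stand-in for faster-whisper segment objects,
# which also expose .start and .end (seconds) and .text.
Segment = namedtuple("Segment", ["start", "end", "text"])


def collect_transcript(segments: Iterable) -> Tuple[str, float]:
    """Consume the segments once; return (full text, end time of the last segment)."""
    texts = []
    last_end = 0.0
    for seg in segments:  # with faster-whisper, decoding happens inside this loop
        texts.append(seg.text.strip())  # segment texts usually start with a space
        last_end = seg.end
    return " ".join(texts), last_end
```

With the real model, pass the segments object returned by model.transcribe(...) instead of a list of Segment values.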

Evaluation

The model achieves 10.78% Word Error Rate (WER) on the Basque test split of Mozilla Common Voice 18.0.
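Word Error Rate counts the word substitutions (S), deletions (D) and insertions (I) needed to align the hypothesis with the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance. It is illustrative only: it applies no text normalization (casing, punctuation), and it is not the evaluation script behind the 10.78% figure, whose normalization is not documented here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For a whole test set, corpus-level WER is normally computed by summing the edit counts over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.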

Conversion details

The model was converted from the original Hugging Face model with:

ct2-transformers-converter --model xezpeleta/whisper-base-eu \
                           --output_dir whisper-base-eu-ct2 \
                           --copy_files tokenizer.json preprocessor_config.json \
                           --quantization int8
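To script the conversion, the same command can be built and run from Python. This is a sketch under assumptions: converter_command and run_conversion are hypothetical helpers, the int8 default mirrors the quantization stated in this card, and running it requires ctranslate2 and transformers (with PyTorch) to be installed so that ct2-transformers-converter is on the PATH.

```python
import subprocess
from typing import List


def converter_command(model: str, output_dir: str, quantization: str = "int8") -> List[str]:
    """Build the ct2-transformers-converter invocation as an argument list."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", quantization,
    ]


def run_conversion(model: str, output_dir: str, quantization: str = "int8") -> None:
    """Run the converter; raises CalledProcessError if it exits with an error."""
    subprocess.run(converter_command(model, output_dir, quantization), check=True)
```

For example, run_conversion("xezpeleta/whisper-base-eu", "whisper-base-eu-ct2") reproduces the command line for this model. Passing an argument list instead of a shell string avoids quoting problems with paths.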