Note: If you are looking for our latest dataset and model, please refer to the main README here: https://huggingface.co/ivrit-ai.

Background

This ASR model was trained on a private dataset containing approximately 310 hours of high-quality Hebrew data. Data was transcribed using professional transcription services.

Model name decoding:

<model name>-<size>-<dataset>-<epoch>

This specific model is a faster-whisper model of size large-v2, trained on version 1 of our private dataset (pd1) and saved after one epoch (e1).
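
As a rough illustration of the naming scheme, the repository name used on this card can be split into its four fields. The helper `decode_model_name` is hypothetical and not part of any ivrit-ai tooling. It assumes the last three hyphen-separated tokens are the size, dataset, and epoch, so the size appears in the short form used in the repository name (`v2`).

```python
def decode_model_name(repo_id):
    """Split '<org>/<model name>-<size>-<dataset>-<epoch>' into its fields.

    Hypothetical helper: assumes the last three hyphen-separated tokens are
    size, dataset and epoch, and everything before them is the model name.
    """
    name = repo_id.split('/')[-1]  # drop the organization prefix, if any
    model, size, dataset, epoch = name.rsplit('-', 3)
    return {'model': model, 'size': size, 'dataset': dataset, 'epoch': epoch}


print(decode_model_name('ivrit-ai/faster-whisper-v2-pd1-e1'))
```

Splitting from the right keeps the hyphen inside `faster-whisper` intact, at the cost of assuming that none of the last three fields contains a hyphen.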

Running the model

# Initialize the model
import faster_whisper
model = faster_whisper.WhisperModel('ivrit-ai/faster-whisper-v2-pd1-e1')

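# Transcribe a media file
mp3_file = 'audio.mp3'  # placeholder path; replace with your own file
segs, _ = model.transcribe(mp3_file, language='he')
# segs is a generator: transcription runs as the loop consumes it
for seg in segs:
    print(seg.text)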

The segment object contains more data such as timestamps. Feel free to explore them.
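
A minimal sketch of reading the timestamps follows. It assumes each segment exposes `start` and `end` (in seconds) alongside `text`, as faster-whisper segment objects do. `format_segment` is a hypothetical helper, and the stand-in segment exists only so the snippet runs without downloading the model.

```python
from types import SimpleNamespace


def format_segment(seg):
    """Format one segment as '[start -> end] text', with times in seconds."""
    return f'[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text.strip()}'


# Stand-in for a segment yielded by model.transcribe(); the values are made up.
demo = SimpleNamespace(start=0.0, end=2.5, text=' שלום')
print(format_segment(demo))
```

With the real model, the same helper can be called inside the `for seg in segs:` loop in place of `print(seg.text)`.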
