Note: If you are looking for our latest dataset and model, please refer to the main README here: https://huggingface.co/ivrit-ai.
Background
This ASR model was trained on a private dataset containing approximately 310 hours of high-quality Hebrew data. Data was transcribed using professional transcription services.
Model name decoding:
<model name>-<size>-<dataset>-<epoch>
This specific model is a faster-whisper variant of large-v2 (v2), trained on version 1 of our private dataset (pd1), and saved after one epoch (e1).
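As an illustration of the naming scheme, here is a minimal sketch that splits a model name into its four fields. The `ModelName` type and `parse_model_name` helper are hypothetical names introduced for this example and are not part of any library. The sketch assumes that the size, dataset, and epoch fields never contain a hyphen, and that the epoch field is written as `e` followed by a number, as in `e1`.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical container for the four fields of a model name."""
    model: str
    size: str
    dataset: str
    epoch: int


def parse_model_name(repo_id: str) -> ModelName:
    """Split '<model name>-<size>-<dataset>-<epoch>' into its fields.

    Assumption: only the model name itself may contain hyphens
    (e.g. 'faster-whisper'), so we split three times from the right.
    """
    name = repo_id.split("/")[-1]  # drop an optional 'org/' prefix
    parts = name.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected model name: {repo_id!r}")
    model, size, dataset, epoch = parts
    # Assumption: the epoch field looks like 'e1', 'e2', ...
    if not (epoch.startswith("e") and epoch[1:].isdigit()):
        raise ValueError(f"unexpected epoch field: {epoch!r}")
    return ModelName(model, size, dataset, int(epoch[1:]))


print(parse_model_name("ivrit-ai/faster-whisper-v2-pd1-e1"))
```

Splitting from the right keeps the hyphenated model name `faster-whisper` intact.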
Running the model
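# Initialize the model
import faster_whisper
model = faster_whisper.WhisperModel('ivrit-ai/faster-whisper-v2-pd1-e1')

# Path to the media file to transcribe (placeholder; replace with your own file)
mp3_file = 'audio.mp3'

# Transcribe the media file; segs is a generator, so transcription runs as it is iterated
segs, _ = model.transcribe(mp3_file, language='he')
for seg in segs:
    print(seg.text)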
Each segment object carries more data than the text, such as start and end timestamps (seg.start and seg.end, in seconds). Feel free to explore it.
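To show how the timestamps might be used, here is a small sketch that prints each segment with its start and end time. It relies only on the `start`, `end`, and `text` attributes of a faster-whisper segment. `FakeSegment`, `format_timestamp`, and `format_segment` are hypothetical names introduced for this example; `FakeSegment` stands in for a real segment so the snippet runs without downloading the model.

```python
from dataclasses import dataclass


@dataclass
class FakeSegment:
    """Hypothetical stand-in exposing the same fields we read from a real segment."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(seg) -> str:
    """Works with any object that has .start, .end and .text attributes."""
    return f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"


# Example data only; real segments come from model.transcribe(...)
for seg in [FakeSegment(0.0, 2.5, " first segment"), FakeSegment(2.5, 5.0, " second segment")]:
    print(format_segment(seg))
```

With the real model, the same `format_segment` call can be applied to each `seg` yielded by `model.transcribe(...)`.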