metadata
language:
- ms
- en
- zh
- ta
datasets:
- mesolitica/Malaysian-STT-Whisper
- malaysia-ai/STT-Whisper
base_model:
- openai/whisper-large-v3-turbo
Malaysian Finetune Whisper Large V3 Turbo
Finetune Whisper Large V3 Turbo on Malaysian context.
Improvement
- Distilled from Whisper Large V3 on Malaysian and Science context.
- Better translation for Malay, Manglish, Mandarin, Tamil and Science context.
- Word level timestamp, introduced
<|transcribeprecise|>
token, a new task!
how we finetuned it?
We done 2 phases,
- Finetune on mesolitica/Malaysian-STT-Whisper
- WanDB at https://wandb.ai/huseinzol05/malaysian-whisper-large-v3-turbo-v3?nw=nwuserhuseinzol05, still on training
- Annealing on 5% from mesolitica/Malaysian-STT-Whisper and 100% from malaysia-ai/STT-Whisper, still on training