--- language: - ms - en - zh - ta datasets: - mesolitica/Malaysian-STT-Whisper - malaysia-ai/STT-Whisper base_model: - openai/whisper-large-v3-turbo --- # Malaysian Finetune Whisper Large V3 Turbo Finetune Whisper Large V3 Turbo on Malaysian context. ## Improvement 1. Distilled from Whisper Large V3 on Malaysian and Science context. 2. Better translation for Malay, Manglish, Mandarin, Tamil and Science context. 3. Word level timestamp, introduced `<|transcribeprecise|>` token, **a new task!** ## how we finetuned it? We done 2 phases, 1. Finetune on [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper) - WanDB at https://wandb.ai/huseinzol05/malaysian-whisper-large-v3-turbo-v3?nw=nwuserhuseinzol05, **still on training** 2. Annealing on 5% from [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper) and 100% from [malaysia-ai/STT-Whisper](https://huggingface.co./datasets/malaysia-ai/STT-Whisper), **still on training**