---
language:
- ms
- en
- zh
- ta
datasets:
- mesolitica/Malaysian-STT-Whisper
- malaysia-ai/STT-Whisper
base_model:
- openai/whisper-large-v3-turbo
---
# Malaysian Finetune Whisper Large V3 Turbo
Whisper Large V3 Turbo finetuned on Malaysian context.
## Improvements
1. Distilled from Whisper Large V3 on Malaysian and science context.
2. Better translation for Malay, Manglish, Mandarin, Tamil, and science context.
3. Word-level timestamps through the newly introduced `<|transcribeprecise|>` token, **a new task!**
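
To illustrate where the new task token could sit, here is a minimal sketch of a Whisper-style decoder prompt. Stock Whisper prompts follow the pattern `<|startoftranscript|><|lang|><|task|>`. This card does not document the exact prompt format, so placing `<|transcribeprecise|>` in the task slot is an assumption, and `build_decoder_prompt` is a hypothetical helper, not part of any library.

```python
# Languages listed in this card's metadata.
LANGUAGES = ("ms", "en", "zh", "ta")


def build_decoder_prompt(language: str, task: str = "transcribeprecise") -> str:
    """Return a Whisper-style decoder prompt string.

    Assumption: the new <|transcribeprecise|> token takes the slot that
    <|transcribe|> or <|translate|> occupies in stock Whisper prompts.
    """
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return f"<|startoftranscript|><|{language}|><|{task}|>"


prompt = build_decoder_prompt("ms")
print(prompt)
```

Passing `task="transcribe"` yields the stock Whisper transcription prompt, which makes it easy to compare the two tasks on the same audio.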
## How we finetuned it
We trained in 2 phases:
1. Finetune on [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper)
   - W&B run at https://wandb.ai/huseinzol05/malaysian-whisper-large-v3-turbo-v3?nw=nwuserhuseinzol05, **still training**
2. Anneal on 5% of [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper) and 100% of [malaysia-ai/STT-Whisper](https://huggingface.co./datasets/malaysia-ai/STT-Whisper), **still training**
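
The annealing mixture in phase 2 can be sketched in plain Python. The card does not say how the 5% subset was selected, so uniform random sampling with a fixed seed is an assumption here, and `build_annealing_mix` and the toy row names are hypothetical stand-ins rather than the actual training code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_annealing_mix(
    phase1_rows: Sequence[T],
    phase2_rows: Sequence[T],
    phase1_fraction: float = 0.05,
    seed: int = 0,
) -> List[T]:
    """Combine a random fraction of the phase 1 data with all of the phase 2 data."""
    if not 0.0 <= phase1_fraction <= 1.0:
        raise ValueError("phase1_fraction must be between 0 and 1")
    rng = random.Random(seed)
    # Assumption: the subset is drawn uniformly at random without replacement.
    n_keep = round(len(phase1_rows) * phase1_fraction)
    subset = rng.sample(list(phase1_rows), n_keep)
    mix = subset + list(phase2_rows)
    rng.shuffle(mix)  # interleave the two sources
    return mix


# Toy stand-ins for the two datasets (hypothetical row identifiers).
phase1 = [f"malaysian-stt-{i}" for i in range(1000)]
phase2 = [f"stt-whisper-{i}" for i in range(200)]
mix = build_annealing_mix(phase1, phase2)
print(len(mix))  # 50 rows from phase 1 plus all 200 rows from phase 2
```

Fixing the seed keeps the 5% subset reproducible across runs, which helps when comparing annealing checkpoints.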