---
language:
- ms
- en
- zh
- ta
datasets:
- mesolitica/Malaysian-STT-Whisper
- malaysia-ai/STT-Whisper
base_model:
- openai/whisper-large-v3-turbo
---

# Malaysian Finetune Whisper Large V3 Turbo

Whisper Large V3 Turbo finetuned on Malaysian context.

## Improvement

1. Distilled from Whisper Large V3 on Malaysian and science contexts.
2. Better translation for Malay, Manglish, Mandarin, Tamil and science contexts.
3. Word-level timestamps through a newly introduced `<|transcribeprecise|>` token, **a new task!**

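To make the new task token concrete, here is a minimal sketch of how a Whisper-style decoder prompt could be assembled. It assumes `<|transcribeprecise|>` takes the same slot as the stock `<|transcribe|>` and `<|translate|>` task tokens, which this card does not confirm. The helper name `build_decoder_prompt` is hypothetical and not part of any library.

```python
# Hypothetical helper: builds the special-token prefix a Whisper-style decoder is
# conditioned on. ASSUMPTION: <|transcribeprecise|> sits in the task slot, like
# the stock <|transcribe|> and <|translate|> tokens.

# Language codes listed in this card's metadata.
SUPPORTED_LANGUAGES = ("ms", "en", "zh", "ta")

# Stock Whisper tasks plus the new task token introduced by this model.
TASKS = ("transcribe", "translate", "transcribeprecise")


def build_decoder_prompt(language: str, task: str = "transcribeprecise") -> str:
    """Return the prompt prefix: start token, language token, task token."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if task not in TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return f"<|startoftranscript|><|{language}|><|{task}|>"


if __name__ == "__main__":
    print(build_decoder_prompt("ms"))
```

The exact prompt layout used during training may differ, so treat the string above as an illustration of the idea rather than the model's documented interface.
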
## How we finetuned it

Training is done in 2 phases:

1. Finetune on [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper)
   - WandB at https://wandb.ai/huseinzol05/malaysian-whisper-large-v3-turbo-v3?nw=nwuserhuseinzol05, **still training**
2. Annealing on 5% of [mesolitica/Malaysian-STT-Whisper](https://huggingface.co./datasets/mesolitica/Malaysian-STT-Whisper) and 100% of [malaysia-ai/STT-Whisper](https://huggingface.co./datasets/malaysia-ai/STT-Whisper), **still training**
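
The phase 2 mixture can be sketched as below. This is only an illustration: it assumes the 5% is a uniform random sample counted by example, which the card does not specify, and it uses plain Python lists in place of the real datasets. The function name `build_annealing_mix` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_annealing_mix(
    phase1_examples: Sequence[T],
    extra_examples: Sequence[T],
    fraction: float = 0.05,
    seed: int = 0,
) -> List[T]:
    """Combine a random `fraction` of the phase 1 data with all of the extra data.

    ASSUMPTION: sampling is uniform and per example; the actual recipe may
    sample differently (for instance by audio duration).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_keep = round(len(phase1_examples) * fraction)
    # Sample without replacement from the phase 1 data.
    subset = rng.sample(list(phase1_examples), n_keep)
    # Keep every example from the second dataset, then shuffle the union.
    mixed = subset + list(extra_examples)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    phase1 = [f"phase1-{i}" for i in range(1000)]
    extra = [f"extra-{i}" for i in range(200)]
    mix = build_annealing_mix(phase1, extra)
    print(len(mix))
```

Fixing the seed keeps the sampled subset reproducible across runs, which makes it easier to compare annealing experiments.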