Safetensors
whisper
metadata
language:
  - ms
  - en
  - zh
  - ta
datasets:
  - mesolitica/Malaysian-STT-Whisper
  - malaysia-ai/STT-Whisper
base_model:
  - openai/whisper-large-v3-turbo

Malaysian Finetune Whisper Large V3 Turbo

Whisper Large V3 Turbo, finetuned on Malaysian context.

Improvements

  1. Distilled from Whisper Large V3 on Malaysian and science context.
  2. Better translation for Malay, Manglish, Mandarin, Tamil and science context.
  3. Word-level timestamps through a new task, selected with the newly introduced <|transcribeprecise|> token.
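
To illustrate how the new task token might be selected, here is a minimal sketch that assembles a Whisper-style decoder prefix as a string. The layout `<|startoftranscript|><|language|><|task|>` is the standard Whisper convention; that `<|transcribeprecise|>` sits in the task slot is an assumption, and `build_decoder_prefix` is a hypothetical helper for illustration, not part of any library.

```python
# Tasks accepted by this sketch. "transcribeprecise" is the token named in
# this model card; placing it in the task slot is an assumption.
SUPPORTED_TASKS = ("transcribe", "translate", "transcribeprecise")


def build_decoder_prefix(language: str, task: str = "transcribe",
                         timestamps: bool = True) -> str:
    """Return a Whisper-style decoder prefix as a plain string.

    Hypothetical helper: it only concatenates special-token strings and
    does not call any tokenizer or model.
    """
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Standard Whisper token that disables timestamp prediction.
        tokens.append("<|notimestamps|>")
    return "".join(tokens)


if __name__ == "__main__":
    print(build_decoder_prefix("ms", task="transcribeprecise"))
```

The language codes accepted here would be the ones listed in the metadata (`ms`, `en`, `zh`, `ta`); the sketch does not validate them.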

How we finetuned it

We trained in 2 phases:

  1. Finetune on mesolitica/Malaysian-STT-Whisper.
  2. Anneal on 5% of mesolitica/Malaysian-STT-Whisper and 100% of malaysia-ai/STT-Whisper. This phase is still training.
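
The phase 2 data mix can be sketched as follows. This is only an illustration of the 5% / 100% proportions described above, using in-memory lists as stand-ins for the two datasets; the function names, the random sampling strategy, and the seed are assumptions, not the authors' actual pipeline.

```python
import random


def sample_fraction(rows, fraction, seed=0):
    """Return a random subset containing round(fraction * len(rows)) rows."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(rows) * fraction)
    return random.Random(seed).sample(list(rows), k)


def build_annealing_mix(phase1_rows, extra_rows, phase1_fraction=0.05, seed=0):
    """Combine a fraction of the phase 1 data with all of the extra data.

    phase1_rows stands in for mesolitica/Malaysian-STT-Whisper and
    extra_rows for malaysia-ai/STT-Whisper (hypothetical stand-ins).
    """
    mix = sample_fraction(phase1_rows, phase1_fraction, seed) + list(extra_rows)
    # Shuffle so the two sources are interleaved during training.
    random.Random(seed).shuffle(mix)
    return mix


if __name__ == "__main__":
    phase1 = [("phase1", i) for i in range(1000)]
    extra = [("extra", i) for i in range(200)]
    mix = build_annealing_mix(phase1, extra)
    print(len(mix))  # 50 sampled phase 1 rows + 200 extra rows
```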