Fine-tuning for the Dhivehi language
I tried fine-tuning by following your guide, using the Dhivehi data from Common Voice 17.
I can't get the tokens right. The metrics show a lower WER than before, but when I transcribe audio with the fine-tuned model, the output is garbage. Am I missing something? Any help would be greatly appreciated.
Below is a sample of the transcribed output:
00:00:00,000 --> 00:00:05,000: The
00:00:05,000 --> 00:00:10,000: The
00:00:10,000 --> 00:00:15,000: F
00:00:15,000 --> 00:00:20,000: އެއްމެކޭ ގައުމެއްގެ އަމަށް ނަމާންކަން ބެލެހެއްޓުމެ�
00:00:20,000 --> 00:00:25,000: d
00:00:25,000 --> 00:00:30,000: އެޅޭކަތި ފިޔަވަޅުކުން ދުނިޔައި ބަދްނާން ވާކަށް އަހަ
00:00:30,000 --> 00:00:35,000: އެއްވެން އެއްވެން އެހެން ހީވާ ހުރިހާ އެއްޗެއް އަހަރ�
00:00:35,000 --> 00:00:40,000: n
00:00:40,000 --> 00:00:45,000: K
00:00:45,000 --> 00:00:50,000: އެއްގޭގެ އިންނާތީ އަހަރެ ހުންނަނީ ހިތްހަމަ ޖެހިންގެ
00:00:50,000 --> 00:00:55,000: އެއްގެނީ ކުންކަހަލަ ކަމެއްކަމެ އަހަންނަކަށް އޭރަކު
00:00:55,000 --> 00:01:00,000: އޮކާ ވައިބޯޔަކީ އަނާގެ ހުސްވަގުތު ފުރާލަދޭ ބައިވެރި
00:01:00,000 --> 00:01:05,000: �
00:01:05,000 --> 00:01:10,000: އެއްވެއް އެކުމެ ކާލައިގެން އަނާ ވަންނާނީ ކޮޓަރިއަށް
00:01:10,000 --> 00:01:15,000: d
00:01:15,000 --> 00:01:20,000: அண்ணா கே திரீ உழுமா பதுளேன் அண்ணணக்கண் அந்ணக்கண் என அகரணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணணண
00:01:20,000 --> 00:01:25,000: ބަސްމަދުވާކަށް ފާހަކަވާން ފެށޭ އަހަރެންނާ ވާހަކަ ދެ
00:01:25,000 --> 00:01:30,000: N
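To see how much of the output is off-script, a quick check can flag every segment whose characters fall mostly outside Thaana, the script Dhivehi is written in (Unicode block U+0780 to U+07BF). This is a minimal sketch. The `HH:MM:SS,mmm --> HH:MM:SS,mmm: text` line format is inferred from the sample above. The function names `thaana_ratio` and `flag_segments` and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re

# Thaana (the Dhivehi script) occupies the Unicode block U+0780..U+07BF.
THAANA_START, THAANA_END = 0x0780, 0x07BF

# Assumed line format, inferred from the sample: "HH:MM:SS,mmm --> HH:MM:SS,mmm: text"
SEGMENT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3}): ?(.*)$"
)


def thaana_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Thaana (0.0 for empty text)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    thaana = sum(THAANA_START <= ord(ch) <= THAANA_END for ch in chars)
    return thaana / len(chars)


def flag_segments(lines, threshold=0.5):
    """Yield (start, end, text, ratio) for segments that are mostly non-Thaana.

    The threshold is a hypothetical cut-off; lines that don't match the
    assumed format are skipped.
    """
    for line in lines:
        m = SEGMENT_RE.match(line.strip())
        if not m:
            continue
        start, end, text = m.groups()
        ratio = thaana_ratio(text)
        if ratio < threshold:
            yield start, end, text, ratio


# Usage on three lines copied from the sample output.
sample = """\
00:00:00,000 --> 00:00:05,000: The
00:00:50,000 --> 00:00:55,000: އެއްގެނީ ކުންކަހަލަ ކަމެއްކަމެ އަހަންނަކަށް އޭރަކު
00:01:25,000 --> 00:01:30,000: N
"""

for start, end, text, ratio in flag_segments(sample.splitlines()):
    print(f"{start} --> {end}  thaana={ratio:.2f}  {text[:40]!r}")
```

Counting by Unicode block, rather than attempting language identification, keeps the check dependency-free. It also catches the English fragments, the Tamil line, and the lone replacement characters in the sample.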
Thanks