Malay Parler TTS Mini V1

Fine-tuned from https://huggingface.co/parler-tts/parler-tts-mini-v1 on the Malay TTS dataset https://huggingface.co/datasets/mesolitica/tts-combine-annotated

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts

Wandb at https://wandb.ai/huseinzol05/parler-speech?nw=nwuserhuseinzol05

how-to

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("mesolitica/malay-parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malay-parler-tts-mini-v1")

# Speaker names used in the description below.
speakers = [
    'Yasmin',
    'Osman',
    'Bunga',
    'Ariff',
    'Ayu',
    'Kamarul',
    'Danial',
    'Elina',
]

# Malay text to synthesize.
prompt = 'Husein zolkepli sangat comel dan kacak suka makan cendol'

for s in speakers:
    # The description names the speaker and describes the speaking style and recording quality.
    description = f"{s}'s voice, delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."

    # The same tokenizer encodes both the description and the text prompt.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu()
    # Write one file per speaker at 44100 Hz.
    # Writing .mp3 needs a soundfile build backed by libsndfile >= 1.1.0; use .wav otherwise.
    sf.write(f'{s}.mp3', audio_arr.numpy().squeeze(), 44100)
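
In the loop above, the description string starts with the speaker name, followed by free-text style and recording-quality phrases. As a minimal sketch, that template can be factored into a helper so the style phrases can be varied per speaker. `build_description`, `SPEAKERS`, and the `style` and `pace` parameters are hypothetical names for illustration and are not part of `parler_tts`; how the model responds to phrases other than the defaults shown here is not documented in this card.

```python
# Hypothetical helper (not part of parler_tts): rebuilds the description
# text from the how-to loop out of a speaker name and style phrases.
SPEAKERS = ['Yasmin', 'Osman', 'Bunga', 'Ariff', 'Ayu', 'Kamarul', 'Danial', 'Elina']


def build_description(speaker,
                      style="slightly expressive and animated",
                      pace="a moderate speed and pitch"):
    # Reject names outside the speaker list given in this card.
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    return (
        f"{speaker}'s voice, delivers a {style} speech with {pace}. "
        "The recording is of very high quality, with the speaker's voice "
        "sounding clear and very close up."
    )


# One description per speaker, using the default phrases from the how-to.
descriptions = {s: build_description(s) for s in SPEAKERS}
```

With the defaults, `build_description(s)` returns the same string as the f-string in the loop, so it can be passed to the tokenizer in its place.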

Model size: 878M params (Safetensors, tensor type F32)