ByT5 Song Lyrics

This is a Seq2Seq model trained on a karaoke dataset to predict syllables with pitch and timing from song lyrics.

At the time of writing, the model has been trained on only half of the full dataset. Expect the quality to improve with further training.

The Hugging Face demo appears to generate outputs with a short maximum sequence length, so the widget on the right only predicts the first two syllables.
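To get a full-length prediction, the model can probably be run locally with a larger generation limit. The following is a minimal sketch, not the author's own usage code. It assumes the checkpoint loads through the standard `transformers` seq2seq classes and accepts a plain lyrics string as input. `MODEL_ID` is a hypothetical placeholder for this model's repository id, and `byte_level_length` and `predict` are illustrative helper names. ByT5 operates on UTF-8 bytes, so each output token covers roughly one byte, and the default generation length in `transformers` (usually 20 tokens) only covers a few syllables.

```python
# Hypothetical repository id: replace with this model's actual id on the Hub.
MODEL_ID = "your-username/byt5-song-lyrics"


def byte_level_length(text: str) -> int:
    """Approximate ByT5 token count: one token per UTF-8 byte, plus one EOS token."""
    return len(text.encode("utf-8")) + 1


def predict(lyrics: str, model_id: str = MODEL_ID, max_new_tokens: int = 1024) -> str:
    """Generate a prediction for `lyrics` with a larger output budget than the default."""
    # Imported lazily so byte_level_length works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model takes raw lyrics text with no task prefix.
    inputs = tokenizer(lyrics, return_tensors="pt")

    # max_new_tokens lifts the short default limit that truncates the demo output.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `predict("Twinkle twinkle little star")` would download the checkpoint and return the decoded output. The value `1024` for `max_new_tokens` is an arbitrary choice for illustration; adjust it to the length of your lyrics.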

Model size: 1.23B params (Safetensors, tensor type F32)