This is a basic conditional audio diffusion model built on a U-Net. I've uploaded the weights and training code. The model's `sample` method generates whichever spoken digit you want. I used the awesome code provided by HuggingFace audio diffusers to generate Mel-spectrograms, which were then used to train the model. For the model code I used the denoising-diffusion-pytorch repo found at https://github.com/lucidrains/denoising-diffusion-pytorch

The images found in the files are named `sample{epoch}{sample#}{digit}.jpg`, and each has a corresponding audio file. The audio is VERY quiet, so turn up your speakers to hear it better. (Just don't forget to turn them down after!)
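Since the generated audio is so quiet, a simple peak-normalization pass can bring it up to a comfortable listening level before playback. This is a minimal sketch, not part of the released code: it assumes 16-bit mono WAV files, uses scipy, and `normalize_wav` is a hypothetical helper name.

```python
import numpy as np
from scipy.io import wavfile

def normalize_wav(in_path, out_path, peak=0.95):
    """Rescale a quiet WAV so its loudest sample hits `peak` (of full scale)."""
    sr, data = wavfile.read(in_path)
    x = data.astype(np.float32)
    if data.dtype == np.int16:
        x /= 32768.0  # convert 16-bit PCM to floats in [-1, 1]
    m = np.max(np.abs(x))
    if m > 0:
        x *= peak / m  # boost so the peak amplitude reaches `peak`
    wavfile.write(out_path, sr, (x * 32767).astype(np.int16))
    return sr, x
```

Then `normalize_wav("sample.wav", "sample_loud.wav")` writes a louder copy without clipping.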

Dataset used to train irow/conditional-audio-diffusion