Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions
Abstract
Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music, because this class of generative models is not well suited to discrete data and its iterative sampling process is computationally expensive. In this work, we propose a diffusion model combined with a Generative Adversarial Network, aiming to (i) address a remaining challenge in algorithmic music generation, namely controlling generation towards a target emotion, and (ii) mitigate the slow sampling of diffusion models applied to symbolic music generation. We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels, and then used those embeddings to train the diffusion model. Our results demonstrate that our diffusion model can be successfully controlled to generate symbolic music with a desired emotion. The model also achieves several orders of magnitude improvement in computational cost, requiring only four denoising time steps, whereas current state-of-the-art diffusion models for symbolic music generation require on the order of thousands.
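To make the pipeline described in the abstract concrete, here is a minimal, hedged sketch of the general idea: a pretrained VAE encodes symbolic music into latent embeddings, and a small denoiser conditioned on an emotion label reverses a short noising process in that latent space. All names, dimensions, the noise schedule, and the number of emotion classes are assumptions for illustration; the adversarial (GAN) training objective is omitted for brevity, so this is not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): few-step, emotion-conditioned
# denoising in the latent space of a pretrained VAE. Module names, sizes, and the
# toy noise schedule are assumptions; the GAN discriminator is omitted.

import torch
import torch.nn as nn

LATENT_DIM = 256    # assumed VAE latent size
NUM_EMOTIONS = 4    # e.g. four valence/arousal quadrants (assumption)
NUM_STEPS = 4       # few-step denoising, as reported in the abstract


class ConditionalDenoiser(nn.Module):
    """Predicts the clean latent from a noisy latent, a timestep, and an emotion label."""

    def __init__(self):
        super().__init__()
        self.emotion_emb = nn.Embedding(NUM_EMOTIONS, LATENT_DIM)
        self.step_emb = nn.Embedding(NUM_STEPS, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(3 * LATENT_DIM, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, z_noisy, t, emotion):
        h = torch.cat([z_noisy, self.step_emb(t), self.emotion_emb(emotion)], dim=-1)
        return self.net(h)


def sample(denoiser, emotion, alphas, n=1):
    """Simplified few-step ancestral-style sampling in latent space."""
    z = torch.randn(n, LATENT_DIM)
    for t in reversed(range(NUM_STEPS)):
        t_batch = torch.full((n,), t, dtype=torch.long)
        z0_hat = denoiser(z, t_batch, emotion)  # predicted clean latent
        if t > 0:
            a = alphas[t - 1]
            z = a.sqrt() * z0_hat + (1 - a).sqrt() * torch.randn_like(z0_hat)
        else:
            z = z0_hat
    return z  # would be decoded by the pretrained VAE decoder into symbolic music


if __name__ == "__main__":
    denoiser = ConditionalDenoiser()
    alphas = torch.linspace(0.9, 0.1, NUM_STEPS)   # toy noise schedule (assumption)
    emotion = torch.zeros(2, dtype=torch.long)     # target emotion id for 2 samples
    latents = sample(denoiser, emotion, alphas, n=2)
    print(latents.shape)  # torch.Size([2, 256])
```

In the paper's framing, the GAN component is what allows the reverse process to be compressed into so few steps; the sketch above only shows the conditioning and sampling loop around that idea.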