Building a MusicGen API to Generate Custom Music Tracks Locally

Community Article · Published December 4, 2024

The way we create and experience music is evolving, thanks to generative AI. Tools like MusicGen are at the forefront of this revolution, enabling developers and creators to produce unique audio from textual descriptions. Imagine generating an inspiring soundtrack or a soothing melody tailored to your exact needs—all with a simple API.

In this article, I’ll guide you through building a MusicGen API using facebook/musicgen-large, combining technical instruction with insights into why generative audio is reshaping the creative landscape.


The Power of MusicGen

MusicGen, developed by Meta, is a powerful text-to-audio model capable of creating diverse musical compositions based on prompts like "relaxing piano music" or "energetic dance beats." Its versatility makes it ideal for:

  • Personalized soundtracks for video content.
  • Ambient music for apps, games, or experiences.
  • Rapid prototyping for music producers.

Unlike traditional music creation, MusicGen doesn’t require advanced composition skills. This democratization of creativity is why generative AI is so transformative.


Setting Up Your Environment

Before we dive into code, let's ensure we have the right tools.

Requirements

To run the API locally, you’ll need:

  • Python 3.9+
  • A CUDA-compatible GPU for faster processing (though CPU works too).
  • Libraries: torch, transformers, FastAPI, uvicorn, scipy

Installation

Install the required libraries:

pip install torch transformers fastapi uvicorn scipy

This setup prepares your machine to run the facebook/musicgen-large model and handle audio I/O. Note that musicgen-large has roughly 3.3B parameters, so expect a multi-gigabyte download on first use and budget GPU memory (or patience on CPU) accordingly.
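Before moving on, it's worth confirming that PyTorch can actually see your GPU. A quick check:

import torch

# True if a CUDA-compatible GPU is visible; otherwise the API falls back to CPU
print(torch.cuda.is_available())
print(torch.__version__)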


Introducing the MusicGen API

What Does the API Do?

The API:

  1. Accepts a prompt describing the desired music style.
  2. Allows users to specify the duration of the generated audio.
  3. Generates two distinct tracks per request, for variety, and returns their file paths.

We use FastAPI to manage the API endpoints, taking advantage of its performance and the automatic request validation it gets from Pydantic models.


The Code: Building the MusicGen API

Here’s the full implementation of the API:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
import os
import scipy.io.wavfile
import torch
from transformers import pipeline
import random
import traceback

app = FastAPI()

class MusicRequest(BaseModel):
    prompt: str
    duration: int  # Duration for each track

# Disable tokenizers parallelism warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

@app.post("/generate-music/")
# Defined as a sync (non-async) function so FastAPI runs this blocking work in a threadpool
def generate_music(request: MusicRequest):
    if request.duration <= 0:
        raise HTTPException(status_code=400, detail="Duration must be greater than zero")

    synthesiser = None

    try:
        # Set device (GPU or CPU)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(f"Using device: {'CUDA' if device.type == 'cuda' else 'CPU'}")

        # Optionally limit GPU memory usage
        if device.type == 'cuda':
            try:
                torch.cuda.set_per_process_memory_fraction(0.8, device=0)
                print("Limited GPU memory usage to 80%")
            except Exception as mem_error:
                print(f"Failed to limit GPU memory: {mem_error}")

        # Load MusicGen Large model
        synthesiser = pipeline("text-to-audio", model="facebook/musicgen-large", device=0 if device.type == 'cuda' else -1)
        print("Model loaded successfully")

        # Generate two audio tracks using a random seed
        random_seed = random.randint(0, 2**32 - 1)
        torch.manual_seed(random_seed)
        if device.type == 'cuda':
            torch.cuda.manual_seed_all(random_seed)

        # MusicGen produces roughly 50 audio tokens per second, so max_length = duration * 50
        music1 = synthesiser(request.prompt, forward_params={"do_sample": True, "max_length": request.duration * 50})
        random_seed += 1
        torch.manual_seed(random_seed)
        if device.type == 'cuda':
            torch.cuda.manual_seed_all(random_seed)
        music2 = synthesiser(request.prompt, forward_params={"do_sample": True, "max_length": request.duration * 50})

        # Save audio files
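        # Note: fixed filenames mean concurrent requests will overwrite each other's output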
        output1 = os.path.join(os.getcwd(), "song1.wav")
        scipy.io.wavfile.write(output1, rate=music1["sampling_rate"], data=music1["audio"])

        output2 = os.path.join(os.getcwd(), "song2.wav")
        scipy.io.wavfile.write(output2, rate=music2["sampling_rate"], data=music2["audio"])

        return {"song1": output1, "song2": output2}

    except Exception as e:
        traceback.print_exc()
        raise HTTPException(status_code=500, detail=f"Error generating music: {e}")

    finally:
        if synthesiser:
            del synthesiser
        torch.cuda.empty_cache()
        print("Cleaned up resources")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Challenges and Solutions

GPU Memory Management

Problem: Generating long tracks with a 3.3B-parameter model can exhaust GPU memory.
Solution: Cap this process's share of GPU memory at 80%, so a runaway generation fails fast with an out-of-memory error instead of destabilizing the whole machine:

torch.cuda.set_per_process_memory_fraction(0.8, device=0)
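To see how close you are to that cap, PyTorch exposes simple allocator counters; a minimal sketch (CUDA only):

import torch

if torch.cuda.is_available():
    # Bytes allocated by live tensors vs. reserved by the caching allocator
    allocated_gb = torch.cuda.memory_allocated() / 1024**3
    reserved_gb = torch.cuda.memory_reserved() / 1024**3
    print(f"Allocated: {allocated_gb:.2f} GiB, reserved: {reserved_gb:.2f} GiB")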

Model Initialization Overhead

Problem: The code above loads facebook/musicgen-large inside the request handler, so every request pays the full model-loading cost before generation even begins.
Solution: Load the model once at startup and reuse it across requests.
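Here's a minimal sketch of that approach using FastAPI's lifespan hook; the endpoint would then reuse the shared synthesiser instead of constructing its own:

from contextlib import asynccontextmanager

import torch
from fastapi import FastAPI
from transformers import pipeline

synthesiser = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global synthesiser
    # Pay the model-loading cost once at startup instead of on every request
    device = 0 if torch.cuda.is_available() else -1
    synthesiser = pipeline("text-to-audio", model="facebook/musicgen-large", device=device)
    yield
    synthesiser = None  # allow the model to be garbage-collected on shutdown

app = FastAPI(lifespan=lifespan)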


Running the API Locally

Start the API server (assuming the code above is saved as app.py; add --reload during development to restart automatically on code changes):

uvicorn app:app --host 0.0.0.0 --port 8000

Example Request

To generate music, send a POST request to /generate-music/:

{
  "prompt": "calm and meditative music",
  "duration": 30
}

The API will return paths to two generated audio files (song1.wav and song2.wav).
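For example, with curl (assuming the server is running locally on port 8000):

curl -X POST http://localhost:8000/generate-music/ \
  -H "Content-Type: application/json" \
  -d '{"prompt": "calm and meditative music", "duration": 30}'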


Why Generative Audio Matters

Generative AI like MusicGen empowers creators to experiment with music in ways that were once unimaginable. Whether you’re prototyping a film score or adding background music to your game, this technology removes barriers to entry.

This democratization of music production enables anyone—from hobbyists to professionals—to create something unique and personal.


Next Steps: Elevating Your API

Consider extending this API by:

  1. Adding Post-Processing: Enhance audio with normalization and filters using libraries like Pydub (see the sketch after this list).
  2. Frontend Integration: Build an interface for non-technical users to interact with the API.
  3. Cloud Deployment: Host the API on platforms like AWS or Azure for broader accessibility.
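As a starting point for the first idea, here's a minimal normalization sketch using Pydub; it assumes pydub is installed (pip install pydub, which also requires ffmpeg) and operates on the song1.wav file the API writes:

from pydub import AudioSegment
from pydub.effects import normalize

# Load a generated track and raise its peak level to just below clipping
track = AudioSegment.from_wav("song1.wav")
normalized = normalize(track)
normalized.export("song1_normalized.wav", format="wav")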

Conclusion

Generative AI is redefining the boundaries of music production. By combining models like MusicGen with tools like FastAPI, we’re not just building APIs—we’re creating new ways to express creativity.

If you’re interested in exploring how AI can enhance your workflow or want to build custom APIs, feel free to connect with me. Together, we can build the future of music and AI.
