iris2c committed on
Commit 0939bba · verified · 1 Parent(s): 74cec88

Upload README.md

Files changed (1)
  1. README.md +40 -16
README.md CHANGED
@@ -29,6 +29,12 @@ tags:
29
  <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
30
  <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>
31
 
32
  <a href="https://arxiv.org/abs/" target="_blank">
33
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
34
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
@@ -77,13 +83,14 @@ InspireMusic is a fundamental AIGC toolkit and models designed for music, song,
77
  ## Highlights
78
  **InspireMusic** focuses on music generation, song generation and audio generation.
79
  - A unified framework for music/song/audio generation. Controllable with text prompts, music genres, music structures, etc.
80
- - Support text-to-music, music continuation, audio super-resolution, audio reconstruction tasks with high audio quality, with available sampling rates of 24kHz, 48kHz.
81
- - Support long audio generation in multiple output audio formats, i.e., wav, flac, mp3, m4a.
82
  - Convenient fine-tuning and inference. Support mixed precision training (FP16, FP32). Provide convenient fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.
83
 
84
  <a name="What's News"></a>
85
  ## What's New 🔥
86
 
87
  - 2025/01: Open-source [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation. Models are available on both ModelScope and HuggingFace.
88
  - 2024/12: Support to generate 48kHz audio with super resolution flow matching.
89
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
@@ -94,10 +101,27 @@ InspireMusic is a fundamental AIGC toolkit and models designed for music, song,
94
  > This repo contains the algorithm infrastructure and some simple examples. Currently only support English text prompts.
95
 
96
  > [!Tip]
97
- > To explore the performance, please refer to [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source better & larger models and demo space soon.
98
 
99
  InspireMusic is a unified music, song and audio generation framework through the audio tokenization and detokenization process integrated with a large autoregressive transformer. The original motive of this toolkit is to empower the common users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Featuring a unified framework, InspireMusic incorporates autoregressive Transformer and conditional flow-matching modeling (CFM), allowing for the controllable generation of music, songs, and audio with both textual and structural music conditioning, as well as neural audio tokenizers. Currently, the toolkit supports text-to-music generation and plans to expand its capabilities to include text-to-song and text-to-audio generation in the future.
100
 
101
  ## Installation
102
 
103
  ### Clone
@@ -122,7 +146,7 @@ cd InspireMusic
122
  # pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
123
  conda install -y -c conda-forge pynini==2.1.5
124
  pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
125
- # install flash attention to speedup training
126
  pip install flash-attn --no-build-isolation
127
  ```
128
  Currently support on CUDA Version 11.x.
@@ -242,18 +266,18 @@ git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_mo
242
  Currently, we open source the music generation models support 24KHz mono and 48KHz stereo audio.
243
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.
244
 
245
- | Model name | Model Links | Remarks |
246
- |---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
247
- | InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono, 30s |
248
- | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz, 30s |
249
- | InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono, 30s |
250
- | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz, 30s |
251
- | InspireMusic-1.5B-Long ⭐ | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, support long-form music generation |
252
- | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
253
- | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
254
- | Wavtokenizer[<sup>[1]</sup>](https://openreview.net/forum?id=yBlVlS2Fd9) (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=wavtokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/wavtokenizer) | An extreme low bitrate audio tokenizer for music with one codebook at 24kHz audio. |
255
- | Music_tokenizer (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec<sup>[2]</sup> at 24kHz audio. |
256
- | Music_tokenizer (150Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec at 48kHz audio. |
257
 
258
  ## Basic Usage
259
 
 
29
  <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
30
  <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>
31
 
32
+ <a href="https://huggingface.co/spaces/FunAudioLLM/InspireMusic" target="_blank">
33
+ <img alt="Space" src="https://img.shields.io/badge/Spaces-ModelScope-pink?labelColor=%20%237b8afb&label=Spaces&color=%20%230a5af8"></a>
34
+
35
+ <a href="https://huggingface.co/spaces/FunAudioLLM/InspireMusic" target="_blank">
36
+ <img alt="Space" src="https://img.shields.io/badge/HuggingFace-Spaces?labelColor=%20%239b8afb&label=Spaces&color=%20%237a5af8"></a>
37
+
38
  <a href="https://arxiv.org/abs/" target="_blank">
39
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
40
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 
83
  ## Highlights
84
  **InspireMusic** focuses on music generation, song generation and audio generation.
85
  - A unified framework for music/song/audio generation. Controllable with text prompts, music genres, music structures, etc.
86
+ - Support music generation tasks with high audio quality, with available sampling rates of 24kHz, 48kHz.
87
+ - Support long-form audio generation.
88
  - Convenient fine-tuning and inference. Support mixed precision training (FP16, FP32). Provide convenient fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.
89
 
90
  <a name="What's News"></a>
91
  ## What's New 🔥
92
 
93
+ - 2025/02: InspireMusic demo is available on [ModelScope Space](https://modelscope.cn/studios/iic/InspireMusic/summary) and [HuggingFace Space](https://huggingface.co/spaces/FunAudioLLM/InspireMusic).
94
  - 2025/01: Open-source [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation. Models are available on both ModelScope and HuggingFace.
95
  - 2024/12: Support to generate 48kHz audio with super resolution flow matching.
96
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
 
101
  > This repo contains the algorithm infrastructure and some simple examples. Currently only support English text prompts.
102
 
103
  > [!Tip]
104
+ > To explore the performance, please refer to [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source better & larger models soon.
105
 
106
  InspireMusic is a unified music, song and audio generation framework through the audio tokenization and detokenization process integrated with a large autoregressive transformer. The original motive of this toolkit is to empower the common users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Featuring a unified framework, InspireMusic incorporates autoregressive Transformer and conditional flow-matching modeling (CFM), allowing for the controllable generation of music, songs, and audio with both textual and structural music conditioning, as well as neural audio tokenizers. Currently, the toolkit supports text-to-music generation and plans to expand its capabilities to include text-to-song and text-to-audio generation in the future.
107
 
108
+ ## InspireMusic
109
+ <p align="center">
110
+ <table>
111
+ <tr>
112
+ <td style="text-align:center;">
113
+ <img alt="Light" src="asset/InspireMusic.png" width="100%" />
114
+ </tr>
115
+ <tr>
116
+ <td style="text-align:center;">
117
+ <b>Figure 1.</b> An overview of the InspireMusic framework.
118
+
119
+ We introduce InspireMusic, a unified framework for music, song and audio generation, capable of producing 48kHz long-form audio. InspireMusic employs an autoregressive transformer to generate music tokens in response to textual input. Complementing this, an ODE-based diffusion model, specifically flow matching, is utilized to reconstruct latent features from these generated music tokens. Then a vocoder generates audio waveforms from the reconstructed features. InspireMusic is capable of text-to-music, music continuation, music reconstruction, and music super resolution tasks. It employs WavTokenizer as an audio tokenizer to convert 24kHz audio into 75Hz discrete tokens, while HifiCodec serves as a music tokenizer, transforming 48kHz audio into 150Hz latent features compatible with the flow matching model.
120
+ </td>
121
+ </tr>
122
+ </table>
123
+ </p>
124
+
125
  ## Installation
126
 
127
  ### Clone
 
146
  # pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
147
  conda install -y -c conda-forge pynini==2.1.5
148
  pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
149
+ # install flash attention to speedup training, support version 2.6.3
150
  pip install flash-attn --no-build-isolation
151
  ```
152
  Currently support on CUDA Version 11.x.
 
266
  Currently, we open source the music generation models support 24KHz mono and 48KHz stereo audio.
267
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.
268
 
269
+ | Model name | Model Links | Remarks |
270
+ |---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
271
+ | InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono, 30s |
272
+ | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz, 30s |
273
+ | InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono, 30s |
274
+ | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz, 30s |
275
+ | InspireMusic-1.5B-Long ⭐ | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, support long-form music generation more than 5mins |
276
+ | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
277
+ | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
278
+ | Wavtokenizer[<sup>[1]</sup>](https://openreview.net/forum?id=yBlVlS2Fd9) (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=wavtokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/wavtokenizer) | An extreme low bitrate audio tokenizer for music with one codebook at 24kHz audio. |
279
+ | Music_tokenizer (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec<sup>[2]</sup> at 24kHz audio. |
280
+ | Music_tokenizer (150Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec at 48kHz audio. |
281
 
282
  ## Basic Usage
283
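
The figure caption added in this commit quotes two tokenizer rates: 24kHz audio mapped to 75Hz discrete tokens, and 48kHz audio mapped to 150Hz latent features. As a rough sanity check on those numbers, the sketch below works out how many audio samples each token or frame covers and how many 75Hz tokens a clip of a given length would need. It assumes "75Hz" and "150Hz" mean tokens (or frames) per second of audio; the function names are hypothetical and are not part of the InspireMusic codebase.

```python
def samples_per_token(sample_rate_hz: int, token_rate_hz: int) -> int:
    """Audio samples covered by one token or latent frame (assumed uniform hop)."""
    if sample_rate_hz % token_rate_hz != 0:
        raise ValueError("sample rate is not an integer multiple of the token rate")
    return sample_rate_hz // token_rate_hz


def tokens_for_duration(token_rate_hz: int, seconds: float) -> int:
    """Number of tokens needed to cover `seconds` of audio at the given token rate."""
    return round(token_rate_hz * seconds)


# Rates quoted in the README text added by this commit.
hop_24k = samples_per_token(24_000, 75)       # audio tokenizer at 24kHz
hop_48k = samples_per_token(48_000, 150)      # music tokenizer at 48kHz

# Token counts for a 30s clip and for a 5-minute clip at 75 tokens per second.
tokens_30s = tokens_for_duration(75, 30)
tokens_5min = tokens_for_duration(75, 5 * 60)

print(hop_24k, hop_48k, tokens_30s, tokens_5min)
```

Under that assumption both tokenizers cover the same number of samples per step, and a 5-minute clip needs ten times the tokens of a 30s clip, which gives a feel for the sequence lengths the long-form model has to handle.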