PyTorch
music
text-to-music
symbolic-music
dorienh committed on
Commit 5a91258 · verified
1 Parent(s): ee1cbd9

Update README.md

Files changed (1)
  1. README.md +24 -17
README.md CHANGED
@@ -17,7 +17,7 @@ tags:
17
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/amaai-lab/text2midi)
18
  </div>
19
 
20
- **text2midi** is the first end-to-end model for generating MIDI files from textual descriptions. By leveraging pretrained large language models and a powerful autoregressive transformer decoder, **text2midi** allows users to create symbolic music that aligns with detailed textual prompts, including musical attributes like chords, tempo, and style.
21
 
22
  🔥 Live demo available on [Hugging Face Spaces](https://huggingface.co/spaces/amaai-lab/text2midi).
23
 
@@ -99,7 +99,29 @@ pip install -r requirements-mac.txt
99
  ```
100
 
101
  ## Datasets
102
- The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
103
 
104
  ## Results of the Listening Study
105
 
@@ -154,20 +176,5 @@ accelerate launch train.py \
154
  --epochs=40 \
155
  ```
156
 
157
- ## Inference
158
- We support inference on CUDA, MPS, and CPU. Please make sure you have pip-installed the correct requirements file (requirements.txt for CUDA, requirements-mac.txt for MPS).
159
- ```bash
160
- python model/transformer_model.py --caption <your intended descriptions>
161
- ```
162
 
163
- ## Citation
164
- If you use text2midi in your research, please cite:
165
- ```
166
- @inproceedings{bhandari2025text2midi,
167
- title={text2midi: Generating Symbolic Music from Captions},
168
- author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
169
- booktitle={Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)},
170
- year={2025}
171
- }
172
- ```
173
 
 
17
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/amaai-lab/text2midi)
18
  </div>
19
 
20
+ **text2midi** is the first end-to-end model for generating MIDI files from textual descriptions. By leveraging pretrained large language models and a powerful autoregressive transformer decoder, **text2midi** allows users to create symbolic music that aligns with detailed textual prompts, including musical attributes like chords, tempo, and style. The details of the model are described in [this paper](https://arxiv.org/abs/2412.16526).
21
 
22
  🔥 Live demo available on [Hugging Face Spaces](https://huggingface.co/spaces/amaai-lab/text2midi).
23
 
 
99
  ```
100
 
101
  ## Datasets
102
+
103
+ The model was trained using two datasets: [SymphonyNet](https://symphonynet.github.io/) for semi-supervised pretraining and MidiCaps for finetuning towards MIDI generation from captions.
104
+ The [MidiCaps dataset](https://huggingface.co/datasets/amaai-lab/MidiCaps) is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks as described in [this paper](https://arxiv.org/abs/2406.02255).
105
+
106
+
107
+ ## Inference
108
+
109
+ We support inference on CUDA, MPS, and CPU. Please make sure you have pip-installed the correct requirements file (requirements.txt for CUDA, requirements-mac.txt for MPS).
110
+ ```bash
111
+ python model/transformer_model.py --caption <your intended descriptions>
112
+ ```
113
+
114
+ ## Citation
115
+
116
+ If you use text2midi in your research, please cite:
117
+ ```
118
+ @inproceedings{bhandari2025text2midi,
119
+ title={text2midi: Generating Symbolic Music from Captions},
120
+ author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
121
+ booktitle={Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)},
122
+ year={2025}
123
+ }
124
+ ```
125
 
126
  ## Results of the Listening Study
127
 
 
176
  --epochs=40 \
177
  ```
178
 
179
 
180