Update README.md
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/amaai-lab/text2midi)
</div>

**text2midi** is the first end-to-end model for generating MIDI files from textual descriptions. By leveraging pretrained large language models and a powerful autoregressive transformer decoder, **text2midi** allows users to create symbolic music that aligns with detailed textual prompts, including musical attributes like chords, tempo, and style. The details of the model are described in [this paper](https://arxiv.org/abs/2412.16526).

🔥 Live demo available on [HuggingFace Spaces](https://huggingface.co/spaces/amaai-lab/text2midi).
```bash
pip install -r requirements-mac.txt
```

## Datasets
The model was trained using two datasets: [SymphonyNet](https://symphonynet.github.io/) for semi-supervised pretraining and MidiCaps for finetuning towards MIDI generation from captions.

The [MidiCaps dataset](https://huggingface.co/datasets/amaai-lab/MidiCaps) is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks as described in [this paper](https://arxiv.org/abs/2406.02255).
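Because the captions mention attributes like key and tempo in a fairly regular way, simple patterns can often pull them back out. The caption string and regular expressions below are illustrative only, not part of the dataset tooling:

```python
import re

# Illustrative caption written in the MidiCaps style described above
# (not an actual dataset entry).
caption = "A melodic pop song in C major at 120 bpm with a happy, upbeat mood."

# Toy patterns for two of the attributes the captions typically contain.
tempo = re.search(r"(\d+)\s*bpm", caption)
key = re.search(r"in ([A-G][#b]? (?:major|minor))", caption)

print(tempo.group(1))  # "120"
print(key.group(1))    # "C major"
```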
## Inference

We support inference on CUDA, MPS, and CPU. Please make sure you have pip-installed the correct requirements file (requirements.txt for CUDA, requirements-mac.txt for MPS).

```bash
python model/transformer_model.py --caption <your intended description>
```
## Citation

If you use text2midi in your research, please cite:

```bibtex
@inproceedings{bhandari2025text2midi,
  title={text2midi: Generating Symbolic Music from Captions},
  author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
  booktitle={Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)},
  year={2025}
}
```
## Results of the Listening Study

```bash
accelerate launch train.py \
    --epochs=40 \
```