Feedback and suggestions

#18
by asigalov61 - opened

Hey @skytnt

I wanted to stop by and give you some feedback and suggestions about your updated model...

First of all, thank you for continuing to develop this project :)

Secondly, I wanted to thank you for finally fixing the visualizer :) It's working again with the latest Gradio, so I will probably re-implement it in my projects again :)

Now, in regard to the 1.2 model update... I am happy to see that you improved the model. It plays much better now. However, the model still struggles with continuations, especially solo piano continuations.

To fix this:

  1. Add more solo piano performances to the training dataset
  2. Train on more MIDIs
  3. Consider reducing augmentation, quantization, and filtering.
  4. Increase model complexity. Usually, for good continuation results you need an embedding size of 1024 with at least 24 layers (see the sketch after this list).
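
For what it's worth, here is a rough sketch of what point 4 could look like as a config; the class and field names are illustrative and not taken from the midi-model codebase:

```python
# Hypothetical sizing for better continuation quality; illustrative only,
# not the actual midi-model configuration.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    vocab_size: int = 4096   # depends on the tokenizer
    n_embd: int = 1024       # embedding / hidden size suggested above
    n_layer: int = 24        # at least 24 transformer blocks
    n_head: int = 16         # 1024 / 16 = 64-dim heads
    max_seq_len: int = 4096  # longer context helps continuations

cfg = ModelConfig()
# Rough estimate: ~12 * d_model^2 parameters per layer (attention + MLP).
print(f"~{12 * cfg.n_embd ** 2 * cfg.n_layer / 1e6:.0f}M parameters in the blocks alone")
```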

Also, I would add RoPE to your transformer implementation because it seems to improve results as well.
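
For illustration only, here is a generic PyTorch sketch of rotary position embeddings (not code from either of our repos, just the standard idea):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to q or k shaped (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per channel pair.
    inv_freq = 1.0 / (base ** (torch.arange(half, device=x.device).float() / half))
    pos = torch.arange(seq_len, device=x.device).float()
    angles = torch.outer(pos, inv_freq)       # (seq, half)
    cos = angles.cos()[None, :, None, :]      # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```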

Let me know what you think.

Sincerely,

Alex

I would like to add on to this, specifically with time signatures. I personally want to be able to generate MIDI in a time signature other than 4/4 or 3/4. Also, when exporting an "odd" MIDI, the file is usually labeled as if it's in 4/4 when in reality there are a lot of time signature changes (since the model is inconsistent with timing at times).

Hey @asigalov61
Thank you for your feedback and suggestions. However, I would like to clarify a few points.

  1. I used your dataset (Los-Angeles-MIDI-Dataset) plus my own dataset and filtered it with MIDITokenizer.check_quality. There are many piano solos in the dataset, and if they have no quality issues they are retained for training. The reason solo piano continuations are not good is probably that there are still a lot of junk piano MIDIs in the dataset that have not been cleaned up.
  2. After filtering, your dataset still has 266k MIDIs, which I think should be enough.
  3. Augmentation can prevent overfitting. The current degree of augmentation is relatively small and should not have any impact on the results.
  4. Quantization to sixty-fourth notes should be enough for most MIDIs, and you can hardly hear the difference. Note positions must be quantized for the model in any case, because the tokens are discrete.
  5. Filtering is crucial for the dataset. It removes bad MIDIs, especially chaotic ones (a small sketch of the quantization and density ideas follows this list). The individual checks:
     • Alignment checks whether notes line up with the bars. Some MIDIs are not aligned because they were recorded live or the tempo information was lost; removing them makes the generated MIDI easier to work with.
     • Tonality checks whether the music has a tonal center, which helps catch chaotic MIDIs.
     • Bandwidth removes MIDIs that are just simple melody lines.
     • Density checks the note density. If the density is too high, the model may have difficulty fitting it; this also catches chaotic MIDIs, which often come from other AI transcriptions or spectrum analysis.
     • Piano removes MIDIs where every channel instrument is set to piano. These MIDIs have no actual instrument arrangement; the intended instruments are only written in the track names while all channels remain piano, so they do not help fitting.
  6. RoPE seems to be already integrated into LLaMA itself.
  7. I also want to increase the complexity of the model. But at the moment I don't have enough money to continue training.
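
To make the quantization and density points concrete, here is a minimal sketch; the helper names and thresholds are invented for illustration and are not the real MIDITokenizer.check_quality logic:

```python
# Illustrative only: invented helpers, not the real MIDITokenizer.check_quality.

def quantize_tick(tick: int, ticks_per_beat: int) -> int:
    """Snap a note position to a 1/64-note grid (16 grid steps per quarter note)."""
    step = ticks_per_beat / 16.0
    return int(round(tick / step) * step)

def note_density(num_notes: int, total_beats: float) -> float:
    """Average number of note onsets per beat across the whole file."""
    return num_notes / max(total_beats, 1.0)

def keep_midi(num_notes: int, total_beats: float, max_density: float = 32.0) -> bool:
    """Reject files whose note density suggests a chaotic or machine-transcribed MIDI."""
    return note_density(num_notes, total_beats) <= max_density
```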

Finally, I want to say that we must come up with a high-quality MIDI dataset, because the quality of the dataset determines the upper limit of the model. No matter how we design the model, if the dataset is poor, the model's performance will still be poor.

Owner

I would like to add on to this, specifically with time signatures. I personally want to be able to generate MIDI in a time signature other than 4/4 or 3/4. Also, when exporting an "odd" MIDI, the file is usually labeled as if it's in 4/4 when in reality there are a lot of time signature changes (since the model is inconsistent with timing at times).

Currently midi-model does not have a time signature event; I will consider adding it in the next version.
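
Purely as a hypothetical sketch of what such an event could look like as an extra token type (the real tokenizer's event layout may end up quite different):

```python
# Hypothetical time-signature event; midi-model does not have this yet, and the
# ids and supported signatures below are placeholders.
from dataclasses import dataclass

TIME_SIG_EVENT_ID = 7  # placeholder id, depends on the rest of the vocabulary
SUPPORTED_SIGNATURES = [(4, 4), (3, 4), (6, 8), (5, 4), (7, 8)]

@dataclass
class TimeSignatureEvent:
    bar: int          # bar index at which the signature takes effect
    numerator: int
    denominator: int

    def to_tokens(self) -> list[int]:
        """Encode as (event id, signature id, bar) in an invented vocabulary."""
        sig_id = SUPPORTED_SIGNATURES.index((self.numerator, self.denominator))
        return [TIME_SIG_EVENT_ID, sig_id, self.bar]
```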

@skytnt Thank you very much for your detailed response :)

I absolutely agree with you on all the points. However, I need to point out that I was talking about continuation, not generation. What I mentioned worked very well for my models/projects, so I simply wanted to share it with you so that you can improve your model as well. There is no universal approach and I may even be wrong about some of it, so I am not claiming absolute authority here. But please do consider what I have suggested, because your model struggles with continuations, especially at the transition points and especially on solo piano performances.

Here are my baseline seed MIDIs:
https://github.com/asigalov61/Giant-Music-Transformer/tree/main/Seeds

Try them out, especially the piano ones, so that you can see what I am talking about.

Other than that, I definitely agree with you about the MIDI datasets. OpenAI used a custom, hand-picked and hand-processed dataset for MuseNet, which they never shared, even though it was based on public collections of MIDIs. So I most certainly agree with you that we should do what we can (as a community) to create one for open-source AI.

In regard to model complexity and your budget issues... I may be able to help with that if you are willing to collaborate and if you can fix the continuation issue. I would be more than happy to help you train a large version of your model, because I do have some money for that and because I really like your work. So please let me know if you want to collaborate.

I also wanted to ask whether you would be willing to dedicate some of your time and skills to creating a proper front-end for your model. I can't help with that, but I would be able to help with the hosting and backend stuff. Specifically, I wanted to ask you to check out the following two implementations:

https://github.com/stevenwaterman/musetree

https://github.com/bearpelican/musicautobot_vueapp

Ideally, I would love to combine those two and add additional features, so that I and other interested people have a nice, useful frontend/interface for symbolic music AI models. In fact, since HF provides an API for Spaces, your current space could easily serve as a backend for such a frontend/interface.
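
For context, a Space can already be driven from a thin client via gradio_client; below is only a sketch, and the endpoint name and arguments are placeholders (the real ones are listed on the space's "Use via API" page):

```python
# Sketch of calling the Space as a backend; the api_name and the arguments are
# placeholders, not the space's actual endpoint signature.
from gradio_client import Client

client = Client("skytnt/midi-composer")
result = client.predict(
    "seed or generation settings here",  # placeholder arguments
    api_name="/generate",                # placeholder endpoint name
)
print(result)
```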

Please let me know what you think :)

Sincerely,

Alex

Owner

@asigalov61 Thank you for your reply.

I tried your MIDI file and found that there is indeed a continuation problem. I have a few guesses about the possible causes of this issue:

  • The model might be overfitting, as it was trained on the v1.1 base model and the dataset is a subset of a previous one. You could try training from scratch and increasing the weight decay to prevent overfitting (a small optimizer sketch follows this list).

  • Some good, valid MIDI data might have been filtered out. I think there was an issue with the MIDITokenizer.check_quality check for undefined instruments. I've removed that check now, but you can adjust the other checks' thresholds as needed.

  • The model might be too small to fit the entire dataset.
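
As a concrete (hypothetical) example of the weight-decay point above, the change would look something like this in PyTorch; the numbers are placeholders, not the project's actual hyperparameters:

```python
# Illustrative only: stronger regularization via AdamW weight decay.
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for the real transformer

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,            # placeholder learning rate
    weight_decay=0.05,  # raised from a typical 0.01 to fight overfitting
)
```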

Regarding the frontend development, I don't have much time at the moment. I might be able to work on it early next year.

I look forward to collaborating with you to train a bigger and better model. (You can contact me at [email protected]. Feel free to send me your other contact info via email.)

@skytnt Thank you. I emailed you as requested.

asigalov61 changed discussion status to closed

@Timzoid Just poking you...

I wanted to direct your attention to my collaborative work with SkyTNT:

https://huggingface.co./spaces/skytnt/midi-composer

We greatly improved it and added new models. We also added a self-continuation option, which is great for controlled music generation.

So check it out please and let me know what you think.

Sincerely,

Alex
