This looks amazing


Could you share any information on the training regime? Dataset, hyperparameters, etc.

Hi! Sure, I used 70 videos of 49 frames each. I captioned them with Qwen2-VL, but it made many mistakes, so I had to review and correct the captions one by one. As for the hyperparameters, since this was my first LoRA with CogVideoX, I basically used the default settings that ship with the cogvideox-factory repo. The whole training took around 13 hours on an L40S and used around 32 GB of VRAM, but the cogvideox-factory repo suggests optimizations that make it possible to train on 24 GB of VRAM.
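
For reference, here is a minimal sketch of the captioning step using Qwen2-VL directly through transformers (I actually ran it through ComfyUI nodes, see below). The checkpoint ID, prompt wording, and file paths are illustrative assumptions, not necessarily the exact setup used here:

```python
# Minimal Qwen2-VL video-captioning sketch (transformers >= 4.45).
# Checkpoint ID, prompt, and paths are assumptions for illustration.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # ships with the Qwen2-VL examples

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "clips/clip_000.mp4"},  # hypothetical clip path
        {"type": "text", "text": "Describe this video in one detailed sentence."},
    ],
}]

# Build the chat prompt and extract video frames for the processor.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the generated caption remains.
caption = processor.batch_decode(
    generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(caption)  # review and correct by hand, as noted above
```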

@Cseti could you share your data-prep scripts (starting from a folder of videos) for splitting and captioning, as well as your fine-tuning scripts? It would be amazing to try making some LoRAs with them. It would be great if you could make a GitHub repo (pushing your current scripts).

I followed the instructions in cogvideox-factory step by step. They also discuss the required folder structure, here. For running the Qwen-VL model I used ComfyUI nodes, but it made many mistakes. However, the CogVideoX team has released their own captioning method here. I couldn't test it yet, but if that is really what they used to caption the model's training data, it could be the best method for producing LoRA training captions too.
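
If it helps, this is roughly how I'd script the dataset layout cogvideox-factory describes in its README (a data root containing `videos.txt` and `prompts.txt`). The folder names and the sidecar-`.txt` caption convention are assumptions for illustration:

```python
# Sketch: build the videos.txt / prompts.txt pair that cogvideox-factory's
# README describes for its data root. Folder names and the sidecar .txt
# caption convention are assumptions, not the repo's requirement.
from pathlib import Path

data_root = Path("training_data")   # hypothetical dataset root
clips_dir = data_root / "videos"    # one .mp4 per 49-frame clip

video_lines, prompt_lines = [], []
for clip in sorted(clips_dir.glob("*.mp4")):
    caption_file = clip.with_suffix(".txt")  # assumed per-clip caption file
    if not caption_file.exists():
        print(f"skipping {clip.name}: no caption")
        continue
    # Keep paths relative to data_root, one clip per line.
    video_lines.append(str(clip.relative_to(data_root)))
    # One caption per line; collapse whitespace so the two files stay aligned.
    prompt_lines.append(" ".join(caption_file.read_text().split()))

(data_root / "videos.txt").write_text("\n".join(video_lines) + "\n")
(data_root / "prompts.txt").write_text("\n".join(prompt_lines) + "\n")
print(f"wrote {len(video_lines)} video/caption pairs")
```

From there you just point the repo's training script at this folder as described in its README.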
