Question About LoRA training

#2
by kurobane - opened

Hello,
I have some questions about LoRA training and hope you can answer them:
How many images do you use to train a LoRA? Do they come in different styles, a mix of SFW and NSFW, etc.?
What dimensions do you crop your images to? How do you know which settings to use for training?
How do you preprocess the images? Do you use BLIP, deepbooru, or both?
Which model do you use to train the LoRA? And finally, how many steps per image?

Thank you in advance for your answers. My goal is to create very good LoRAs like yours, and I hope you can help me.
Have a nice day!

kurobane changed discussion status to closed
kurobane changed discussion status to open

Oops, I didn't know what that button does, sorry.

Sorry for the late reply; I've been a little busy lately.
To answer your questions:

  1. I usually train on 30~50 images. As for NSFW images, I suggest leaving them out unless you really want them, especially censored (mosaic) ones. The base model generally generalizes well enough to produce NSFW images without any NSFW content in the training set.
  2. I usually choose 128. In theory, a larger value needs fewer training epochs, but the final result depends on your own experiments and needs.
  3. To tag images, I generally use the tagger extension in the webui directly, but relying on BLIP, deepbooru, or any other automatic tagger alone is the lazy approach. For really good results, adjust the tags by hand at the end. This matters most when training several characters or outfits: tags shared between different subjects can easily make it impossible to separate them.
  4. For the base model, I usually use NovelAI's final-pruned model. You can also try anything4.5 or others; I tried anything4.5 once and the results were not bad.
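The shared-tag problem from point 3 can be checked before training. Below is a minimal sketch, not part of any training tool. It assumes a hypothetical dataset layout with one subfolder per character or outfit and a comma-separated `.txt` caption file next to each image; the function names are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def parse_tags(caption_text):
    """Split a comma-separated caption into a set of stripped, non-empty tags."""
    return {tag.strip() for tag in caption_text.split(",") if tag.strip()}


def tags_by_concept(dataset_dir, caption_ext=".txt"):
    """Collect the union of tags used under each concept subfolder.

    Assumed layout (hypothetical): dataset_dir/<concept>/<image>.txt
    """
    tags = defaultdict(set)
    for caption_file in Path(dataset_dir).glob(f"*/*{caption_ext}"):
        concept = caption_file.parent.name
        tags[concept] |= parse_tags(caption_file.read_text(encoding="utf-8"))
    return dict(tags)


def shared_tags(concept_tags):
    """Map each tag used by more than one concept to the sorted list of those concepts."""
    owners = defaultdict(set)
    for concept, tags in concept_tags.items():
        for tag in tags:
            owners[tag].add(concept)
    return {tag: sorted(names) for tag, names in owners.items() if len(names) > 1}
```

Calling `shared_tags(tags_by_concept("my_dataset"))` lists the tags worth reviewing by hand. Some overlap (for example, generic quality tags) is harmless; the ones to look at are tags that describe the identity of a character or outfit.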

For more information, I recommend checking the developers' GitHub repositories:
https://github.com/kohya-ss/sd-scripts
https://github.com/bmaltais/kohya_ss

The English above was translated from my original Chinese with Google Translate, so I can't guarantee that it expresses my meaning exactly.

No problem about the delay, don't worry. Thank you for answering all my questions; it will really help me!
I wish you a good day.

Ah, I just noticed that you didn't say which image dimensions you use. 768x768?

The right training image size depends on how much dedicated VRAM you have. I train on a T4 GPU on a server, which has 16 GB of VRAM, so I can run 840x840 at batch size 3, which almost exactly fills the memory.
With only 8 GB of VRAM, I recommend 640x640 at batch size 2, which also fills the memory.
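To put the numbers from this thread in one place, here is a rough sketch that picks a resolution and batch size from the available VRAM and assembles arguments for `train_network.py` from kohya-ss/sd-scripts. Several things are assumptions: that the "128" mentioned earlier is the LoRA network dimension, that VRAM sizes between 8 GB and 16 GB should fall back to the 8 GB setting, that each image is seen once per epoch, and all paths and helper names, which are hypothetical. The flags are the ones I understand sd-scripts to accept, so check its README before relying on them.

```python
import math


def pick_settings(vram_gb):
    """Return (resolution, batch_size) from the two data points given in the thread."""
    if vram_gb >= 16:
        return 840, 3
    if vram_gb >= 8:
        # Assumption: anything between 8 and 16 GB uses the conservative 8 GB setting.
        return 640, 2
    raise ValueError("the thread gives no recommendation below 8 GB of VRAM")


def steps_per_epoch(num_images, batch_size):
    """Optimizer steps per epoch, assuming each image is used once per epoch."""
    return math.ceil(num_images / batch_size)


def build_args(vram_gb, base_model, data_dir, output_dir, network_dim=128):
    """Assemble an argument list for train_network.py (paths are placeholders)."""
    resolution, batch_size = pick_settings(vram_gb)
    return [
        "train_network.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={data_dir}",
        f"--output_dir={output_dir}",
        "--network_module=networks.lora",
        f"--network_dim={network_dim}",  # assumed meaning of the "128" above
        f"--resolution={resolution},{resolution}",
        f"--train_batch_size={batch_size}",
    ]
```

For example, with 40 images on a 16 GB card, `steps_per_epoch(40, 3)` gives 14 steps per epoch, and `build_args(16, "model.ckpt", "train", "out")` produces a list that can be passed to a launcher of your choice.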

Thanks for your reply, man!

kurobane changed discussion status to closed
