onimai-characters / README.md
metadata
license: creativeml-openrail-m
tags:
  - text-to-image
  - stable-diffusion
  - anime
  - aiart

This model is trained on 6 (+1?) characters from ONIMAI: I'm Now Your Sister! (お兄ちゃんはおしまい!)

Example Generations

[Images: two example generations]

Update -- 2023.02.19

Two LoRA checkpoints trained on the same dataset have been added to the loras subfolder. The first one already seems to work: the characters are learned, but unfortunately neither the styles nor the outfits are. It was trained on top of ACertainty and works fine on Orange and Anything, but not so well on models that have been trained further, such as MyneFactoryBase or my own models. I still cannot figure out when LoRAs transfer correctly.

The LoRA has dimension (rank) 32 and alpha 1, and was trained with a learning rate of 1e-4. Here are some example generations.
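To make the two LoRA numbers concrete: in the common LoRA formulation (used, for example, by kohya-ss sd-scripts; the README does not say which trainer produced these files, so this is an assumption), a frozen weight W gets a trained low-rank update B·A scaled by alpha / dimension. The pure-Python sketch below uses hypothetical helper names and toy matrices; with dimension 32 and alpha 1 the update is scaled by 1/32.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_extra_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    s = lora_scale(alpha, rank)
    return [yb + s * yu for yb, yu in zip(base, update)]


# Settings from this README: dimension 32, alpha 1.
print(lora_scale(1, 32))                # 0.03125
# Hypothetical 768x768 projection layer, just to show the size of one LoRA pair.
print(lora_extra_params(768, 768, 32))  # 49152
# Toy rank-1 example with identity W.
print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [1, 2], alpha=1, rank=1))  # [4.0, 2.0]
```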

[Images: three example LoRA generations]

P.S. I would still suggest using the full model if you want more complex scenes or more fidelity to the styles and outfits. You can always alter the style by merging models.
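A minimal sketch of the simplest kind of merge, a per-tensor weighted sum theta = (1 - alpha) * theta_A + alpha * theta_B, as offered for instance by the "Weighted sum" mode of the AUTOMATIC1111 checkpoint merger. The README does not say which merge method it has in mind, so this is an assumption; parameters are plain lists of floats instead of real tensors, and weighted_sum_merge is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def weighted_sum_merge(model_a: StateDict, model_b: StateDict, alpha: float) -> StateDict:
    """Return (1 - alpha) * A + alpha * B for every parameter that both models share."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    merged: StateDict = {}
    for name, weights_a in model_a.items():
        weights_b = model_b.get(name)
        if weights_b is None or len(weights_b) != len(weights_a):
            # No compatible counterpart in B: keep A's weights unchanged.
            merged[name] = list(weights_a)
            continue
        merged[name] = [(1.0 - alpha) * wa + alpha * wb for wa, wb in zip(weights_a, weights_b)]
    return merged


a = {"unet.w": [0.0, 2.0], "te.w": [1.0]}
b = {"unet.w": [1.0, 0.0]}
print(weighted_sum_merge(a, b, alpha=0.25))
# {'unet.w': [0.25, 1.5], 'te.w': [1.0]}
```

With real checkpoints the same formula is applied tensor by tensor.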

Usage

The model is shared in both diffusers and safetensors formats. As for the trigger words, the six characters can be prompted with OyamaMahiro, OyamaMihari, HozukiKaede, HozukiMomiji, OkaAsahi, and MurosakiMiyo. TenkawaNayuta is tagged, but she appears in fewer than 10 images, so don't expect good results. There are also three different styles trained into the model: aniscreen, edstyle, and megazine (yes, it's a typo). As usual, you can get images with multiple characters, but from four characters onward it becomes difficult. By the way, the model was trained at clip skip 1.
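A small helper for assembling prompts from the trigger words and style names listed above. The comma-separated tag order, the extra_tags parameter, and the warning at four or more characters are illustrative choices of this sketch (build_prompt is a hypothetical name), not something the model requires.

```python
import warnings
from typing import Iterable, Optional, Sequence

# Trigger words and style names exactly as listed in this README.
CHARACTERS = {
    "OyamaMahiro", "OyamaMihari", "HozukiKaede",
    "HozukiMomiji", "OkaAsahi", "MurosakiMiyo",
    "TenkawaNayuta",  # tagged, but in fewer than 10 images
}
STYLES = {"aniscreen", "edstyle", "megazine"}  # "megazine" is the trained spelling


def build_prompt(characters: Sequence[str], style: Optional[str] = None,
                 extra_tags: Iterable[str] = ()) -> str:
    """Join character triggers, an optional style trigger and extra tags with commas."""
    unknown = [c for c in characters if c not in CHARACTERS]
    if unknown:
        raise ValueError(f"unknown character trigger(s): {unknown}")
    if style is not None and style not in STYLES:
        raise ValueError(f"unknown style trigger: {style}")
    if len(characters) >= 4:
        warnings.warn("four or more characters in one image is difficult with this model")
    parts = list(characters)
    if style is not None:
        parts.append(style)
    parts.extend(extra_tags)
    return ", ".join(parts)


print(build_prompt(["OyamaMahiro", "OyamaMihari"], style="aniscreen", extra_tags=["smile"]))
# OyamaMahiro, OyamaMihari, aniscreen, smile
```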

The following images show generations from different checkpoints. The default one is the checkpoint at step 22828, but all checkpoints from step 9969 onward can be found in the checkpoints directory. They are all good enough at the six characters, but later ones are better at megazine and edstyle (possibly at the cost of overfitting; I don't really know).

[Images: three xyz_grid comparisons of the checkpoints]

More Generations

[Images: nine more example generations]

Dataset Description

The dataset is prepared via the workflow detailed here: https://github.com/cyber-meow/anime_screenshot_pipeline

It contains 21412 images with the following composition:

  • 2133 ONIMAI images divided into four types
    • 1496 anime screenshots from the first six episodes (for style aniscreen)
    • 70 screenshots of the anime's ending sequence (for style edstyle, not counted in the 1496 above)
    • 528 fan arts (possibly including some official art)
    • 39 scans of the manga covers (for style megazine; don't ask me why I chose this name, it is bad but it turns out to work)
  • 19279 regularization images, intended to be as varied as possible while staying in anime style (i.e., no photorealistic images are used)
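The counts in the list are consistent with each other, as a quick check shows. It also shows that ONIMAI images make up only about a tenth of the raw dataset, presumably one reason for the weighting scheme described next.

```python
# Image counts from the list above.
onimai_images = {
    "anime screenshots (aniscreen)": 1496,
    "ending screenshots (edstyle)": 70,
    "fan arts": 528,
    "manga cover scans (megazine)": 39,
}
regularization_images = 19279

onimai_total = sum(onimai_images.values())
total = onimai_total + regularization_images
assert onimai_total == 2133
assert total == 21412
print(f"ONIMAI share of the raw images: {onimai_total / total:.1%}")
# ONIMAI share of the raw images: 10.0%
```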

Note that the model was trained with a specific weighting scheme to balance the different concepts, so not every image carries the same weight. After applying the per-image repeats, we get around 145K images per epoch.
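The exact weighting scheme is not spelled out here, so this is only a hypothetical sketch of how per-image repeats can balance concepts: each concept gets a target number of samples per epoch, and its repeat count is that target divided by its image count, rounded. Only the image counts come from the dataset description; the targets and helper names are made up and do not reproduce the ~145K figure.

```python
from typing import Dict

# Image counts from the dataset description above.
counts = {"aniscreen": 1496, "edstyle": 70, "fanart": 528, "megazine": 39, "regularization": 19279}
# HYPOTHETICAL samples-per-epoch targets, chosen only for illustration.
targets = {"aniscreen": 30000, "edstyle": 7000, "fanart": 15000, "megazine": 4000, "regularization": 19279}


def repeats_for_targets(counts: Dict[str, int], targets: Dict[str, int]) -> Dict[str, int]:
    """Per-image repeat count so each concept contributes roughly its target per epoch."""
    return {name: max(1, round(targets[name] / n)) for name, n in counts.items()}


def images_per_epoch(counts: Dict[str, int], repeats: Dict[str, int]) -> int:
    """Effective number of samples seen in one epoch after repeats."""
    return sum(n * repeats[name] for name, n in counts.items())


repeats = repeats_for_targets(counts, targets)
print(repeats)
# {'aniscreen': 20, 'edstyle': 100, 'fanart': 28, 'megazine': 103, 'regularization': 1}
print(images_per_epoch(counts, repeats))
# 75000
```

Rare concepts such as megazine end up repeated about a hundred times, which is also where the risk of overfitting them comes from.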

Training

Training was done with the EveryDream2 trainer using ACertainty as the base model. The following configuration was used:

  • resolution 512
  • cosine learning rate scheduler, lr 2.5e-6
  • batch size 8
  • conditional dropout 0.08
  • beta schedule changed from scaled_linear to linear in the config.json of the model's scheduler
  • clip skip 1
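A minimal sketch of the beta-schedule change, assuming a diffusers-format model, where the scheduler settings usually live in scheduler/scheduler_config.json (the README calls it config.json, so the path is a parameter here). It also compares the two schedules using the Stable Diffusion v1 defaults beta_start=0.00085, beta_end=0.012 and 1000 timesteps, which are assumed rather than taken from this README; the formulas mirror the diffusers definitions of linear and scaled_linear. Function names are hypothetical.

```python
import json
import math
from pathlib import Path
from typing import List, Optional


def set_beta_schedule(config_path: str, schedule: str = "linear") -> Optional[str]:
    """Set beta_schedule in a scheduler config JSON file; return the previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    previous = config.get("beta_schedule")
    config["beta_schedule"] = schedule
    path.write_text(json.dumps(config, indent=2))
    return previous


def make_betas(schedule: str, beta_start: float = 0.00085,
               beta_end: float = 0.012, steps: int = 1000) -> List[float]:
    """Noise schedule values following the diffusers definitions."""
    if schedule == "linear":
        # Linear interpolation of beta itself.
        return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]
    if schedule == "scaled_linear":
        # Linear interpolation of sqrt(beta), then squared.
        lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
        return [(lo + (hi - lo) * i / (steps - 1)) ** 2 for i in range(steps)]
    raise ValueError(f"unsupported schedule: {schedule}")


# Example: set_beta_schedule("path/to/model/scheduler/scheduler_config.json")
linear = make_betas("linear")
scaled = make_betas("scaled_linear")
print(f"t=500  linear: {linear[500]:.6f}  scaled_linear: {scaled[500]:.6f}")
```

Both schedules share their endpoints, but linear is larger in between, so noise is added faster at intermediate timesteps.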

I trained for two epochs, whereas the default release model was trained for 22828 steps, as mentioned above.
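For orientation, the step counts above can be converted to epochs, assuming each step consumes one batch of 8 images with no gradient accumulation (none is mentioned) and taking the approximate 145K images per epoch at face value:

```python
BATCH_SIZE = 8
IMAGES_PER_EPOCH = 145_000  # "around 145K" after per-image repeats, so approximate
FIRST_SAVED_STEP = 9_969
DEFAULT_STEP = 22_828

steps_per_epoch = IMAGES_PER_EPOCH / BATCH_SIZE
print(f"steps per epoch: ~{steps_per_epoch:.0f}")                                   # ~18125
print(f"first saved checkpoint: ~{FIRST_SAVED_STEP / steps_per_epoch:.2f} epochs")  # ~0.55
print(f"default checkpoint: ~{DEFAULT_STEP / steps_per_epoch:.2f} epochs")          # ~1.26
print(f"two full epochs: ~{2 * steps_per_epoch:.0f} steps")                         # ~36250
```

So under these assumptions the default checkpoint sits roughly a quarter of the way into the second epoch.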