Upload ff7rv4-1.ckpt
This is a moderate-scale fine-tuned Stable Diffusion model with characters and scenery from the video game Final Fantasy 7 Remake. The novelty here is applying "dreambooth" techniques to an extended training set of 1400+ images covering more than a dozen concepts, including 5 characters trained to the point of being nearly indistinguishable from the game's own renders, with scenery concepts used to capture the style of the game's render engine.
The 4 main characters, Cloud Strife, Tifa Lockhart, Barret Wallace, and Aerith Gainsborough, are very well represented with 120-140 images each, as is Jessie Rasberry with just 90 images.
Additionally, "Biggs ff7r", "Wedge ff7r", "Shinra Security Officer" and various side characters have limited training, typically 20-40 images. Biggs and Wedge were trained with "ff7r" as if it were a surname, since they have no canonical surname to aid in training. Sephiroth, President Shinra, and Rufus Shinra are poorly represented (<20 images each) but present, with output quality mirroring their share of the training data.
Styles and scenery such as "midgar city", "streets of midgar city business district", "midgar slums district", "seventh heaven bar", "train car", "train station in midgar", and "aerial photo of midgar city" are trained as well. Because each training image was annotated with its own caption, these concepts can be mixed. "Train station in midgar city slums district" will produce different output than "train station in midgar city business district", and a prompt such as "Iron Man standing on the rooftops of midgar slums district" will also give compelling output.
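Since the concepts are mixed simply by combining them in one prompt, prompt variants can be generated programmatically. The following is a minimal sketch only: `build_prompt` is a hypothetical helper, and the "subject, action, scenery" template is an assumption rather than the caption format actually used in training. The subject and scenery strings are the ones listed above.

```python
# Hypothetical helper: the "subject action scenery" template is an
# assumption, not the caption format documented for this model.
def build_prompt(subject: str, action: str, scenery: str) -> str:
    """Join a subject, an action, and a trained scenery phrase into one prompt."""
    parts = [subject.strip(), action.strip(), scenery.strip()]
    return " ".join(p for p in parts if p)


# Scenery phrases taken from the description above.
SCENERY = ["midgar city slums district", "midgar city business district"]

# One prompt per scenery variant for the same subject.
prompts = [
    build_prompt("tifa lockhart", "standing in a train station in", s)
    for s in SCENERY
]
for p in prompts:
    print(p)
```

The resulting strings can be passed as the prompt to whichever Stable Diffusion front end loads the checkpoint.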
The data set was compiled from in-game screenshots that were captured, cropped, resized, and annotated with the assistance of BLIP. The captions were then edited to name the new concepts that BLIP cannot detect, e.g. replacing "a man..." with "cloud strife...".
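The caption edit described above amounts to swapping a generic BLIP subject for the concept name. A minimal sketch, assuming a hypothetical `relabel_caption` helper; the actual tooling used for this data set is not stated, and in practice the replacement name depends on who appears in each screenshot, so it has to be chosen per image.

```python
import re


# Hypothetical helper: not the author's actual annotation tooling.
def relabel_caption(caption: str, generic: str, concept: str) -> str:
    """Replace the first whole-word occurrence of `generic` with `concept`."""
    pattern = r"\b" + re.escape(generic) + r"\b"
    return re.sub(pattern, concept, caption, count=1, flags=re.IGNORECASE)


# Example caption in the style BLIP produces for a screenshot.
caption = "a man standing in a garden with a waterfall in the background"
print(relabel_caption(caption, "a man", "cloud strife"))
```

The word-boundary match keeps "a man" from matching inside longer words such as "a mansion".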
Approximately 90 of the images in the data set show more than one character, such as "cloud strife and barret wallace standing in a garden with a waterfall in the background". Based on prior attempts, these additional images greatly improve the ability to draw characters in group settings at inference time, with a reduced propensity for their clothing, styles, body types, etc. to bleed into one another.
Training was performed using Kane Wallmann's fork of Xavier's original DreamBooth Stable Diffusion repo on an RTX 3090 24GB. 3 epochs with 5 repeats were trained at LR 1e-6, then an additional epoch of ~4000 steps was performed at LR 5e-7 for a total of ~14,000 steps.
Regularization was used on categories man, woman, city, indoors, building, dog, sword, person, and group.
- .gitattributes +1 -0
- ff7rv4-1.ckpt +3 -0
@@ -30,3 +30,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ff7rv4-1.ckpt filter=lfs diff=lfs merge=lfs -text
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1ba3eb041e7cb28d403f1c4feb1744db44d3f27ae5878b84249ae121f2be4b08
+size 2132885089
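The three pointer lines above (version, oid, size) are what the repository stores in place of the checkpoint itself, so they can be used to check a downloaded copy. A minimal sketch, assuming hypothetical helpers `parse_lfs_pointer` and `matches_pointer`; it handles only the `sha256` oid form shown here, and the local file path is up to the reader.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the pointer's byte size and SHA-256 digest."""
    h = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte checkpoint never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and h.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1ba3eb041e7cb28d403f1c4feb1744db44d3f27ae5878b84249ae121f2be4b08
size 2132885089
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("ff7rv4-1.ckpt", pointer)` on a local download would then report whether the file is complete and unmodified.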