RunDiffusion committed on
Commit fdc3962
Parent(s): 51f2e73
Update README.md
README.md CHANGED
@@ -75,7 +75,7 @@ license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICE

# Wonderman Proof of Concept - By RunDiffusion.com

-
- The concept could not already exist in the Flux dataset (that would be cheating).
- The concept needed to be present but still allow flexibility for creativity.
- The concept needed to resemble the subject with 90% accuracy.
|
@@ -89,11 +89,12 @@ Flux thinks that "Wonderman" is "Superman"
![Flux thinks that "Wonderman" is "Superman"](Huggingface-assets/superman-flux.jpg)


-
You can view the [raw low-quality data here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Raw%20Low%20Quality%20Data).
The training data was low resolution, cropped, oddly shaped, pixelated, and overall the worst possible data we've come across. That didn't stop us! AI to the rescue!
![Low Quality Training Data](Huggingface-assets/multiple-samples-training-data.png)

To fix the data we had to:
- Inpaint problem areas like backgrounds, signatures, and text
- Outpaint to expand images
@@ -104,7 +105,7 @@ We were able to get the dataset to 13 with these techniques.
Full dataset [is here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Cleaned%20and%20Captioned%20Data).
![Cleaned Wonderman Dataset](Huggingface-assets/multiple-samples-of-cleaned-data.png)

-
We are not entirely familiar with Flux's preferred captioning style. We understand that this model responds well to full descriptive sentences, so we went with that. Below are some examples of the images with their captions. We chose LLaMA v3, inspired by this paper: https://arxiv.org/html/2406.08478v1
The system prompt we used was basic and could likely benefit from further refinement.
@@ -114,7 +115,7 @@ A vintage comic book cover of Wonderman. On the cover, there are three main char
Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Wonderman has a muscular physique, brown hair, and is wearing a black mask covering his eyes. He stands confidently with his hands by his sides. photo
![Standing Wonderman](Cleaned%20and%20Captioned%20Data/00002.png)

-
All tasks were performed on a local workstation equipped with an RTX 4090, an i7 processor, and 64 GB of RAM. Note that 32 GB of RAM will not suffice: you may encounter out-of-memory (OOM) errors when caching latents. We did use RunDiffusion.com to test the LoRAs we created, which let us launch five servers with five checkpoints to determine which one converged best.
We're not going to dive into rank, learning rate, and similar settings, because they really depend on your goals and what you're trying to accomplish. But the rules below are good ones to follow.
- We used Ostris's ai-toolkit, available here: [Ostris ai-toolkit](https://github.com/ostris/ai-toolkit/tree/main)
|
@@ -130,22 +131,22 @@ You'll see in the next page of examples where the captioning really helps or hur
Total training time for the LoRA was about 2 to 2.5 hours, which costs $1 to $2 on RunPod or Vast; local electricity will be even cheaper.
Now for the results! (The next file is large, to preserve quality.)

-
Right off the bat, at 500 steps, you will get some likeness, but this is mostly baseline Flux. If you're training a concept that already exists, you will see some convergence even at just 500 steps.
![500 steps](Huggingface-assets/500-steps.jpg)
Prompt: a vintage comic book cover for Wonderman, featuring three characters in a dynamic action scene. The central figure is Wonderman with a confident expression, wearing a green shirt with a yellow belt and red gloves. To his left is a woman with a look of concern, dressed in a yellow top and red skirt. On the right, there's a monstrous creature with sharp teeth and claws, seemingly attacking the man. The background is minimal, primarily blue with a hint of landscape at the bottom. The text WONDER COMICS and No. 11 suggests this is from a series.

-
It will start to break apart a little here. Be patient. It's learning.
![1250 steps](Huggingface-assets/1250-steps.jpg)
Prompt: A vintage comic book cover titled 'Wonderman Comics'. The central figure is Wonderman who appears to be in a combat stance. He is lunging at a large, menacing creature with a gaping mouth, revealing sharp teeth. Below the main characters, there's a woman in a yellow dress holding a small device, possibly a gun. She seems to be in distress. In the background, there's a futuristic-looking tower with a few figures standing atop. The overall color palette is vibrant, with dominant yellows, greens, and purples.

-
Hey! We're getting somewhere! Using the caption as a prompt should show our subject well at this stage, but the real test is breaking away from the caption to see whether our subject is still present.
![1750 steps](Huggingface-assets/1750-steps.jpg)
Prompt: Wonderman wearing a green and red costume with a large 'W' emblem on the chest standing heroically

-
There he is! We can now prompt more freely to get Wonderman doing other things. Keep in mind that we are still limited to what we trained on, but at least we have a great starting point!
![2500 steps](Huggingface-assets/2500-steps.jpg)
Prompt: comic style illustration of Wonderman running from aliens on the moon. center character is Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Black boots to his knees. Wonderman is wearing a black mask covering his eyes
|
|
# Wonderman Proof of Concept - By RunDiffusion.com

+# For this POC we needed to achieve these goals
- The concept could not already exist in the Flux dataset (that would be cheating).
- The concept needed to be present but still allow flexibility for creativity.
- The concept needed to resemble the subject with 90% accuracy.
|
|
![Flux thinks that "Wonderman" is "Superman"](Huggingface-assets/superman-flux.jpg)


+# Data Used for Training
You can view the [raw low-quality data here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Raw%20Low%20Quality%20Data).
The training data was low resolution, cropped, oddly shaped, pixelated, and overall the worst possible data we've come across. That didn't stop us! AI to the rescue!
![Low Quality Training Data](Huggingface-assets/multiple-samples-training-data.png)
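The README doesn't say how the problem images were singled out. As a rough illustration only, a first triage pass could flag images whose shorter side is too small or whose aspect ratio is extreme. The thresholds (`MIN_SIDE_PX`, `MAX_ASPECT`) and the `ImageInfo`/`triage` names are hypothetical, not values from this project; in practice you might read the dimensions with Pillow's `Image.open(path).size`.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the README does not give any.
MIN_SIDE_PX = 512   # a shorter side below this counts as "low resolution"
MAX_ASPECT = 2.0    # long side / short side above this counts as "oddly shaped"


@dataclass
class ImageInfo:
    name: str
    width: int
    height: int


def triage(images: list[ImageInfo]) -> dict[str, list[str]]:
    """Map each image name to the problems found (an empty list means it looks usable)."""
    report: dict[str, list[str]] = {}
    for img in images:
        short_side, long_side = sorted((img.width, img.height))
        if short_side <= 0:
            raise ValueError(f"{img.name}: dimensions must be positive")
        problems = []
        if short_side < MIN_SIDE_PX:
            problems.append("low resolution")
        if long_side / short_side > MAX_ASPECT:
            problems.append("oddly shaped")
        report[img.name] = problems
    return report


if __name__ == "__main__":
    sample = [
        ImageInfo("cover.png", 400, 620),
        ImageInfo("panel.png", 1024, 1024),
        ImageInfo("strip.png", 1500, 500),
    ]
    for name, problems in triage(sample).items():
        print(name, problems or "ok")
```

Flagged images are candidates for the inpainting, outpainting, or removal steps rather than automatic rejects; a human pass is still needed for things like signatures and text.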

+## Data Cleanup Strategy
To fix the data we had to:
- Inpaint problem areas like backgrounds, signatures, and text
- Outpaint to expand images
|
|
Full dataset [is here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Cleaned%20and%20Captioned%20Data).
![Cleaned Wonderman Dataset](Huggingface-assets/multiple-samples-of-cleaned-data.png)
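The README doesn't state what size the images were outpainted to. Here is a minimal sketch, assuming a square (1:1) target and a canvas rounded up to a multiple of 64 pixels (both assumptions), of working out how much border an outpainting pass has to fill. The padded border is the region you would mask and let the outpainting model generate. `plan_outpaint` and `OutpaintPlan` are hypothetical names.

```python
import math
from typing import NamedTuple


class OutpaintPlan(NamedTuple):
    canvas_w: int
    canvas_h: int
    left: int
    top: int
    right: int
    bottom: int


def plan_outpaint(w: int, h: int, target_ratio: float = 1.0, multiple: int = 64) -> OutpaintPlan:
    """Grow (never crop) a w x h image toward target_ratio (width / height),
    center it, and round the canvas up to a multiple of `multiple` pixels."""
    if w <= 0 or h <= 0 or target_ratio <= 0:
        raise ValueError("dimensions and ratio must be positive")
    # Expand whichever side is too short for the target ratio.
    if w / h < target_ratio:
        canvas_w, canvas_h = math.ceil(h * target_ratio), h
    else:
        canvas_w, canvas_h = w, math.ceil(w / target_ratio)
    # Round both sides up; for non-square targets this keeps the ratio only approximately.
    canvas_w = math.ceil(canvas_w / multiple) * multiple
    canvas_h = math.ceil(canvas_h / multiple) * multiple
    # Split the extra space evenly; any odd pixel goes to the right/bottom.
    pad_x, pad_y = canvas_w - w, canvas_h - h
    left, top = pad_x // 2, pad_y // 2
    return OutpaintPlan(canvas_w, canvas_h, left, top, pad_x - left, pad_y - top)


if __name__ == "__main__":
    print(plan_outpaint(400, 620))
```

For a 400x620 image this gives a 640x640 canvas with 120 px of new content on the left and right and 10 px on the top and bottom.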

+# Captioning the Data
We are not entirely familiar with Flux's preferred captioning style. We understand that this model responds well to full descriptive sentences, so we went with that. Below are some examples of the images with their captions. We chose LLaMA v3, inspired by this paper: https://arxiv.org/html/2406.08478v1
The system prompt we used was basic and could likely benefit from further refinement.
|
|
Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Wonderman has a muscular physique, brown hair, and is wearing a black mask covering his eyes. He stands confidently with his hands by his sides. photo
![Standing Wonderman](Cleaned%20and%20Captioned%20Data/00002.png)
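Before training, it can be worth checking that every image has a caption and that every caption actually names the subject, so the model has a chance to tie the name "Wonderman" to the new character instead of to Superman. ai-toolkit's README describes datasets as images paired with same-named `.txt` caption files; the sketch below assumes that layout and the trigger word `Wonderman`. `check_captions` is a hypothetical helper, not part of ai-toolkit.

```python
from pathlib import Path

# Image formats ai-toolkit's README lists as supported; check the current README.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def check_captions(folder: str, trigger: str = "Wonderman") -> dict[str, str]:
    """Map each problem image's filename to a short description of what is wrong."""
    problems: dict[str, str] = {}
    for image in sorted(Path(folder).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files, subfolders, and anything else
        caption_file = image.with_suffix(".txt")  # same name, .txt extension
        if not caption_file.exists():
            problems[image.name] = "missing caption"
            continue
        caption = caption_file.read_text(encoding="utf-8").strip()
        if not caption:
            problems[image.name] = "empty caption"
        elif trigger.lower() not in caption.lower():
            problems[image.name] = f"caption never mentions {trigger!r}"
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_captions.py "Cleaned and Captioned Data"
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, problem in check_captions(folder).items():
        print(f"{name}: {problem}")
```

An empty result means every image has a non-empty caption that mentions the trigger word.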

+# Train the Data
All tasks were performed on a local workstation equipped with an RTX 4090, an i7 processor, and 64 GB of RAM. Note that 32 GB of RAM will not suffice: you may encounter out-of-memory (OOM) errors when caching latents. We did use RunDiffusion.com to test the LoRAs we created, which let us launch five servers with five checkpoints to determine which one converged best.
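The README doesn't say which five checkpoints were compared. Assuming the trainer saved a LoRA every `save_every` steps (the 250-step interval and 2500-step run below are hypothetical), one simple way to choose five to test side by side is to spread them evenly from the first save to the last. `pick_checkpoints` is a hypothetical helper.

```python
def pick_checkpoints(total_steps: int, save_every: int, count: int = 5) -> list[int]:
    """Return `count` saved checkpoint steps, spread evenly and always
    including the first and last save."""
    if total_steps <= 0 or save_every <= 0 or count <= 0:
        raise ValueError("total_steps, save_every and count must be positive")
    saved = list(range(save_every, total_steps + 1, save_every))
    if count >= len(saved):
        return saved  # fewer saves than servers: test them all
    if count == 1:
        return [saved[-1]]
    # Integer spacing keeps the indices strictly increasing when count < len(saved).
    last = len(saved) - 1
    return [saved[i * last // (count - 1)] for i in range(count)]


if __name__ == "__main__":
    print(pick_checkpoints(2500, 250))
```

With those assumed numbers it returns `[250, 750, 1250, 1750, 2500]`; judging which one "converged best" is still done by eye on sample images.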
We're not going to dive into rank, learning rate, and similar settings, because they really depend on your goals and what you're trying to accomplish. But the rules below are good ones to follow.
- We used Ostris's ai-toolkit, available here: [Ostris ai-toolkit](https://github.com/ostris/ai-toolkit/tree/main)
|
|
Total training time for the LoRA was about 2 to 2.5 hours, which costs $1 to $2 on RunPod or Vast; local electricity will be even cheaper.
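The cost figure is just training time multiplied by an hourly rate. Working backwards from the README's numbers ($1 to $2 for 2 to 2.5 hours) gives the range of GPU rental prices it implies: $1 over 2.5 h is $0.40/h and $2 over 2 h is $1.00/h. The local-electricity line uses a hypothetical 600 W system draw and $0.20/kWh, not measured values.

```python
def implied_hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Rental price per hour implied by a total cost over a given duration."""
    return total_cost_usd / hours


def electricity_cost(hours: float, system_watts: float, usd_per_kwh: float) -> float:
    """Energy cost at a constant draw: kilowatts x hours x price per kWh."""
    return system_watts / 1000 * hours * usd_per_kwh


if __name__ == "__main__":
    # Cheapest case: $1 over 2.5 h; most expensive case: $2 over 2 h.
    low = implied_hourly_rate(1.0, 2.5)
    high = implied_hourly_rate(2.0, 2.0)
    print(f"implied rental rate: ${low:.2f} to ${high:.2f} per hour")
    # Hypothetical local run: 600 W for 2.5 h (1.5 kWh) at $0.20/kWh.
    print(f"local electricity: about ${electricity_cost(2.5, 600, 0.20):.2f}")
```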
Now for the results! (The next file is large, to preserve quality.)

+## 500 Steps
Right off the bat, at 500 steps, you will get some likeness, but this is mostly baseline Flux. If you're training a concept that already exists, you will see some convergence even at just 500 steps.
![500 steps](Huggingface-assets/500-steps.jpg)
Prompt: a vintage comic book cover for Wonderman, featuring three characters in a dynamic action scene. The central figure is Wonderman with a confident expression, wearing a green shirt with a yellow belt and red gloves. To his left is a woman with a look of concern, dressed in a yellow top and red skirt. On the right, there's a monstrous creature with sharp teeth and claws, seemingly attacking the man. The background is minimal, primarily blue with a hint of landscape at the bottom. The text WONDER COMICS and No. 11 suggests this is from a series.

+## 1250 Steps
It will start to break apart a little here. Be patient. It's learning.
![1250 steps](Huggingface-assets/1250-steps.jpg)
Prompt: A vintage comic book cover titled 'Wonderman Comics'. The central figure is Wonderman who appears to be in a combat stance. He is lunging at a large, menacing creature with a gaping mouth, revealing sharp teeth. Below the main characters, there's a woman in a yellow dress holding a small device, possibly a gun. She seems to be in distress. In the background, there's a futuristic-looking tower with a few figures standing atop. The overall color palette is vibrant, with dominant yellows, greens, and purples.

+## 1750 Steps
Hey! We're getting somewhere! Using the caption as a prompt should show our subject well at this stage, but the real test is breaking away from the caption to see whether our subject is still present.
![1750 steps](Huggingface-assets/1750-steps.jpg)
Prompt: Wonderman wearing a green and red costume with a large 'W' emblem on the chest standing heroically
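One rough way to tell whether a test prompt really "breaks away from the caption" is to measure how much its wording overlaps the closest training caption. The sketch below uses word-level Jaccard similarity (shared words divided by all distinct words); it is our illustration, not a metric the README uses, and the sample caption is trimmed from one in this dataset.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words in either text (1.0 = same vocabulary)."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def closest_caption(prompt: str, captions: list[str]) -> tuple[float, str]:
    """Return (similarity, caption) for the training caption most similar to the prompt."""
    return max((jaccard(prompt, c), c) for c in captions)


if __name__ == "__main__":
    captions = [
        "Wonderman, a male superhero character. He is wearing a green and red "
        "costume with a large 'W' emblem on the chest.",
    ]
    for prompt in (
        "Wonderman wearing a green and red costume with a large 'W' emblem on the chest standing heroically",
        "comic style illustration of Wonderman running from aliens on the moon",
    ):
        score, _ = closest_caption(prompt, captions)
        print(f"{score:.2f}  {prompt}")
```

A prompt that scores high is mostly replaying a caption; a low-scoring prompt that still produces Wonderman is better evidence that the concept generalizes.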

+## 2500 Steps
There he is! We can now prompt more freely to get Wonderman doing other things. Keep in mind that we are still limited to what we trained on, but at least we have a great starting point!
![2500 steps](Huggingface-assets/2500-steps.jpg)
Prompt: comic style illustration of Wonderman running from aliens on the moon. center character is Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Black boots to his knees. Wonderman is wearing a black mask covering his eyes