Update README.md
Here are some examples of Annalaura's work:
| ![Image 1](./assets/instagram-1.jpg) | ![Image 2](./assets/instagram-2.jpg) | ![Image 3](./assets/instagram-3.jpg) | ![Image 4](./assets/instagram-4.jpg) |
|:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
And here are some examples of outputs created by my model trying to generate similar scenes (TODO):
| ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
|:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
This pipeline was fine-tuned from **runwayml/stable-diffusion-v1-5** on the **omkar1799/annalaura-diffusion-dataset** dataset on Huggingface, which I curated and annotated myself. I've included some example images generated
- blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style
- raccoon character wearing gold chain driving red sports car down highway in an annalaura watercolor drawing style
| ![Runway 1](./assets/runway-1.png) | ![Runway 2a](./assets/runway-2a.png) | ![Runway 3](./assets/runway-3.png) | ![Runway 4](./assets/runway-4.png) |
|:----------------------------------:|:------------------------------------:|:----------------------------------:|:----------------------------------:|
Unsurprisingly, Stable Diffusion did not do a great job of replicating Anna Laura's unique watercolor style, and it also failed to generalize well enough to produce animal-like characters that behave like humans (e.g. it drew both a turtle and a teacher separately for the second prompt, instead of a single entity).
To prepare my dataset for this training task, I needed both images and captions.
I used the Python `instaloader` package to quickly scrape specific posts from the @annalaura_art Instagram account. You can find this script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/scraper.py). In total, I had about 850 images to work with as training examples.
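For reference, the core of a scraping step like this is only a few lines. This is a sketch, not the actual script: it assumes the `instaloader` package, and the function name and shortcodes are placeholders of mine.

```python
def download_posts(shortcodes, target_dir="raw_images"):
    """Download the images for the given Instagram post shortcodes."""
    import instaloader  # imported lazily; pip install instaloader

    loader = instaloader.Instaloader(
        download_videos=False,         # only still images are usable as training data
        save_metadata=False,           # skip the per-post JSON metadata files
        post_metadata_txt_pattern="",  # skip the per-post caption .txt files
    )
    for code in shortcodes:
        post = instaloader.Post.from_shortcode(loader.context, code)
        loader.download_post(post, target=target_dir)

# e.g. download_posts(["ABC123", "DEF456"])  # shortcodes are placeholders
```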
Then, I wrote another script that fed these images to `claude-3.5-sonnet` and asked it to generate training data labels for each. In my prompt, I provided a few examples to guide it towards producing correct labels, and also set up a JSON formatter so I could easily pull these outputs into a dataframe later. Because of token limits per request, I only passed 10 images at a time. You can find that script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/labeler.py).
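The 10-images-per-request batching is simple to sketch; a minimal helper (the function name is mine, not from the script):

```python
def chunk(items, size=10):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# ~850 scraped images -> 85 Claude requests of 10 images each
batches = chunk([f"image_{i}.jpg" for i in range(850)])
```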
Lastly, I associated these images and captions in a structured CSV using simple Pandas logic in a Jupyter notebook, which you can find [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/preprocessing.ipynb).
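That join can be done with a single merge; here is a minimal sketch with invented filenames and captions (the `file_name`/`text` column names are a common convention for image-caption datasets, but are my assumption here):

```python
import pandas as pd

# Hypothetical scraped images and their Claude-generated captions
images = pd.DataFrame({"file_name": ["instagram-1.jpg", "instagram-2.jpg"]})
captions = pd.DataFrame({
    "file_name": ["instagram-1.jpg", "instagram-2.jpg"],
    "text": [
        "two cats sharing tea in an annalaura watercolor drawing style",
        "frog reading a newspaper in an annalaura watercolor drawing style",
    ],
})

# One row per labeled image, written out as the training CSV
dataset = images.merge(captions, on="file_name")
dataset.to_csv("dataset.csv", index=False)
```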
I then submitted my dataset to Huggingface at **omkar1799/annalaura-diffusion-dataset**, which can be found [here](https://huggingface.co/datasets/omkar1799/annalaura-diffusion-dataset).
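Publishing the CSV as a Hub dataset takes only a couple of calls; a sketch assuming the `datasets` library and a local `dataset.csv`:

```python
def publish(csv_path="dataset.csv", repo_id="omkar1799/annalaura-diffusion-dataset"):
    """Load the image/caption CSV and push it to the Huggingface Hub."""
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset("csv", data_files=csv_path)
    ds.push_to_hub(repo_id)  # requires `huggingface-cli login` beforehand
```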
## Training
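As a rough sketch of what a run like this looks like, diffusers ships a `train_text_to_image.py` example script; the flags below are illustrative defaults, not the exact hyperparameters used here:

```shell
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="omkar1799/annalaura-diffusion-dataset" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=5000 \
  --learning_rate=1e-05 \
  --output_dir="annalaura-diffusion-model"
```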
So, here are the outputs at 5,000 steps with all the same hyperparameters against the same four prompts as before:
| ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
|:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
Much better! There are definitely some hallucinations (e.g. random letters in text bubbles), but the characters have taken on the shapes of Anna Laura's artwork and the model is doing a good job of generalizing to these similar but new prompts it hasn't seen before.
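To try a checkpoint like this yourself, loading it with diffusers might look like the following sketch (the repo id is a placeholder for wherever the fine-tuned weights end up):

```python
def generate(prompt, model_id="omkar1799/annalaura-diffusion"):
    """Generate one image from the fine-tuned pipeline (placeholder repo id)."""
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    return pipe(prompt).images[0]

# e.g. generate("frog reading a newspaper in an annalaura watercolor drawing style")
```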