omkar1799 committed on
Commit: ff66455
Parent: 20dfc45

Update README.md

Files changed (1): README.md (+12 -28)
README.md CHANGED
@@ -24,21 +24,13 @@ In an effort to explore fine-tuning generative AI models, I decided to try my ha
 
 Here are some examples of Annalaura's work:
 
- <p float="left">
- <img src="./assets/instagram-1.jpg" width="120" />
- <img src="./assets/instagram-2.jpg" width="120" />
- <img src="./assets/instagram-3.jpg" width="120" />
- <img src="./assets/instagram-4.jpg" width="120" />
- </p>
+ | ![Image 1](./assets/instagram-1.jpg) | ![Image 2](./assets/instagram-2.jpg) | ![Image 3](./assets/instagram-3.jpg) | ![Image 4](./assets/instagram-4.jpg) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 And here are some examples of outputs created by my model trying to generate similar scenes (TODO):
 
- <p float="left">
- <img src="./assets/annalaura-1.png" width="120" />
- <img src="./assets/annalaura-2.png" width="120" />
- <img src="./assets/annalaura-3.png" width="120" />
- <img src="./assets/annalaura-4.png" width="120" />
- </p>
+ | ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 This pipeline was finetuned from **runwayml/stable-diffusion-v1-5** on the **omkar1799/annalaura-diffusion-dataset** dataset on Huggingface, which I curated and annotated myself. I've included some example images generated
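
For context, a pipeline fine-tuned this way loads like any other `diffusers` checkpoint. A minimal usage sketch, assuming the weights are published under a repo id like `omkar1799/annalaura-diffusion` (hypothetical; substitute the actual model id):

```python
# Minimal inference sketch. The model id below is an assumption, not confirmed by the README.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "omkar1799/annalaura-diffusion",  # hypothetical repo id for the fine-tuned weights
    torch_dtype=torch.float16,
).to("cuda")

prompt = "blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style"
image = pipe(prompt).images[0]
image.save("annalaura_sample.png")
```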
 
@@ -52,12 +44,8 @@ From left to right, the prompts are:
 - blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style
 - raccoon character wearing gold chain driving red sports car down highway in an annalaura watercolor drawing style
 
- <p float="left">
- <img src="./assets/runway-1.png" width="120" />
- <img src="./assets/runway-2a.png" width="120" />
- <img src="./assets/runway-3.png" width="120" />
- <img src="./assets/runway-4.png" width="120" />
- </p>
+ | ![Runway 1](./assets/runway-1.png) | ![Runway 2a](./assets/runway-2a.png) | ![Runway 3](./assets/runway-3.png) | ![Runway 4](./assets/runway-4.png) |
+ |:----------------------------------:|:------------------------------------:|:----------------------------------:|:----------------------------------:|
 
 Unsurprisingly, Stable Diffusion did not do a great job of replicating Anna Laura's unique watercolor style, and furthermore failed to generalize well enough to produce animal-like characters that behaved like humans (i.e. it drew both a turtle and a teacher separately for the second prompt, instead of a single entity).
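
A rough sketch of how these baseline images could be generated with the stock checkpoint; the sampler settings, seed, and output names are assumptions, since the README does not state them:

```python
# Baseline comparison sketch using the unmodified base model (settings assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = [
    # the four prompts listed above, e.g.:
    "blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style",
    "raccoon character wearing gold chain driving red sports car down highway in an annalaura watercolor drawing style",
]

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed, assumed
for i, prompt in enumerate(prompts, start=1):
    pipe(prompt, generator=generator).images[0].save(f"runway-{i}.png")
```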
 
@@ -65,13 +53,13 @@ Unsurprisingly, Stable Diffusion did not do a great job of replicating Anna Laur
 
 To prepare my dataset for this training task, I needed both images and captions.
 
- I used the Python `instaloader` package to quickly scrape specific posts off of the @annalaura_art Instagram account. You can find this script here (TODO). In total, I had about 850 images to work with as training examples.
+ I used the Python `instaloader` package to quickly scrape specific posts off of the @annalaura_art Instagram account. You can find this script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/scraper.py). In total, I had about 850 images to work with as training examples.
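
The linked scraper is the source of truth; as a minimal sketch, pulling a profile's image posts with `instaloader` looks roughly like this (the filtering and download options are assumptions, and the actual script targets specific posts rather than the whole feed):

```python
# Sketch: download image posts from the public profile into ./annalaura_raw/
import itertools
import instaloader

L = instaloader.Instaloader(
    download_videos=False,
    download_comments=False,
    save_metadata=False,
)
profile = instaloader.Profile.from_username(L.context, "annalaura_art")

for post in itertools.islice(profile.get_posts(), 900):  # enough posts for ~850 images
    if not post.is_video:
        L.download_post(post, target="annalaura_raw")
```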
 
- Then, I wrote another script that fed these images to `claude-3.5-sonnet` and asked it to generate training data labels for each. In my prompt, I provided a few examples to guide it towards producing correct labels, and also set up a JSON formatter so I could easily pull these outputs into a dataframe later. Because of token limits per request, I only passed 10 images at a time. You can find that script here (TODO).
+ Then, I wrote another script that fed these images to `claude-3.5-sonnet` and asked it to generate training data labels for each. In my prompt, I provided a few examples to guide it towards producing correct labels, and also set up a JSON formatter so I could easily pull these outputs into a dataframe later. Because of token limits per request, I only passed 10 images at a time. You can find that script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/labeler.py).
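
The linked labeler is the actual script; a simplified sketch of batching 10 images per request through the Anthropic API might look like the following (the model snapshot, prompt wording, and output format are assumptions, and the few-shot examples and JSON formatting the real script uses are omitted):

```python
# Sketch: caption scraped images in batches of 10 with the Anthropic API.
import base64
import json
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def caption_batch(image_paths):
    """Send up to 10 images in one request and return one caption per image."""
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(p.read_bytes()).decode(),
            },
        }
        for p in image_paths
    ]
    content.append({
        "type": "text",
        "text": (
            "Write one short training caption per image, in order, each ending with "
            "'in an annalaura watercolor drawing style'. Return only a JSON array of strings."
        ),
    })
    msg = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed snapshot of claude-3.5-sonnet
        max_tokens=1024,
        messages=[{"role": "user", "content": content}],
    )
    return json.loads(msg.content[0].text)

paths = sorted(Path("annalaura_raw").glob("*.jpg"))
rows = []
for i in range(0, len(paths), 10):  # 10 images per request to respect token limits
    batch = paths[i:i + 10]
    rows.extend(zip([p.name for p in batch], caption_batch(batch)))
```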
 
- Lastly, I pulled these images and captions together into a structured CSV using simple Pandas logic in a Jupyter notebook, which you can find here (TODO).
+ Lastly, I pulled these images and captions together into a structured CSV using simple Pandas logic in a Jupyter notebook, which you can find [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/preprocessing.ipynb).
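
The notebook linked above holds the real logic; the core association step is essentially a two-liner (column names here are assumptions, chosen to match the Hub's `imagefolder` convention):

```python
import pandas as pd

# rows is a list of (filename, caption) pairs produced by the labeling step.
df = pd.DataFrame(rows, columns=["file_name", "text"])
df.to_csv("annalaura_raw/metadata.csv", index=False)
```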
 
- I then submitted my dataset to Huggingface at **omkar1799/annalaura-diffusion-dataset**, which can be found here (TODO).
+ I then submitted my dataset to Huggingface at **omkar1799/annalaura-diffusion-dataset**, which can be found [here](https://huggingface.co/datasets/omkar1799/annalaura-diffusion-dataset).
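
One way to publish an image-plus-caption folder like this is the `datasets` library's `imagefolder` loader followed by `push_to_hub`; this is a plausible sketch rather than the author's confirmed upload method:

```python
from datasets import load_dataset

# Picks up the images and the metadata.csv captions, then uploads to the Hub
# (requires `huggingface-cli login` beforehand).
ds = load_dataset("imagefolder", data_dir="annalaura_raw")
ds.push_to_hub("omkar1799/annalaura-diffusion-dataset")
```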
 
 ## Training
 
@@ -98,12 +86,8 @@ Definitely an improvement! It's starting to look like the model is understanding
 
 So, here are the outputs at 5,000 steps with all the same hyperparameters against the same four prompts as before:
 
- <p float="left">
- <img src="./assets/annalaura-1.png" width="120" />
- <img src="./assets/annalaura-2.png" width="120" />
- <img src="./assets/annalaura-3.png" width="120" />
- <img src="./assets/annalaura-4.png" width="120" />
- </p>
+ | ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 Much better! There are definitely some hallucinations (e.g. random letters in text bubbles), but the characters have taken on the shapes of Anna Laura's artwork and the model is doing a good job of generalizing to these similar but new prompts it hasn't seen before.
 
 