omkar1799 committed on
Commit: ff66455
Parent: 20dfc45

Update README.md

Files changed (1): README.md (+12 -28)
README.md CHANGED
@@ -24,21 +24,13 @@ In an effort to explore fine-tuning generative AI models, I decided to try my ha
 
 Here are some examples of Annalaura's work:
 
- <p float="left">
- <img src="./assets/instagram-1.jpg" width="120" />
- <img src="./assets/instagram-2.jpg" width="120" />
- <img src="./assets/instagram-3.jpg" width="120" />
- <img src="./assets/instagram-4.jpg" width="120" />
- </p>
+ | ![Image 1](./assets/instagram-1.jpg) | ![Image 2](./assets/instagram-2.jpg) | ![Image 3](./assets/instagram-3.jpg) | ![Image 4](./assets/instagram-4.jpg) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 And here are some examples of outputs created by my model trying to generate similar scenes (TODO):
 
- <p float="left">
- <img src="./assets/annalaura-1.png" width="120" />
- <img src="./assets/annalaura-2.png" width="120" />
- <img src="./assets/annalaura-3.png" width="120" />
- <img src="./assets/annalaura-4.png" width="120" />
- </p>
+ | ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 This pipeline was finetuned from **runwayml/stable-diffusion-v1-5** on the **omkar1799/annalaura-diffusion-dataset** dataset on Huggingface, which I curated and annotated myself. I've included some example images generated
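
For context, a pipeline fine-tuned this way loads like any other `diffusers` checkpoint. A minimal usage sketch, assuming the weights are published under a repo id like `omkar1799/annalaura-diffusion` (hypothetical; substitute the actual model id):

```python
# Minimal inference sketch. The model id below is an assumption, not confirmed by the README.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "omkar1799/annalaura-diffusion",  # hypothetical repo id for the fine-tuned weights
    torch_dtype=torch.float16,
).to("cuda")

prompt = "blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style"
image = pipe(prompt).images[0]
image.save("annalaura_sample.png")
```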
 
@@ -52,12 +44,8 @@ From left to right, the prompts are:
 - blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style
 - raccoon character wearing gold chain driving red sports car down highway in an annalaura watercolor drawing style
 
- <p float="left">
- <img src="./assets/runway-1.png" width="120" />
- <img src="./assets/runway-2a.png" width="120" />
- <img src="./assets/runway-3.png" width="120" />
- <img src="./assets/runway-4.png" width="120" />
- </p>
+ | ![Runway 1](./assets/runway-1.png) | ![Runway 2a](./assets/runway-2a.png) | ![Runway 3](./assets/runway-3.png) | ![Runway 4](./assets/runway-4.png) |
+ |:----------------------------------:|:------------------------------------:|:----------------------------------:|:----------------------------------:|
 
 Unsurprisingly, Stable Diffusion did not do a great job of replicating Anna Laura's unique watercolor style, and furthermore failed to generalize well enough to produce animal-like characters that behaved like humans (i.e. it drew both a turtle and a teacher separately for the second prompt, instead of a single entity).
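
A rough sketch of how these baseline images could be generated with the stock checkpoint; the sampler settings, seed, and output names are assumptions, since the README does not state them:

```python
# Baseline comparison sketch using the unmodified base model (settings assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = [
    # the four prompts listed above, e.g.:
    "blue and yellow cats riding bikes together through tropical forest path in an annalaura watercolor drawing style",
    "raccoon character wearing gold chain driving red sports car down highway in an annalaura watercolor drawing style",
]

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed, assumed
for i, prompt in enumerate(prompts, start=1):
    pipe(prompt, generator=generator).images[0].save(f"runway-{i}.png")
```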
 
@@ -65,13 +53,13 @@ Unsurprisingly, Stable Diffusion did not do a great job of replicating Anna Laur
 
 To prepare my dataset for this training task, I needed both images and captions.
 
- I used the Python `instaloader` package to quickly scrape specific posts off of the @annalaura_art Instagram account. You can find this script here (TODO). In total, I had about 850 images to work with as training examples.
+ I used the Python `instaloader` package to quickly scrape specific posts off of the @annalaura_art Instagram account. You can find this script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/scraper.py). In total, I had about 850 images to work with as training examples.
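
The linked scraper is the source of truth; as a minimal sketch, pulling a profile's image posts with `instaloader` looks roughly like this (the filtering and download options are assumptions, and the actual script targets specific posts rather than the whole feed):

```python
# Sketch: download image posts from the public profile into ./annalaura_raw/
import itertools
import instaloader

L = instaloader.Instaloader(
    download_videos=False,
    download_comments=False,
    save_metadata=False,
)
profile = instaloader.Profile.from_username(L.context, "annalaura_art")

for post in itertools.islice(profile.get_posts(), 900):  # enough posts for ~850 images
    if not post.is_video:
        L.download_post(post, target="annalaura_raw")
```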
 
- Then, I wrote another script that fed these images to `claude-3.5-sonnet` and asked it to generate training data labels for each. In my prompt, I provided a few examples to guide it towards producing correct labels, and also set up a JSON formatter so I could easily pull these outputs into a dataframe later. Because of token limits per request, I only passed 10 images at a time. You can find that script here (TODO).
+ Then, I wrote another script that fed these images to `claude-3.5-sonnet` and asked it to generate training data labels for each. In my prompt, I provided a few examples to guide it towards producing correct labels, and also set up a JSON formatter so I could easily pull these outputs into a dataframe later. Because of token limits per request, I only passed 10 images at a time. You can find that script [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/labeler.py).
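
The linked labeler is the actual script; a simplified sketch of batching 10 images per request through the Anthropic API might look like the following (the model snapshot, prompt wording, and output format are assumptions, and the few-shot examples and JSON formatting the real script uses are omitted):

```python
# Sketch: caption scraped images in batches of 10 with the Anthropic API.
import base64
import json
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def caption_batch(image_paths):
    """Send up to 10 images in one request and return one caption per image."""
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(p.read_bytes()).decode(),
            },
        }
        for p in image_paths
    ]
    content.append({
        "type": "text",
        "text": (
            "Write one short training caption per image, in order, each ending with "
            "'in an annalaura watercolor drawing style'. Return only a JSON array of strings."
        ),
    })
    msg = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed snapshot of claude-3.5-sonnet
        max_tokens=1024,
        messages=[{"role": "user", "content": content}],
    )
    return json.loads(msg.content[0].text)

paths = sorted(Path("annalaura_raw").glob("*.jpg"))
rows = []
for i in range(0, len(paths), 10):  # 10 images per request to respect token limits
    batch = paths[i:i + 10]
    rows.extend(zip([p.name for p in batch], caption_batch(batch)))
```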
 
- Lastly, I pulled these images and captions together into a structured CSV using simple Pandas logic in a Jupyter notebook, which you can find here (TODO).
+ Lastly, I pulled these images and captions together into a structured CSV using simple Pandas logic in a Jupyter notebook, which you can find [here](https://github.com/Omkar-Waingankar/annalaura-diffusion/blob/main/preprocessing.ipynb).
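
The notebook linked above holds the real logic; the core association step is essentially a two-liner (column names here are assumptions, chosen to match the Hub's `imagefolder` convention):

```python
import pandas as pd

# rows is a list of (filename, caption) pairs produced by the labeling step.
df = pd.DataFrame(rows, columns=["file_name", "text"])
df.to_csv("annalaura_raw/metadata.csv", index=False)
```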
 
- I then submitted my dataset to Huggingface at **omkar1799/annalaura-diffusion-dataset**, which can be found here (TODO).
+ I then submitted my dataset to Huggingface at **omkar1799/annalaura-diffusion-dataset**, which can be found [here](https://huggingface.co/datasets/omkar1799/annalaura-diffusion-dataset).
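
One way to publish an image-plus-caption folder like this is the `datasets` library's `imagefolder` loader followed by `push_to_hub`; this is a plausible sketch rather than the author's confirmed upload method:

```python
from datasets import load_dataset

# Picks up the images and the metadata.csv captions, then uploads to the Hub
# (requires `huggingface-cli login` beforehand).
ds = load_dataset("imagefolder", data_dir="annalaura_raw")
ds.push_to_hub("omkar1799/annalaura-diffusion-dataset")
```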
 
 ## Training
 
@@ -98,12 +86,8 @@ Definitely an improvement! It's starting to look like the model is understanding
 
 So, here are the outputs at 5,000 steps with all the same hyperparameters against the same four prompts as before:
 
- <p float="left">
- <img src="./assets/annalaura-1.png" width="120" />
- <img src="./assets/annalaura-2.png" width="120" />
- <img src="./assets/annalaura-3.png" width="120" />
- <img src="./assets/annalaura-4.png" width="120" />
- </p>
+ | ![Image 1](./assets/annalaura-1.png) | ![Image 2](./assets/annalaura-2.png) | ![Image 3](./assets/annalaura-3.png) | ![Image 4](./assets/annalaura-4.png) |
+ |:------------------------------------:|:------------------------------------:|:------------------------------------:|:------------------------------------:|
 
 Much better! There are definitely some hallucinations (e.g. random letters in text bubbles), but the characters have taken on the shapes of Anna Laura's artwork and the model is doing a good job of generalizing to these similar but new prompts it hasn't seen before.
 
 