---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
instance_prompt: photo of a bayc nft
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- dreambooth
inference: true
pipeline_tag: text-to-image
---

# DreamBooth - Bored Ape Yacht Club

## Model Description

This DreamBooth model is derived from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and fine-tuned on the Bored Ape Yacht Club (BAYC) NFT collection. The weights were trained on images of BAYC NFTs using [DreamBooth](https://dreambooth.github.io/), so the model synthesizes images in the style of the collection from text prompts.

### Training

The training images were retrieved through the Covalent API, specifically via this [endpoint](https://www.covalenthq.com/docs/api/nft/get-nft-token-ids-for-contract-with-metadata/).

### Inference

Inference produces original images that match the distinctive aesthetics of the Bored Ape Yacht Club collection.

![img_0](./image_0.png)

![img_1](./image_1.png)

![img_2](./image_2.png)

## Usage

Here is a basic example of generating images with this model:

```python
import numpy as np
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from IPython.display import display  # notebook-only; in a script, call img.save(...) instead
from transformers import CLIPTextModel

model_id = "runwayml/stable-diffusion-v1-5"

# Load the fine-tuned UNet and text encoder in half precision.
unet = UNet2DConditionModel.from_pretrained(
    "ckandemir/boredape_diffusion", subfolder="unet", torch_dtype=torch.float16
)
text_encoder = CLIPTextModel.from_pretrained(
    "ckandemir/boredape_diffusion", subfolder="text_encoder", torch_dtype=torch.float16
)

# Plug the fine-tuned components into the base Stable Diffusion v1.5 pipeline.
pipeline = StableDiffusionPipeline.from_pretrained(
    model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

prompt = ["a spiderman bayc nft"]
# The negative prompt list must have one entry per prompt.
neg_prompt = ["realistic,disfigured face,disfigured eyes, deformed,bad anatomy"] * len(prompt)
num_samples = 3
guidance_scale = 9
num_inference_steps = 50
height = 512
width = 512

# Print the seed so that a run can be reproduced later.
seed = int(np.random.randint(0, 2**20 - 1))
print(f"Seed: {seed}")
generator = torch.Generator(device="cuda").manual_seed(seed)

with torch.autocast("cuda"), torch.inference_mode():
    imgs = pipeline(
        prompt,
        negative_prompt=neg_prompt,
        height=height,
        width=width,
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images

for img in imgs:
    display(img)
```
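
When generating for several subjects at once, the negative prompt list has to stay the same length as the prompt list, which is what the `* len(prompt)` expression in the example handles. The sketch below wraps that in a small helper. The function name `build_prompts` and the constants are hypothetical and not part of the model repository; the prompt template simply mirrors the `"a spiderman bayc nft"` prompt used above.

```python
# Hypothetical helper, not part of the model repository.
INSTANCE_TOKEN = "bayc nft"  # the phrase shared by the prompts in this card
NEGATIVE_PROMPT = "realistic,disfigured face,disfigured eyes, deformed,bad anatomy"


def build_prompts(subjects, negative=NEGATIVE_PROMPT):
    """Return (prompts, negative_prompts) as two lists of equal length."""
    prompts = [f"a {subject.strip()} {INSTANCE_TOKEN}" for subject in subjects]
    # Repeat the negative prompt so each prompt gets its own entry.
    negative_prompts = [negative] * len(prompts)
    return prompts, negative_prompts


prompts, neg_prompts = build_prompts(["spiderman", "pirate"])
print(prompts)
```

The two returned lists can be passed as `prompt` and `negative_prompt` in the pipeline call above.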

## Further Optimization

Results may improve with additional fine-tuning and adjustment of the training parameters.
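
Before retraining, it may also be worth checking how far the inference settings from the Usage example can be pushed. The sketch below only enumerates combinations of `guidance_scale` and `num_inference_steps` with a fixed seed, so that differences between outputs come from the settings and not from the noise. The helper name `sweep_settings` and the value ranges are illustrative assumptions, not recommendations from the model author.

```python
from itertools import product


def sweep_settings(guidance_scales, step_counts, seed=0):
    """Hypothetical helper: list every (guidance_scale, steps) pair with one shared seed."""
    return [
        {"guidance_scale": scale, "num_inference_steps": steps, "seed": seed}
        for scale, steps in product(guidance_scales, step_counts)
    ]


# Illustrative values only; 9 and 50 are the settings used in the Usage example.
settings = sweep_settings(guidance_scales=[7, 9], step_counts=[30, 50], seed=1234)
for setting in settings:
    print(setting)
```

Each dictionary can then drive one pipeline call, with a fresh `torch.Generator` seeded from `setting["seed"]` for every run.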