metadata
license: creativeml-openrail-m
library_name: diffusers
tags:
  - text-to-image
  - dreambooth
  - diffusers-training
  - stable-diffusion
  - stable-diffusion-diffusers
base_model: runwayml/stable-diffusion-v1-5
inference: true
instance_prompt: disney style

Cartoonify

This is a DreamBooth model derived from runwayml/stable-diffusion-v1-5, with additional fine-tuning of the text encoder. The weights were trained with DreamBooth on images from a popular animation studio. Include the token "disney style" in your prompt to apply the effect.

You can find some example images below:

Intended uses & limitations

How to use

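import torch
from diffusers import StableDiffusionPipeline

# basic usage
repo_id = "lavaman131/cartoonify"
# Use a GPU backend (CUDA or Apple MPS) when available, otherwise fall back to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
# Half precision on GPU backends, full precision on CPU
torch_dtype = torch.float16 if device.type in ["mps", "cuda"] else torch.float32
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch_dtype).to(device)
# Include "disney style" in the prompt to trigger the effect
image = pipeline("PROMPT GOES HERE").images[0]
image.save("output.png")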
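
Since the effect depends on the "disney style" token appearing in the prompt, a small helper can make sure it is always present. This is only a sketch: the name with_style_token and the comma-separated prefix format are illustrative assumptions, not part of this repository.

```python
# Instance prompt taken from the model card metadata.
STYLE_TOKEN = "disney style"


def with_style_token(prompt: str, token: str = STYLE_TOKEN) -> str:
    """Prefix the prompt with the style token unless it is already present.

    Hypothetical helper: the "token, prompt" layout is an assumption,
    not a format required by the model.
    """
    if token.lower() in prompt.lower():
        return prompt
    return f"{token}, {prompt}"
```

The result can be passed straight to the pipeline, for example pipeline(with_style_token("a fox in a forest")).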

Full source code

The full source code for training, along with a local Gradio demo for image-to-Disney-character style transfer, can be found here.

Limitations and bias

As with any diffusion model, expect to experiment with the prompt and the classifier-free guidance scale before you get the results you want. Faces of zoomed-out subjects tend to lose clarity. For additional safety in image generation, we use the Stable Diffusion safety checker.
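
One way to explore the guidance parameter is to sweep a few values over fixed seeds and compare the outputs side by side. The sketch below is an assumption-laden example rather than the project's own tooling: guidance_sweep and run_sweep are hypothetical helpers, and the scale values, seeds, and file names are arbitrary choices. It relies on the guidance_scale and generator arguments of StableDiffusionPipeline and skips any image flagged by the safety checker.

```python
from itertools import product


def guidance_sweep(prompt, scales=(5.0, 7.5, 10.0), seeds=(0, 1)):
    """Return one settings dict per (guidance_scale, seed) pair."""
    return [
        {"prompt": prompt, "guidance_scale": scale, "seed": seed}
        for scale, seed in product(scales, seeds)
    ]


def run_sweep(settings, repo_id="lavaman131/cartoonify"):
    """Generate and save one image per settings dict (downloads the model)."""
    # Imported lazily so the sweep can be planned without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionPipeline

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    dtype = torch.float16 if use_cuda else torch.float32
    pipeline = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype).to(device)

    for cfg in settings:
        # A seeded generator keeps the initial noise fixed, so differences
        # between images with the same seed come from the guidance scale.
        generator = torch.Generator(device="cpu").manual_seed(cfg["seed"])
        result = pipeline(
            cfg["prompt"],
            guidance_scale=cfg["guidance_scale"],
            generator=generator,
        )
        flagged = result.nsfw_content_detected
        if flagged and flagged[0]:
            continue  # the safety checker blanked this image; do not save it
        result.images[0].save(f"cfg{cfg['guidance_scale']}_seed{cfg['seed']}.png")
```

Calling run_sweep(guidance_sweep("disney style, portrait of a fox")) would write up to six images, one per combination of scale and seed.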

Training details

The model was fine-tuned for 3500 steps on an RTX A5000 GPU (24GB VRAM) using around 200 images of modern Disney characters, backgrounds, and animals, in proportions of roughly 70%, 20%, and 10% respectively.
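
As a rough illustration of what those proportions mean in image counts, the arithmetic can be worked out as below. The total of 200 is the approximate figure stated above, so the per-category counts are estimates, not the exact dataset composition.

```python
# "Around 200" images per the training details; the exact total is not stated.
total_images = 200
percentages = {"characters": 70, "backgrounds": 20, "animals": 10}

# Integer arithmetic avoids floating-point rounding surprises.
approx_counts = {name: total_images * pct // 100 for name, pct in percentages.items()}

for name, count in approx_counts.items():
    print(f"{name}: ~{count} images")
```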

The training code used can be found here. The regularization images used for training can be found here.