Model Card for VRoid Diffusion Unconditional

This is an unconditional latent diffusion model, intended to demonstrate how U-Net training affects the generated images.

  • The pretrained text encoder (OpenCLIP) is removed, but an empty text encoder is included for compatibility with StableDiffusionPipeline.
  • The VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License; training data: Public Domain/CC0 + Licensed).
  • The U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
    • The U-Net architecture was modified for unconditional image generation: cross-attention blocks are replaced by self-attention blocks.
  • VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.
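
To illustrate the U-Net modification noted in the list above, here is a minimal PyTorch sketch of the difference between a cross-attention block and a self-attention block. It is not the model's actual implementation: the class name `AttentionBlock`, the layer sizes, and the use of `torch.nn.MultiheadAttention` are all assumptions chosen for illustration. In cross-attention the keys and values come from a separate context (for example text embeddings); in self-attention they come from the image features themselves, so no text input is needed.

```python
import torch
from torch import nn


class AttentionBlock(nn.Module):
    """Hypothetical attention block, for illustration only.

    With context_dim=None it performs self-attention over the image tokens.
    With context_dim set it performs cross-attention against a context tensor.
    """

    def __init__(self, dim, num_heads=4, context_dim=None):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # kdim/vdim default to `dim` when context_dim is None (self-attention).
        self.attn = nn.MultiheadAttention(
            dim, num_heads, kdim=context_dim, vdim=context_dim, batch_first=True
        )

    def forward(self, x, context=None):
        h = self.norm(x)
        # Self-attention: keys and values are the (normalized) image tokens.
        # Cross-attention: keys and values are taken from the context.
        kv = h if context is None else context
        out, _ = self.attn(h, kv, kv, need_weights=False)
        return x + out  # residual connection


# 2 samples, 64 latent positions (e.g. an 8x8 feature map), 32 channels.
x = torch.randn(2, 64, 32)

self_block = AttentionBlock(dim=32)                    # unconditional variant
cross_block = AttentionBlock(dim=32, context_dim=16)   # conditional variant
text_context = torch.randn(2, 77, 16)                  # stand-in for text embeddings

y_self = self_block(x)
y_cross = cross_block(x, text_context)
```

Both blocks return a tensor with the same shape as `x`; only the source of the keys and values differs.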

Model variant

  • VRoid Diffusion
    • This is a conditional text-to-image generator using OpenCLIP.

Note

  • This model works only with the diffusers StableDiffusionPipeline. It will not work with the A1111 WebUI.

```python
from diffusers import StableDiffusionPipeline

# Load the pipeline from the Hugging Face Hub
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
```
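
Since the model is unconditional, the prompt should carry no information. The sketch below shows one way to sample images under several assumptions that the model card does not confirm: that an empty prompt string is the appropriate input for the included empty text encoder, that `guidance_scale=1.0` (which disables classifier-free guidance in diffusers) is suitable, and that 50 inference steps are enough. The 256x256 size is taken from the training resolution. The helper names `sampling_kwargs` and `generate` are hypothetical.

```python
MODEL_ID = "Mitsua/vroid-diffusion-test-unconditional"


def sampling_kwargs(num_images=4, resolution=256, steps=50):
    """Build call arguments for unconditional sampling (all values are assumptions)."""
    return {
        "prompt": "",                  # assumed: no text conditioning is used
        "height": resolution,          # matches the 256x256 training resolution
        "width": resolution,
        "guidance_scale": 1.0,         # assumed: disables classifier-free guidance
        "num_inference_steps": steps,  # assumed default
        "num_images_per_prompt": num_images,
    }


def generate(seed=0, **overrides):
    """Load the pipeline and return a list of PIL images."""
    # Imported lazily so the helper above can be used without loading the model.
    import torch
    from diffusers import StableDiffusionPipeline

    pipeline = StableDiffusionPipeline.from_pretrained(MODEL_ID)
    generator = torch.Generator().manual_seed(seed)  # reproducible sampling
    return pipeline(generator=generator, **sampling_kwargs(**overrides)).images
```

Calling `generate(seed=0, num_images=2)` downloads the weights on first use and returns two images, which can be written to disk with `image.save(...)`.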

Model Description

  • Developed by: Abstract Engine.
  • License: Mitsua Open RAIL-M License.

Uses

Direct Use

Image generation for research and educational purposes.

Out-of-Scope Use

Any deployed use case of the model.

Training Details

  • Trained resolution : 256x256
  • Batch Size : 48
  • Steps : 45k
  • LR : 1e-5 with 1000 warmup steps
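
As a rough illustration of the hyperparameters above, the following sketch computes a learning-rate schedule and the total number of samples processed. The shape of the schedule is an assumption: the card states only the peak rate and the warmup length, so linear warmup followed by a constant rate is a guess, as is reading "45k" as exactly 45,000 steps and the batch size as the global batch size.

```python
BASE_LR = 1e-5         # from the model card
WARMUP_STEPS = 1000    # from the model card
TOTAL_STEPS = 45_000   # "45k" on the card, assumed to be exact
BATCH_SIZE = 48        # from the model card, assumed to be the global batch size


def lr_at(step):
    """Learning rate at a 0-indexed optimizer step.

    Assumed schedule: linear warmup to BASE_LR, then constant.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Total samples drawn during training, counting repeated passes over the data.
samples_seen = TOTAL_STEPS * BATCH_SIZE  # 2,160,000
```

Under these assumptions the rate reaches 1e-5 at step 999 and stays there for the remaining 44,000 steps.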

Training Data

We use the full version of VRoid Image Dataset Lite with some modifications.

