README.md · dataautogpt3/Miniaturus-Potentia at 9d9574f7f9b6a14b8c0c6eebfb8a2248e479b87f

license: cc-by-nc-nd-4.0
pipeline_tag: text-to-image
description: >
  This model is a fine-tuned version of Stable Diffusion 1.5, tuned
  specifically for generating high-quality images of people, hands, and text.
  It was trained on 131,000 high-quality captioned images generated with
  DALL-E 3. Training ran for 8 epochs over 16 hours on four NVIDIA RTX 3090
  GPUs connected with NVLink.

  The model is notably proficient at rendering human figures and intricate
  details such as hand gestures and written text, but it is less effective
  with animal imagery. This specialization makes it well suited to
  applications that require precise human and text representations.

  The fine-tuning process involved 13,100 unique examples, contributing to a
  total dataset size of 131,000 images. Each training epoch processed 131,000
  examples with a total train batch size of 40. Training ran for a total of
  26,200 optimization steps (3,275 per epoch), with a gradient accumulation of
  1 throughout.

  The enhancements in this version aim to reduce common image generation flaws
  such as blurriness, disproportion, noise, and low resolution, and to produce
  clear, anatomically accurate outputs.
widget:
  - text: '-'
    output:
      url: ComfyUI_00641_.png
  - text: '-'
    output:
      url: ComfyUI_00637_.png
  - text: '-'
    output:
      url: ComfyUI_00623_.png
  - text: '-'
    output:
      url: ComfyUI_00617_.png
  - text: '-'
    output:
      url: ComfyUI_00615_.png
  - text: '-'
    parameters:
      negative_prompt: >
        bad quality, bad anatomy, worst quality, low quality, low resolution,
        extra fingers, blur, blurry, ugly, wrong proportions, watermark, image
        artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image,
        embedding:ac_neg1,
    output:
      url: ComfyUI_00614_.png
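
The training figures quoted in the description can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the author's training script: the function name `training_summary` is hypothetical, and it assumes that the "total train batch size of 40" is the effective batch across all four GPUs and that one optimization step consumes one such batch.

```python
# Figures quoted in the model description (not measured here).
TOTAL_STEPS = 26_200   # optimization steps
BATCH_SIZE = 40        # assumed: effective batch across all GPUs
GRAD_ACCUM = 1         # gradient accumulation steps
EPOCHS = 8
TRAIN_HOURS = 16


def training_summary(total_steps, batch_size, grad_accum, epochs, hours):
    """Derive per-epoch figures from the totals (hypothetical helper)."""
    effective_batch = batch_size * grad_accum
    examples_seen = total_steps * effective_batch
    return {
        "examples_seen": examples_seen,
        "steps_per_epoch": total_steps // epochs,
        "examples_per_epoch": examples_seen // epochs,
        "steps_per_second": total_steps / (hours * 3600),
    }


summary = training_summary(TOTAL_STEPS, BATCH_SIZE, GRAD_ACCUM, EPOCHS, TRAIN_HOURS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions, 26,200 steps at a batch size of 40 over 8 epochs works out to 131,000 examples per epoch, which matches the stated dataset size of 131,000 images.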