---
pipeline_tag: text-to-image
widget:
- text: >-
    The image features an older man, a long white beard and mustache,  He has a
    stern expression, giving the impression of a wise and experienced
    individual. The mans beard and mustache are prominent, adding to his
    distinguished appearance. The close-up shot of the mans face emphasizes his
    facial features and the intensity of his gaze.
  output:
    url: assets/oldman.png
- text: >-
    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass
    flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green
    used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty,
    noisy, Vintage monk style, very detailed, hd
  output:
    url: assets/swordwoman.png
- text: >-
    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An
    Oscar winning movie for Best Cinematography a woman in a kimono standing on
    a subway train in Japan Kodak Motion Picture Film Style, shallow depth of
    field, vignette, highly detailed, high budget, bokeh, cinemascope, moody,
    epic, gorgeous, film grain, grainy
  output:
    url: assets/japanesewoman.png
- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning 
  output:
    url: assets/logo.png
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
---
<Gallery />

# Proteus v0.6

I'm excited to introduce **Proteus v0.6**, a complete rebuild of my AI image generation model. This is the **first version of the rework**, focused entirely on improving photorealism and prompt understanding. While it isn't aiming to be state-of-the-art, I believe it's a good step forward in producing high-quality images. Please note that this is a **preliminary version**, not the final, fully-featured checkpoint; more improvements and features will come in future updates.

## Overview

Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealistic portion of the Proteus dataset.

For now, I'm calling this new training technique **Multi-Perspective Fusion**.

### Multi-Perspective Fusion

This approach involves:

- **Training Multiple LoRAs and Full-Parameter Checkpoints**: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
- **Integrating into an Overarching Framework**: These varied models are then combined within a larger framework to enhance overall performance.

I'm hoping this method will be interesting to data scientists exploring advanced training techniques.
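The card does not say how the LoRAs and full-parameter checkpoints are combined, so the following is only an illustrative sketch of one widely used way to merge several fine-tunes of the same base model: convert each one into a weight delta relative to the base (for a LoRA, `(alpha / rank) * up @ down`) and add a weighted sum of those deltas back onto the base weights. The toy code applies this to a single weight matrix in plain Python. Delta merging itself, the function names (`lora_delta`, `full_delta`, `fuse`), and the per-model weights are all assumptions for illustration, not the Multi-Perspective Fusion procedure.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(down: Matrix, up: Matrix, alpha: float) -> Matrix:
    """Dense weight update of one LoRA: (alpha / rank) * up @ down.

    down has shape (rank, in_features); up has shape (out_features, rank).
    """
    rank = len(down)
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(up, down)]


def full_delta(finetuned: Matrix, base: Matrix) -> Matrix:
    """Weight update of a full-parameter fine-tune relative to the shared base."""
    return [[f - b for f, b in zip(f_row, b_row)] for f_row, b_row in zip(finetuned, base)]


def fuse(base: Matrix, deltas: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Hypothetical fusion step: base + sum_i weights[i] * deltas[i].

    Returns a new matrix and leaves `base` untouched.
    """
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per delta")
    fused = [row[:] for row in base]
    for delta, w in zip(deltas, weights):
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                fused[i][j] += w * v
    return fused


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    lora = lora_delta(down=[[1.0, 2.0]], up=[[1.0], [0.0]], alpha=1.0)  # rank-1 LoRA
    full = full_delta([[1.0, 0.0], [0.0, 3.0]], base)                  # full fine-tune
    print(fuse(base, [lora, full], weights=[0.5, 0.5]))  # [[1.5, 1.0], [0.0, 2.0]]
```

Averaging deltas rather than raw weights keeps every "perspective" anchored to the same base, so models trained with different methods can be blended without one simply overwriting the others; in a real SDXL checkpoint the same arithmetic would be applied per tensor.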

## Key Improvements in v0.6

- **Total Rebuild**: Rebuilt from the ground up to address issues in previous versions.
- **Enhanced Photorealism**: Focused on producing good-quality photorealistic images.
- **Better Prompt Understanding**: Improved how the model interprets and responds to user prompts.
- **Stable Training Process**: Refined training methods to prevent the model from falling apart during large-scale training.
- **Preliminary Version**: This is the first version of the rework; expect more features and improvements in future releases.

## Limitations

- **No Illustrations or Anime**: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
- **Not State-of-the-Art**: While the model performs well, I'm not claiming it's state-of-the-art, only that it's a good starting point.
- **Work in Progress**: This is not the final, fully-featured checkpoint. More updates are planned.

## Usage
### Recommended Settings

- **Clip Skip**: 1
- **CFG Scale**: 7
- **Steps**: 25 - 50
- **Sampler**: DPM++ 2M SDE
- **Scheduler**: Karras
- **Resolution**: 1024x1024

### Use it with 🧨 diffusers

Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/ProteusV0.6", 
    vae=vae,
    torch_dtype=torch.float16
)
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
).images[0]

image.save("generated_image.png")
```
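The example above uses `KDPM2AncestralDiscreteScheduler`, while the Recommended Settings list DPM++ 2M SDE with the Karras schedule. If you want diffusers to follow those settings more closely, the helpers below are a minimal sketch. They assume that "DPM++ 2M SDE Karras" corresponds to diffusers' `DPMSolverMultistepScheduler` with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True`, and that "Clip Skip: 1" corresponds to leaving the pipeline's `clip_skip` argument unset. The helper names and the default of 30 steps are hypothetical choices, not part of diffusers or of the original card.

```python
# Hypothetical helpers (not part of diffusers or the original card) that apply
# the Recommended Settings to a StableDiffusionXLPipeline.

MIN_STEPS, MAX_STEPS = 25, 50  # recommended step range


def recommended_call_kwargs(steps: int = 30) -> dict:
    """Keyword arguments for pipe(...) that follow the Recommended Settings."""
    if not MIN_STEPS <= steps <= MAX_STEPS:
        raise ValueError(
            f"steps={steps} is outside the recommended range {MIN_STEPS}-{MAX_STEPS}"
        )
    # "Clip Skip: 1" is assumed to mean no extra skipping, so clip_skip is left unset.
    return {
        "guidance_scale": 7.0,  # CFG Scale
        "width": 1024,          # Resolution
        "height": 1024,
        "num_inference_steps": steps,
    }


def use_dpmpp_2m_sde_karras(pipe) -> None:
    """Replace the pipeline's scheduler with DPM++ 2M SDE using Karras sigmas."""
    # Imported here so this module still loads where diffusers is not installed.
    from diffusers import DPMSolverMultistepScheduler

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        algorithm_type="sde-dpmsolver++",  # the "SDE" variant of DPM++
        solver_order=2,                    # the "2M" (second-order multistep) part
        use_karras_sigmas=True,            # the "Karras" noise schedule
    )
```

With `pipe`, `prompt`, and `negative_prompt` from the example above, call `use_dpmpp_2m_sde_karras(pipe)` in place of the `KDPM2AncestralDiscreteScheduler` line, then generate with `pipe(prompt, negative_prompt=negative_prompt, **recommended_call_kwargs(30)).images[0]`. Because the SDE variant injects noise at each step, pass a seeded `torch.Generator` via `generator=` if you need reproducible images.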
## Future Plans

Following the approach from the first version, I plan to gradually introduce new concepts and visual styles by adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.

## Collaborations

If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish, but I'm new to this and would appreciate any guidance.

## License

**License Options:**

Given my goal to allow personal use and commercial use up to a certain revenue threshold while requiring larger entities to contact me for a separate agreement, I'm considering the following existing license:

### Polyform Small Business License 1.0.0

- **Permits**: Use by individuals and by entities with annual revenue under the license's fixed threshold of $1 million USD.
- **Requires**: Entities exceeding the revenue threshold to obtain a commercial license from me.

For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).


## Acknowledgments

This is a personal project developed solely by me.

---

**Citation**

If you use Proteus v0.6 in your work, please cite it as:

Alexander Rafael Izquierdo, "Proteus v0.6: Multi-Perspective Fusion," 2024.

---