---
pipeline_tag: text-to-image
widget:
- text: >-
    The image features an older man, a long white beard and mustache,  He has a
    stern expression, giving the impression of a wise and experienced
    individual. The mans beard and mustache are prominent, adding to his
    distinguished appearance. The close-up shot of the mans face emphasizes his
    facial features and the intensity of his gaze.
  output:
    url: assets/oldman.png
- text: >-
    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass
    flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green
    used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty,
    noisy, Vintage monk style, very detailed, hd
  output:
    url: assets/swordwoman.png
- text: >-
    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An
    Oscar winning movie for Best Cinematography a woman in a kimono standing on
    a subway train in Japan Kodak Motion Picture Film Style, shallow depth of
    field, vignette, highly detailed, high budget, bokeh, cinemascope, moody,
    epic, gorgeous, film grain, grainy
  output:
    url: assets/japanesewoman.png
- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning 
  output:
    url: assets/logo.png
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
---
<Gallery />

# Proteus v0.6

I'm excited to introduce **Proteus v0.6**, a complete rebuild of my AI image generation model. This is the **first version of the rework**, focused entirely on photorealism. It isn't aiming to be state-of-the-art, but I believe it's a good step toward producing high-quality images. Please note that this is a **preliminary version**, not the final, fully featured checkpoint; more improvements and features will come in future updates.

## Overview

Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealism aspects of the Proteus dataset.

For now, I'm calling this new training technique **Multi-Perspective Fusion**.

### Multi-Perspective Fusion

This approach involves:

- **Training Multiple LoRAs and Full-Parameter Checkpoints**: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
- **Integrating into an Overarching Framework**: These varied models are then combined within a larger framework to enhance overall performance.

I'm hoping this method will be interesting to data scientists exploring advanced training techniques.

## Key Improvements in v0.6

- **Total Rebuild**: Constructed entirely from scratch to address previous issues.
- **Enhanced Photorealism**: Focused on producing good-quality photorealistic images.
- **Stable Training Process**: Refined training methods to prevent the model from falling apart during large-scale training.
- **Preliminary Version**: This is the first version of the rework; expect more features and improvements in future releases.

## Limitations

- **No Illustrations or Anime**: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
- **Not State-of-the-Art**: While the model performs well, I'm not claiming it's state-of-the-art, just that it's a good starting point.
- **Work in Progress**: This is not the final, fully featured checkpoint. More updates are planned.

## Usage

### Recommended Settings

- **Clip Skip**: 1
- **CFG Scale**: 7
- **Steps**: 25 - 50
- **Sampler**: DPM++ 2M SDE
- **Scheduler**: Karras
- **Resolution**: 1024x1024
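
As a minimal sketch, the settings above can be written down as keyword arguments for diffusers. The mapping of "DPM++ 2M SDE" with the Karras schedule to `DPMSolverMultistepScheduler` with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True` is an assumption based on the diffusers scheduler naming, and the names `RECOMMENDED`, `SCHEDULER_KWARGS`, and `build_call_kwargs` are hypothetical helpers, not part of any library.

```python
# Recommended settings from this card, expressed as plain Python data.
RECOMMENDED = {
    "cfg_scale": 7,
    "steps": (25, 50),          # inclusive range
    "resolution": (1024, 1024), # (width, height)
}

# Assumed diffusers equivalent of "DPM++ 2M SDE" + "Karras".
# Clip Skip 1 is assumed to match the pipeline default, so it is left unset.
SCHEDULER_KWARGS = {
    "algorithm_type": "sde-dpmsolver++",
    "use_karras_sigmas": True,
}


def build_call_kwargs(steps=30):
    """Return pipeline call kwargs, rejecting step counts outside 25-50."""
    low, high = RECOMMENDED["steps"]
    if not low <= steps <= high:
        raise ValueError(f"steps must be between {low} and {high}, got {steps}")
    width, height = RECOMMENDED["resolution"]
    return {
        "guidance_scale": RECOMMENDED["cfg_scale"],
        "num_inference_steps": steps,
        "width": width,
        "height": height,
    }
```

With a loaded pipeline `pipe`, these could be applied as `pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, **SCHEDULER_KWARGS)` followed by `pipe(prompt, **build_call_kwargs(30))`.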

### Use it with 🧨 diffusers

Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/Proteus-v0.6", 
    vae=vae,
    torch_dtype=torch.float16
)
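# Note: this example swaps in the KDPM2 ancestral scheduler, which differs
# from the DPM++ 2M SDE sampler listed under Recommended Settings.
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)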
pipe.to('cuda')

# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
).images[0]

image.save("generated_image.png")
```

## Future Plans

Following the approach from the first version, I plan to gradually introduce new concepts and visual styles by adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.

## Collaborations

If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish, but I'm new to this and would appreciate any guidance.

## License

**License Options:**

My goal is to allow personal use, and commercial use up to a certain revenue threshold, while requiring larger entities to contact me for a separate agreement. With that in mind, I'm considering the following existing license:

### Polyform Small Business License 1.0.0

- **Permits**: Use by individuals and entities with annual gross revenues under a specified amount (e.g., $5 million USD).
- **Requires**: Entities exceeding the revenue threshold to obtain a commercial license from me.

For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).


## Acknowledgments

This is a personal project developed solely by me.

---

**Citation**

If you use Proteus v0.6 in your work, please cite it as:

Alexander Rafael Izquierdo, "Proteus v0.6: Multi-Perspective Fusion," 2024.

---