---
pipeline_tag: text-to-image
widget:
- text: >-
    The image features an older man, a long white beard and mustache, He has a stern expression, giving the impression of a wise and experienced individual. The mans beard and mustache are prominent, adding to his distinguished appearance. The close-up shot of the mans face emphasizes his facial features and the intensity of his gaze.
  output:
    url: assets/oldman.png
- text: >-
    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty, noisy, Vintage monk style, very detailed, hd
  output:
    url: assets/swordwoman.png
- text: >-
    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An Oscar winning movie for Best Cinematography a woman in a kimono standing on a subway train in Japan Kodak Motion Picture Film Style, shallow depth of field, vignette, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy
  output:
    url: assets/japanesewoman.png
- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning
  output:
    url: assets/logo.png
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
---

# Proteus v0.6

I'm excited to introduce **Proteus v0.6**, a complete rebuild of my AI image generation model. This is the **first version of the rework**, focusing entirely on enhancing photorealism and improving how the model understands prompts. While it's not aiming to be state-of-the-art, I believe it's a good step forward in producing high-quality images. Please note that this is a **preliminary version**, and it's not the final, fully-featured checkpoint; more improvements and features will come in future updates.

## Overview

Proteus v0.6 is a total rework from the ground up.
In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealism aspects of the Proteus dataset. For now, I'm calling this new training technique **Multi-Perspective Fusion**.

### Multi-Perspective Fusion

This approach involves:

- **Training Multiple LoRAs and Full-Parameter Checkpoints**: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
- **Integrating into an Overarching Framework**: These varied models are then combined within a larger framework to enhance overall performance.

I'm hoping this method will be interesting to data scientists exploring advanced training techniques.

## Key Improvements in v0.6

- **Total Rebuild**: Constructed entirely from scratch to address previous issues.
- **Enhanced Photorealism**: Focused on producing good-quality photorealistic images.
- **Better Prompt Understanding**: Improved how the model interprets and responds to user prompts.
- **Stable Training Process**: Refined training methods to prevent the model from falling apart during large-scale training.
- **Preliminary Version**: This is the first version of the rework; expect more features and improvements in future releases.

## Limitations

- **No Illustrations or Anime**: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
- **Not State-of-the-Art**: While the model performs well, I'm not claiming it's state-of-the-art, just that it's a good starting point.
- **Work in Progress**: This is not the final, fully-featured checkpoint. More updates are planned.

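
To make the Multi-Perspective Fusion idea from the Overview a little more concrete, here is a minimal sketch. It assumes the separately trained "perspectives" are merged by a weighted average of their parameters. This card does not say how the combination is actually done, so treat this as one possible reading and not the method used for Proteus. The `fuse_checkpoints` helper, the toy state dicts, and the weights are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def fuse_checkpoints(checkpoints: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Weighted average of checkpoints that share parameter names and sizes.

    Assumption: fusion is a plain weighted average; the real technique may differ.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    fused: StateDict = {}
    for name, reference in checkpoints[0].items():
        # Every checkpoint must provide this parameter with the same size.
        for ckpt in checkpoints:
            if name not in ckpt or len(ckpt[name]) != len(reference):
                raise ValueError(f"parameter {name!r} is missing or has a different size")
        fused[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(reference))
        ]
    return fused


# Three toy "perspectives" of the same tiny model (illustrative values only).
run_a = {"unet.w": [1.0, 2.0], "unet.b": [0.0]}
run_b = {"unet.w": [3.0, 4.0], "unet.b": [1.0]}
run_c = {"unet.w": [5.0, 6.0], "unet.b": [2.0]}

fused = fuse_checkpoints([run_a, run_b, run_c], weights=[1.0, 1.0, 2.0])
print(fused)
```

In practice the same loop would run over tensors instead of Python lists, and LoRA modules would typically be merged into a base checkpoint before any averaging.
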
## Usage

### Recommended Settings

- **Clip Skip**: 1
- **CFG Scale**: 7
- **Steps**: 25 - 50
- **Sampler**: DPM++ 2M SDE
- **Scheduler**: Karras
- **Resolution**: 1024x1024

### Use it with 🧨 diffusers

Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component (fp16-safe SDXL VAE)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/ProteusV0.6",
    vae=vae,
    torch_dtype=torch.float16
)
# This example uses the KDPM2 ancestral scheduler. To match the recommended
# DPM++ 2M SDE Karras sampler instead, diffusers' DPMSolverMultistepScheduler
# can be built with algorithm_type="sde-dpmsolver++" and use_karras_sigmas=True.
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
).images[0]

image.save("generated_image.png")
```

## Future Plans

Following the approach from the first version, I plan to gradually introduce new concepts and visual styles by adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.

## Collaborations

If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish, but I'm new to this and would appreciate any guidance.

## License

**License Options:**

Given my goal to allow personal use and commercial use up to a certain revenue threshold while requiring larger entities to contact me for a separate agreement, I'm considering the following existing licenses:

### Polyform Small Business License 1.0.0

- **Permits**: Use by individuals and entities with annual gross revenues under a specified amount (e.g., $1 million USD).
- **Requires**: Entities exceeding the revenue threshold to obtain a commercial license from me.

For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).

## Acknowledgments

This is a personal project developed solely by me.

---

**Citation**

If you use Proteus v0.6 in your work, please cite it as:

\[Alexander Rafael Izquierdo\], "Proteus v0.6: Multi-Perspective Fusion," 2024.

---