---
library_name: keras-hub
---
This is a `StableDiffusion3` model uploaded using the KerasHub library. It can be used with the JAX, TensorFlow, and PyTorch backends.
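
A minimal usage sketch is shown below. The preset handle is a placeholder for this repository's actual `hf://<user>/<repo>` path, and the `generate` arguments follow the KerasHub `TextToImage` API, which may differ slightly across versions:

```python
import os

# Select the Keras backend before importing Keras or KerasHub
# ("jax", "tensorflow", or "torch" all work with this model).
os.environ["KERAS_BACKEND"] = "jax"

import keras_hub

# Placeholder preset handle; substitute this repository's actual
# "hf://<user>/<repo>" path or a built-in KerasHub preset name.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://<user>/<repo>"
)

# Generate a 1024x1024 image from a text prompt.
image = text_to_image.generate(
    "a photograph of an astronaut riding a horse",
    num_steps=28,
    guidance_scale=7.0,
)
```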
Model config (see the inspection sketch after this list):
- name: stable_diffusion_3.5_backbone
- trainable: True
- mmdit_patch_size: 2
- mmdit_hidden_dim: 2432
- mmdit_num_layers: 38
- mmdit_num_heads: 38
- mmdit_position_size: 192
- mmdit_qk_norm: rms_norm
- vae: {'module': 'keras_hub.src.models.vae.vae_backbone', 'class_name': 'VAEBackbone', 'config': {'name': 'vae', 'trainable': True, 'encoder_num_filters': [128, 256, 512, 512], 'encoder_num_blocks': [2, 2, 2, 2], 'decoder_num_filters': [512, 512, 256, 128], 'decoder_num_blocks': [3, 3, 3, 3], 'sampler_method': 'sample', 'input_channels': 3, 'sample_channels': 32, 'output_channels': 3, 'scale': 1.5305, 'shift': 0.0609}, 'registered_name': 'VAEBackbone'}
- clip_l: {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_l', 'trainable': True, 'vocabulary_size': 49408, 'embedding_dim': 768, 'hidden_dim': 768, 'num_layers': 12, 'num_heads': 12, 'intermediate_dim': 3072, 'intermediate_activation': 'quick_gelu', 'intermediate_output_index': 10, 'max_sequence_length': 77}, 'registered_name': 'CLIPTextEncoder'}
- clip_g: {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_g', 'trainable': True, 'vocabulary_size': 49408, 'embedding_dim': 1280, 'hidden_dim': 1280, 'num_layers': 32, 'num_heads': 20, 'intermediate_dim': 5120, 'intermediate_activation': 'gelu', 'intermediate_output_index': 30, 'max_sequence_length': 77}, 'registered_name': 'CLIPTextEncoder'}
- t5: None
- latent_channels: 16
- output_channels: 3
- num_train_timesteps: 1000
- shift: 3.0
- image_shape: [1024, 1024, 3]
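
These values can also be checked programmatically after loading. A minimal sketch, assuming the `text_to_image` object from the snippet above and that the backbone's `get_config()` keys match the field names listed here:

```python
# Continuing from the loading snippet above: the task model wraps a
# StableDiffusion3Backbone whose configuration mirrors the list above.
backbone = text_to_image.backbone

config = backbone.get_config()
print(config["mmdit_hidden_dim"])  # expected: 2432
print(config["mmdit_num_layers"])  # expected: 38
print(config["image_shape"])       # expected: (1024, 1024, 3)
```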
This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.