keras/stable_diffusion_3.5_large (Text-to-Image, KerasHub)


Commit 91524eb (verified; parent cb52f1a) by Divyasreepat, committed 23 days ago: "Update README.md with new model card content" (README.md: +279, -0)

### Model Overview

[Stable Diffusion 3.5](https://stability.ai/learning-hub/stable-diffusion-3-5-prompt-guide) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that improves on earlier Stable Diffusion models in image quality, typography, complex prompt understanding, and resource efficiency.

For more technical details, refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact Stability AI](https://stability.ai/enterprise) for commercial licensing details.

## Links

* [SD3.5 Quickstart Notebook](https://colab.sandbox.google.com/gist/laxmareddyp/55daf77f87730c3b3f498318672f70b3/stablediffusion3_5-quckstart-notebook.ipynb)
* [SD3.5 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3.5 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Presets

The following model checkpoints are provided by the Keras team. The code examples below use the stable_diffusion_3.5_large preset; the turbo checkpoint is loaded the same way by passing its preset name.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| stable_diffusion_3.5_large | 9.05B | 9 billion parameters, including the CLIP L and CLIP G text encoders, the MMDiT generative model, and the VAE autoencoder. Developed by Stability AI. |
| stable_diffusion_3.5_large_turbo | 9.05B | 9 billion parameters, including the CLIP L and CLIP G text encoders, the MMDiT generative model, and the VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
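
The turbo description refers to classifier-free guidance (CFG), which is what the `guidance_scale` argument in the usage examples controls. The sketch below is the textbook CFG combination written in NumPy, not KerasHub's internal code; `classifier_free_guidance` and the toy arrays are hypothetical stand-ins for the conditional and unconditional predictions a guided sampler computes at each denoising step. Because CFG needs both predictions per step, a distilled model that drops it avoids that extra evaluation. In many diffusion pipelines a negative prompt is used in place of the empty prompt for the unconditional prediction; we have not confirmed KerasHub's exact mechanism.

```python
import numpy as np


def classifier_free_guidance(pred_uncond, pred_cond, guidance_scale):
    """Combine unconditional and conditional predictions (hypothetical helper).

    Standard CFG: start from the unconditional prediction and move along the
    direction (cond - uncond), scaled by guidance_scale.
    """
    pred_uncond = np.asarray(pred_uncond, dtype="float32")
    pred_cond = np.asarray(pred_cond, dtype="float32")
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)


# Toy arrays standing in for the diffusion model's two predictions.
uncond = np.zeros((2, 2), dtype="float32")
cond = np.ones((2, 2), dtype="float32")
guided = classifier_free_guidance(uncond, cond, guidance_scale=5.0)
print(guided)  # every entry is 5.0
```

With `guidance_scale=1.0` the result equals the conditional prediction; larger values push the output further away from the unconditional one.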

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This model generates images from text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl) and QK-normalization to improve training stability.
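
QK-normalization normalizes the attention queries and keys before their dot product, which keeps the attention logits bounded even when activations grow large during training. Below is a minimal single-head NumPy sketch, assuming RMSNorm as the normalization (our reading of the linked paper) with the learnable scales fixed to ones; `rms_norm` and `qk_norm_attention` are hypothetical helpers, not KerasHub functions.

```python
import numpy as np


def rms_norm(x, scale, eps=1e-6):
    # Divide each vector (last axis) by its root-mean-square, then apply a per-channel scale.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * scale


def qk_norm_attention(q, k, v, q_scale, k_scale):
    """Single-head scaled dot-product attention with QK-normalization.

    q, k, v: arrays of shape (seq_len, head_dim).
    """
    q = rms_norm(q, q_scale)
    k = rms_norm(k, k_scale)
    head_dim = q.shape[-1]
    logits = q @ k.T / np.sqrt(head_dim)
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, head_dim = 4, 8
# Deliberately large queries and keys: without normalization the logits would be huge.
q = rng.normal(size=(seq_len, head_dim)) * 100.0
k = rng.normal(size=(seq_len, head_dim)) * 100.0
v = rng.normal(size=(seq_len, head_dim))
ones = np.ones(head_dim)
out, weights = qk_norm_attention(q, k, v, ones, ones)
print(out.shape)  # (4, 8)
```

With unit scales, each normalized query and key has norm at most `sqrt(head_dim)`, so by the Cauchy-Schwarz inequality every scaled logit lies in `[-sqrt(head_dim), sqrt(head_dim)]` regardless of the input magnitude.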

## Example Usage

```shell
pip install -U keras-hub
pip install -U keras
```

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3.5 Large backbone.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3.5_large"
)

# Randomly initialized Stable Diffusion 3 model with a custom config.
# The `...` placeholders stand for each sub-model's configuration arguments,
# which must be filled in before this runs (see the SD3.5 API documentation).
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)
# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)
# Generate with different `num_steps`, `guidance_scale`, and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)
# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((512, 512, 3), dtype="float32")
reference_mask = np.ones((512, 512), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
inpaint.generate(
    {
        "images": reference_image,
        "masks": reference_mask,
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)
# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    {
        "images": reference_images,
        "masks": reference_masks,
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)
# Generate with different `num_steps`, `guidance_scale`, and `strength`.
inpaint.generate(
    {
        "images": reference_image,
        "masks": reference_mask,
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)
# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)
# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)
# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
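
The `strength` argument in the image-to-image and inpainting examples controls how strongly the reference image is transformed. The sketch below assumes the convention used by diffusers-style pipelines, in which the reference is noised to a level set by `strength` (1.0 means pure noise) and only the last `int(num_steps * strength)` steps of the schedule are run; we have not verified that KerasHub rounds in exactly this way. `img2img_schedule` is a hypothetical helper.

```python
def img2img_schedule(num_steps, strength):
    """Return (start_step, steps_run) for image-to-image sampling.

    Hypothetical helper assuming a diffusers-style convention: strength=1.0
    runs the full schedule from pure noise, strength=0.0 runs no steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    steps_run = min(int(num_steps * strength), num_steps)
    start_step = num_steps - steps_run
    return start_step, steps_run


print(img2img_schedule(50, 0.6))  # (20, 30)
```

Under this assumption, `num_steps=50, strength=0.6` runs 30 denoising steps starting from step 20, so raising `num_steps` at a fixed `strength` also raises the number of steps actually executed.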

## Example Usage with Hugging Face URI

```shell
pip install -U keras-hub
pip install -U keras
```

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3.5 Large backbone.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3.5_large"
)

# Randomly initialized Stable Diffusion 3 model with a custom config.
# The `...` placeholders stand for each sub-model's configuration arguments,
# which must be filled in before this runs (see the SD3.5 API documentation).
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)
# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)
# Generate with different `num_steps`, `guidance_scale`, and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)
# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((512, 512, 3), dtype="float32")
reference_mask = np.ones((512, 512), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
inpaint.generate(
    {
        "images": reference_image,
        "masks": reference_mask,
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)
# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    {
        "images": reference_images,
        "masks": reference_masks,
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)
# Generate with different `num_steps`, `guidance_scale`, and `strength`.
inpaint.generate(
    {
        "images": reference_image,
        "masks": reference_mask,
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)
# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)
# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)
# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
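
The inpainting examples pass all-ones masks, which, assuming nonzero mask pixels mark the region to repaint (our reading of the KerasHub inpainting docs), regenerate the whole image. To edit only part of an image, build a mask that is 1.0 inside the target region and 0.0 elsewhere. `center_square_mask` is a hypothetical helper; its output has the same shape as the 512x512 masks in the inpainting calls above.

```python
import numpy as np


def center_square_mask(height, width, size):
    """Build a float32 mask whose central size x size square is 1.0.

    Assumes 1.0 marks pixels to repaint and 0.0 marks pixels to keep.
    """
    mask = np.zeros((height, width), dtype="float32")
    top = (height - size) // 2
    left = (width - size) // 2
    mask[top : top + size, left : left + size] = 1.0
    return mask


mask = center_square_mask(512, 512, 256)
# Batch of two identical masks, matching a (2, 512, 512, 3) image batch.
batched_masks = np.stack([mask, mask])
print(mask.shape, batched_masks.shape)  # (512, 512) (2, 512, 512)
print(mask.sum())  # 65536.0, i.e. 256 * 256 pixels marked for repainting
```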