File size: 9,663 Bytes

---
library_name: keras-hub
pipeline_tag: text-to-image
tags:
- image-to-image
- keras
---
### Model Overview
[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For Enterprise License visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.

## Links

* [SD3 Quickstart Notebook Text-to-image](https://www.kaggle.com/code/laxmareddypatlolla/stablediffusion3-quickstart-notebook)
* [SD3 Quickstart Notebook Image-to-image](https://colab.sandbox.google.com/gist/laxmareddyp/46de6fbb274b12e8515c7bc55dfc5c57/stable-diffusion-3.ipynb)
* [SD3 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name                            | Parameters | Description                                                                                                  |
|---------------------------------------|------------|--------------------------------------------------------------------------------------------------------------|
| stable_diffusion_3_medium | 2.99B     | 3 billion parameter, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |


- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders 
([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl))

## Example Usage
```python
!pip install -U keras-hub
!pip install -U keras
```
```
# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3_medium"
)

# Randomly initialized Stable Diffusion 3 model with custom config.
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image to image example
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
        "stable_diffusion_3_medium", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# inpainting example
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "stable_diffusion_3_medium", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_mask = np.ones((2, 1024, 1024), dtype="float32")
inpaint.generate(
    reference_images,
    reference_mask,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# text to image example
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3_medium", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```

## Example Usage with Hugging Face URI

```python
!pip install -U keras-hub
!pip install -U keras
```
```
# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3_medium"
)

# Randomly initialized Stable Diffusion 3 model with custom config.
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image to image example
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
        "hf://keras/stable_diffusion_3_medium", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# inpainting example
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "hf://keras/stable_diffusion_3_medium", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_mask = np.ones((2, 1024, 1024), dtype="float32")
inpaint.generate(
    reference_images,
    reference_mask,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# text to image example
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://keras/stable_diffusion_3_medium", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```