pipeline_tag: text-to-image
---

### Model Overview

[Stable Diffusion 3.5](https://stability.ai/learning-hub/stable-diffusion-3-5-prompt-guide) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with greatly improved image quality, typography, complex-prompt understanding, and resource efficiency.

For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For an enterprise license, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.

## Links

* [SD3.5 Quickstart Notebook](https://colab.sandbox.google.com/gist/laxmareddyp/55daf77f87730c3b3f498318672f70b3/stablediffusion3_5-quckstart-notebook.ipynb)
* [SD3.5 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3.5 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below; a short sketch of the turbo preset follows the table.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| `stable_diffusion_3.5_large` | 9.05B | 9.05 billion parameters, including CLIP-L and CLIP-G text encoders, an MMDiT generative model, and a VAE autoencoder. Developed by Stability AI. |
| `stable_diffusion_3.5_large_turbo` | 9.05B | 9.05 billion parameters, including CLIP-L and CLIP-G text encoders, an MMDiT generative model, and a VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
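
The turbo preset trades some quality for speed. A minimal sketch of few-step generation (the step count is illustrative, not an official recommendation):

```python
import keras_hub

# Timestep-distilled variant: generates in very few denoising steps;
# classifier-free guidance is eliminated by the distillation.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large_turbo", height=512, width=512
)
image = text_to_image.generate(
    "cute wallpaper art of a cat",
    num_steps=4,  # illustrative; turbo models target a handful of steps
)
```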

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model description:** This model generates images from text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL), and QK-normalization to improve training stability.
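
The loaded backbone exposes these components as sub-models. A hypothetical inspection sketch (the `clip_l` and `clip_g` attribute names are assumed from the constructor arguments shown in the examples below):

```python
import keras_hub

backbone = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3.5_large"
)
# Assumed attributes mirroring the constructor arguments.
print(backbone.clip_l.count_params())
print(backbone.clip_g.count_params())
```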

## Example Usage

```python
!pip install -U keras-hub
!pip install -U keras
```
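
Keras 3 can run on JAX, TensorFlow, or PyTorch; the backend is selected through an environment variable before Keras is imported. A minimal sketch, assuming the JAX backend is installed:

```python
import os

# Must be set before importing keras or keras_hub.
# "tensorflow" and "torch" are also valid choices if installed.
os.environ["KERAS_BACKEND"] = "jax"

import keras_hub
```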

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3.5_large"
)

# Randomly initialized Stable Diffusion 3 model with custom config.
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    reference_images,
    reference_masks,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
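
The `generate()` calls return image arrays rather than files. A minimal sketch for saving a result to disk (assuming a single-prompt call returns one image array, as in the examples above):

```python
import keras

image = text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)
# `array_to_img` handles dtype/range conversion and returns a PIL image.
keras.utils.array_to_img(image).save("astronaut.png")
```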

## Example Usage with Hugging Face URI

The same workflows can load presets directly from the Hugging Face Hub by prefixing the preset name with `hf://`.

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3.5_large"
)

# Randomly initialized Stable Diffusion 3 model with custom config.
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    reference_images,
    reference_masks,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
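
The large checkpoints are memory-intensive. One common mitigation (a hedged sketch, not an official recommendation from this card) is lowering the global dtype policy before loading a preset:

```python
import keras

# Load weights and run compute in half precision to reduce memory use.
# Call this before `from_preset`.
keras.config.set_dtype_policy("float16")
```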