Divyasreepat committed on
Commit dc60d43 · verified · 1 Parent(s): 78109f4

Update README.md with new model card content

Files changed (1):
  1. README.md +269 -21
README.md CHANGED
@@ -1,24 +1,272 @@
  ---
  library_name: keras-hub
  ---
- This is a [`StableDiffusion3` model](https://keras.io/api/keras_hub/models/stable_diffusion3) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** stable_diffusion_3_backbone
- * **trainable:** True
- * **mmdit_patch_size:** 2
- * **mmdit_hidden_dim:** 1536
- * **mmdit_num_layers:** 24
- * **mmdit_num_heads:** 24
- * **mmdit_position_size:** 192
- * **vae:** {'module': 'keras_hub.src.models.vae.vae_backbone', 'class_name': 'VAEBackbone', 'config': {'name': 'vae', 'trainable': True, 'encoder_num_filters': [128, 256, 512, 512], 'encoder_num_blocks': [2, 2, 2, 2], 'decoder_num_filters': [512, 512, 256, 128], 'decoder_num_blocks': [3, 3, 3, 3], 'sampler_method': 'sample', 'input_channels': 3, 'sample_channels': 32, 'output_channels': 3, 'scale': 1.5305, 'shift': 0.0609}, 'registered_name': 'VAEBackbone'}
- * **clip_l:** {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_l', 'trainable': True, 'vocabulary_size': 49408, 'embedding_dim': 768, 'hidden_dim': 768, 'num_layers': 12, 'num_heads': 12, 'intermediate_dim': 3072, 'intermediate_activation': 'quick_gelu', 'intermediate_output_index': 10, 'max_sequence_length': 77}, 'registered_name': 'CLIPTextEncoder'}
- * **clip_g:** {'module': 'keras_hub.src.models.clip.clip_text_encoder', 'class_name': 'CLIPTextEncoder', 'config': {'name': 'clip_g', 'trainable': True, 'vocabulary_size': 49408, 'embedding_dim': 1280, 'hidden_dim': 1280, 'num_layers': 32, 'num_heads': 20, 'intermediate_dim': 5120, 'intermediate_activation': 'gelu', 'intermediate_output_index': 30, 'max_sequence_length': 77}, 'registered_name': 'CLIPTextEncoder'}
- * **t5:** None
- * **latent_channels:** 16
- * **output_channels:** 3
- * **num_train_timesteps:** 1000
- * **shift:** 3.0
- * **height:** 1024
- * **width:** 1024
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
  ---
  library_name: keras-hub
  ---
+ ### Model Overview
+ # Stable Diffusion 3 Medium
+ ![demo](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3demo.jpg)
+
+ ## Model
+
+ ![mmdit](https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/mmdit.png)
+
+ [Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with greatly improved performance in image quality, typography, complex-prompt understanding, and resource efficiency.
+
+ For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
+
+ Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.
+
+ ### Model Description
+
+ - **Developed by:** Stability AI
+ - **Model type:** MMDiT text-to-image generative model
+ - **Model Description:** This model generates images from text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).
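As a rough illustration of how the MMDiT processes images: the VAE encodes the image into a latent grid, which is then cut into patches that become transformer tokens. The sketch below is illustrative arithmetic only; the 8x VAE downsampling factor is an assumption based on typical latent-diffusion VAEs, while the patch size of 2 matches this preset's config.

```python
# Illustrative arithmetic only: how latent resolution and patch size
# determine the number of image tokens the diffusion transformer sees.
# The 8x VAE downsampling factor is an assumption, not read from the model.
def mmdit_token_count(height, width, vae_downsample=8, patch_size=2):
    """Number of image tokens for an input of the given pixel size."""
    latent_h = height // vae_downsample
    latent_w = width // vae_downsample
    return (latent_h // patch_size) * (latent_w // patch_size)

print(mmdit_token_count(1024, 1024))  # -> 4096 (a 64x64 patch grid)
```

This is why generating at the model's native 1024x1024 resolution is noticeably more expensive than at 512x512, which yields only a quarter as many tokens.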
+ ### Model card
+ https://huggingface.co/stabilityai/stable-diffusion-3-medium
+
+ ### Example Usage
+ ```python
+ import keras_hub
+ import numpy as np
+
+ # Pretrained Stable Diffusion 3 model.
+ model = keras_hub.models.StableDiffusion3Backbone.from_preset(
+     "stable_diffusion_3_medium"
+ )
+
+ # Randomly initialized Stable Diffusion 3 model with custom config.
+ vae = keras_hub.models.VAEBackbone(...)
+ clip_l = keras_hub.models.CLIPTextEncoder(...)
+ clip_g = keras_hub.models.CLIPTextEncoder(...)
+ model = keras_hub.models.StableDiffusion3Backbone(
+     mmdit_patch_size=2,
+     mmdit_num_heads=4,
+     mmdit_hidden_dim=256,
+     mmdit_depth=4,
+     mmdit_position_size=192,
+     vae=vae,
+     clip_l=clip_l,
+     clip_g=clip_g,
+ )
+
+ # Image to image example.
+ image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
+     "stable_diffusion_3_medium", height=512, width=512
+ )
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     }
+ )
+
+ # Generate with batched prompts.
+ image_to_image.generate(
+     {
+         "images": np.ones((2, 512, 512, 3), dtype="float32"),
+         "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+     }
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     },
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Generate with `negative_prompts`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+
+ # Inpainting example.
+ reference_image = np.ones((1024, 1024, 3), dtype="float32")
+ reference_mask = np.ones((1024, 1024), dtype="float32")
+ inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
+     "stable_diffusion_3_medium", height=512, width=512
+ )
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+ )
+
+ # Generate with batched prompts.
+ reference_images = np.ones((2, 512, 512, 3), dtype="float32")
+ reference_masks = np.ones((2, 512, 512), dtype="float32")
+ inpaint.generate(
+     reference_images,
+     reference_masks,
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Text to image example.
+ text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
+     "stable_diffusion_3_medium", height=512, width=512
+ )
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ )
+
+ # Generate with batched prompts.
+ text_to_image.generate(
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
+ )
+
+ # Generate with different `num_steps` and `guidance_scale`.
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+ )
+
+ # Generate with `negative_prompts`.
+ text_to_image.generate(
+     {
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+ ```
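The `guidance_scale` argument used above controls classifier-free guidance. A minimal NumPy sketch of the standard combination rule (this illustrates the general technique, not keras_hub's internal implementation):

```python
import numpy as np

def apply_cfg(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

uncond = np.zeros((2, 2))
cond = np.ones((2, 2))
# A scale of 1.0 reproduces the conditional prediction; larger scales
# push further in the prompt's direction, trading diversity for adherence.
print(apply_cfg(uncond, cond, 1.0))
print(apply_cfg(uncond, cond, 5.0))
```

This is why raising `guidance_scale` makes outputs follow the prompt more literally, at the cost of variety and sometimes saturation artifacts.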
+
+ ### Example Usage with Hugging Face URI
+
+ ```python
+ import keras_hub
+ import numpy as np
+
+ # Pretrained Stable Diffusion 3 model.
+ model = keras_hub.models.StableDiffusion3Backbone.from_preset(
+     "hf://keras/stable_diffusion_3_medium"
+ )
+
+ # Randomly initialized Stable Diffusion 3 model with custom config.
+ vae = keras_hub.models.VAEBackbone(...)
+ clip_l = keras_hub.models.CLIPTextEncoder(...)
+ clip_g = keras_hub.models.CLIPTextEncoder(...)
+ model = keras_hub.models.StableDiffusion3Backbone(
+     mmdit_patch_size=2,
+     mmdit_num_heads=4,
+     mmdit_hidden_dim=256,
+     mmdit_depth=4,
+     mmdit_position_size=192,
+     vae=vae,
+     clip_l=clip_l,
+     clip_g=clip_g,
+ )
+
+ # Image to image example.
+ image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
+     "hf://keras/stable_diffusion_3_medium", height=512, width=512
+ )
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     }
+ )
+
+ # Generate with batched prompts.
+ image_to_image.generate(
+     {
+         "images": np.ones((2, 512, 512, 3), dtype="float32"),
+         "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+     }
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     },
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Generate with `negative_prompts`.
+ image_to_image.generate(
+     {
+         "images": np.ones((512, 512, 3), dtype="float32"),
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+
+ # Inpainting example.
+ reference_image = np.ones((1024, 1024, 3), dtype="float32")
+ reference_mask = np.ones((1024, 1024), dtype="float32")
+ inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
+     "hf://keras/stable_diffusion_3_medium", height=512, width=512
+ )
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+ )
+
+ # Generate with batched prompts.
+ reference_images = np.ones((2, 512, 512, 3), dtype="float32")
+ reference_masks = np.ones((2, 512, 512), dtype="float32")
+ inpaint.generate(
+     reference_images,
+     reference_masks,
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
+ )
+
+ # Generate with different `num_steps`, `guidance_scale` and `strength`.
+ inpaint.generate(
+     reference_image,
+     reference_mask,
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+     strength=0.6,
+ )
+
+ # Text to image example.
+ text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
+     "hf://keras/stable_diffusion_3_medium", height=512, width=512
+ )
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ )
+
+ # Generate with batched prompts.
+ text_to_image.generate(
+     ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
+ )
+
+ # Generate with different `num_steps` and `guidance_scale`.
+ text_to_image.generate(
+     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+     num_steps=50,
+     guidance_scale=5.0,
+ )
+
+ # Generate with `negative_prompts`.
+ text_to_image.generate(
+     {
+         "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
+         "negative_prompts": "green color",
+     }
+ )
+ ```
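To write generated images to disk, the arrays returned by `generate` can first be converted to 8-bit pixels. A hedged post-processing sketch: the exact value range of `generate`'s output may vary by keras_hub version, so this helper rescales if the values look normalized and clips defensively.

```python
import numpy as np

def to_uint8_image(array):
    """Convert a float image array to uint8 pixels in [0, 255].
    If values appear normalized to [0, 1], rescale them first."""
    array = np.asarray(array, dtype="float32")
    if array.max() <= 1.0:
        array = array * 255.0
    return np.clip(np.round(array), 0, 255).astype("uint8")

pixels = to_uint8_image(np.full((512, 512, 3), 0.5))
# `pixels` can now be saved, e.g. with Pillow:
# PIL.Image.fromarray(pixels).save("out.png")
```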