redfoo, philschmid committed
Commit 5152ffc · 0 Parent(s)

Duplicate from philschmid/stable-diffusion-2-inpainting-endpoint

Co-authored-by: Philipp Schmid <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,87 @@
+ ---
+ license: openrail++
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-guided-to-image-inpainting
+ - endpoints-template
+ thumbnail: >-
+   https://huggingface.co/philschmid/stable-diffusion-2-inpainting-endpoint/resolve/main/Stable%20Diffusion%20Inference%20endpoints%20-%20inpainting.png
+ inference: true
+ duplicated_from: philschmid/stable-diffusion-2-inpainting-endpoint
+ ---
+
+ # Fork of [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting)
+
+ > Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
+ > For more information about how Stable Diffusion functions, please have a look at [🤗's Stable Diffusion with 🧨Diffusers blog](https://huggingface.co/blog/stable_diffusion).
+
+ For more information about the model, license, and limitations, check the original model card at [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting).
+
+ ---
+
+ This repository implements a custom `handler` task for `text-guided-to-image-inpainting` for 🤗 Inference Endpoints. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/stable-diffusion-2-inpainting-endpoint/blob/main/handler.py).
+
+ There is also a [notebook](https://huggingface.co/philschmid/stable-diffusion-2-inpainting-endpoint/blob/main/create_handler.ipynb) included that shows how to create the `handler.py`.
+
+ ![thumbnail](Stable%20Diffusion%20Inference%20endpoints%20-%20inpainting.png)
+
+ ### Expected Request payload
+
+ ```json
+ {
+   "inputs": "A prompt used for image generation",
+   "image": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAAABGdBTUEAALGPC",
+   "mask_image": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAAABGdBTUEAALGPC"
+ }
+ ```
+
+ Below is an example of how to run a request using Python and `requests`.
+
+ ## Run Request
+
+ ```python
+ import base64
+ from io import BytesIO
+
+ import requests as r
+ from PIL import Image
+
+ ENDPOINT_URL = ""
+ HF_TOKEN = ""
+
+ # helper image utils
+ def encode_image(image_path):
+     with open(image_path, "rb") as i:
+         b64 = base64.b64encode(i.read())
+     return b64.decode("utf-8")
+
+
+ def predict(prompt, image, mask_image):
+     image = encode_image(image)
+     mask_image = encode_image(mask_image)
+
+     # prepare sample payload
+     request = {"inputs": prompt, "image": image, "mask_image": mask_image}
+     # headers
+     headers = {
+         "Authorization": f"Bearer {HF_TOKEN}",
+         "Content-Type": "application/json",
+         "Accept": "image/png"  # important to get an image back
+     }
+
+     response = r.post(ENDPOINT_URL, headers=headers, json=request)
+     img = Image.open(BytesIO(response.content))
+     return img
+
+ prediction = predict(
+     prompt="Face of a bengal cat, high resolution, sitting on a park bench",
+     image="dog.png",
+     mask_image="mask_dog.png"
+ )
+ ```
+
+ Expected output:
+
+ ![sample](result.png)
Stable Diffusion Inference endpoints - inpainting.png ADDED
create_handler.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
dog.png ADDED
handler.py ADDED
@@ -0,0 +1,69 @@
+ import base64
+ from io import BytesIO
+ from typing import Any, Dict
+
+ import torch
+ from diffusers import DPMSolverMultistepScheduler, StableDiffusionInpaintPipeline
+ from PIL import Image
+
+ # set device
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ if device.type != "cuda":
+     raise ValueError("need to run on GPU")
+
+
+ class EndpointHandler:
+     def __init__(self, path=""):
+         # load StableDiffusionInpaintPipeline pipeline
+         self.pipe = StableDiffusionInpaintPipeline.from_pretrained(path, torch_dtype=torch.float16)
+         # use DPMSolverMultistepScheduler
+         self.pipe.scheduler = DPMSolverMultistepScheduler.from_config(self.pipe.scheduler.config)
+         # move to device
+         self.pipe = self.pipe.to(device)
+
+     def __call__(self, data: Dict[str, Any]) -> Image.Image:
+         """
+         :param data: A dictionary containing `inputs` (the prompt), optional
+             base64-encoded `image` and `mask_image` fields, and optional
+             hyperparameters.
+         :return: The first generated PIL image.
+         """
+         inputs = data.pop("inputs", data)
+         encoded_image = data.pop("image", None)
+         encoded_mask_image = data.pop("mask_image", None)
+
+         # hyperparameters
+         num_inference_steps = data.pop("num_inference_steps", 25)
+         guidance_scale = data.pop("guidance_scale", 7.5)
+         negative_prompt = data.pop("negative_prompt", None)
+         height = data.pop("height", None)
+         width = data.pop("width", None)
+
+         # decode images if both are provided
+         if encoded_image is not None and encoded_mask_image is not None:
+             image = self.decode_base64_image(encoded_image)
+             mask_image = self.decode_base64_image(encoded_mask_image)
+         else:
+             image = None
+             mask_image = None
+
+         # run inference pipeline
+         out = self.pipe(
+             inputs,
+             image=image,
+             mask_image=mask_image,
+             num_inference_steps=num_inference_steps,
+             guidance_scale=guidance_scale,
+             num_images_per_prompt=1,
+             negative_prompt=negative_prompt,
+             height=height,
+             width=width,
+         )
+
+         # return the first generated PIL image
+         return out.images[0]
+
+     # helper to decode a base64-encoded input image
+     def decode_base64_image(self, image_string):
+         base64_image = base64.b64decode(image_string)
+         buffer = BytesIO(base64_image)
+         image = Image.open(buffer)
+         return image
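The contract between the client's `encode_image` helper in the README and the handler's `decode_base64_image` above is a plain base64 round trip over the raw file bytes. A minimal stdlib-only sketch of that contract (the byte string is a hypothetical stand-in for real PNG data, not a valid image):

```python
import base64
from io import BytesIO

# stand-in for the raw bytes of a PNG file (hypothetical payload, not a valid image)
raw_bytes = b"\x89PNG\r\n\x1a\nexample-image-payload"

# client side: what encode_image() does before placing the string in the JSON request
encoded = base64.b64encode(raw_bytes).decode("utf-8")

# server side: what decode_base64_image() does before handing the buffer to Image.open
buffer = BytesIO(base64.b64decode(encoded))

# the handler sees exactly the bytes the client read from disk
assert buffer.read() == raw_bytes
```

Because the encoded string is plain ASCII, it embeds safely in the JSON payload shown in the README.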
mask_dog.png ADDED
model_index.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_class_name": "StableDiffusionInpaintPipeline",
+   "_diffusers_version": "0.10.2",
+   "feature_extractor": [
+     null,
+     null
+   ],
+   "requires_safety_checker": false,
+   "safety_checker": [
+     null,
+     null
+   ],
+   "scheduler": [
+     "diffusers",
+     "PNDMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
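The `model_index.json` above is the table diffusers reads to assemble the pipeline: each non-underscore key maps a component name to the library and class that load it. A minimal stdlib-only sketch of how such an index can be interpreted, with an abridged copy of the table inlined as an assumption:

```python
import json

# abridged inline copy of the model_index.json above (assumption for illustration)
MODEL_INDEX = json.loads("""
{
  "_class_name": "StableDiffusionInpaintPipeline",
  "_diffusers_version": "0.10.2",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
""")

def list_components(index):
    """Return (name, library, class) triples for each loadable pipeline component."""
    components = []
    for name, value in index.items():
        # keys starting with "_" are metadata, not components
        if name.startswith("_") or not isinstance(value, list):
            continue
        library, cls = value
        components.append((name, library, cls))
    return components

for name, library, cls in list_components(MODEL_INDEX):
    print(f"{name}: {library}.{cls}")
```

Note that the scheduler listed here (`PNDMScheduler`) is only the default; `handler.py` swaps it for `DPMSolverMultistepScheduler` at load time.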
requirements.txt ADDED
@@ -0,0 +1 @@
+ diffusers==0.10.2
result.png ADDED
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "_class_name": "PNDMScheduler",
+   "_diffusers_version": "0.10.2",
+   "beta_end": 0.012,
+   "beta_schedule": "scaled_linear",
+   "beta_start": 0.00085,
+   "clip_sample": false,
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "set_alpha_to_one": false,
+   "skip_prk_steps": true,
+   "steps_offset": 1,
+   "trained_betas": null
+ }
text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "/home/ubuntu/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-2-inpainting/snapshots/76b00d76134aca7fc5e7137a469498627ad6b4bf/text_encoder",
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_size": 1024,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 23,
+   "pad_token_id": 1,
+   "projection_dim": 512,
+   "torch_dtype": "float16",
+   "transformers_version": "4.24.0",
+   "vocab_size": 49408
+ }
text_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f1fce5bf3a7d2f31cddebc1f67ec9b34c1786c5b5804fc9513a4231e8d1bf10
+ size 680896215
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "!",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "do_lower_case": true,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 77,
+   "name_or_path": "/home/ubuntu/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-2-inpainting/snapshots/76b00d76134aca7fc5e7137a469498627ad6b4bf/tokenizer",
+   "pad_token": "<|endoftext|>",
+   "special_tokens_map_file": "./special_tokens_map.json",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
unet/config.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.10.2",
+   "_name_or_path": "/home/ubuntu/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-2-inpainting/snapshots/76b00d76134aca7fc5e7137a469498627ad6b4bf/unet",
+   "act_fn": "silu",
+   "attention_head_dim": [
+     5,
+     10,
+     20,
+     20
+   ],
+   "block_out_channels": [
+     320,
+     640,
+     1280,
+     1280
+   ],
+   "center_input_sample": false,
+   "cross_attention_dim": 1024,
+   "down_block_types": [
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D",
+     "DownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dual_cross_attention": false,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 9,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "sample_size": 64,
+   "up_block_types": [
+     "UpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D"
+   ],
+   "upcast_attention": false,
+   "use_linear_projection": true
+ }
unet/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20722649ed12ff926183129d7f2f7957388d03edc79888be6da8e3346ef9e873
+ size 1732120805
vae/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.10.2",
+   "_name_or_path": "/home/ubuntu/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-2-inpainting/snapshots/76b00d76134aca7fc5e7137a469498627ad6b4bf/vae",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 512,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
vae/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b0f1843b01fbf820827e257eec40db099cde863f07314ae6ab641298e11fc98
+ size 167399505