---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
library_name: diffusers
tags:
- inpaint
- colorization
- stable-diffusion
---
# **Example Outputs**

| **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|------------------------------|------------------------------|------------------------------|
| ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |

---

# **Stable Diffusion 2-Based Gray-Inpainting to RGB**

This pipeline restores masked grayscale images by inpainting and then converts them to full-color RGB. It chains two models built on the Stable Diffusion 2 architecture:

1. **Gray-Inpainting Model**: Fills the missing regions of a masked grayscale image, using an autoencoder (AE) instead of a variational autoencoder (VAE).

2. **Gray-to-RGB Conversion Model**: Converts a grayscale image (or the inpainted output) into a full-color RGB image by adding a residual path to the AE: the internal UNet directly predicts the difference between the grayscale and color images' latents, as sketched below.
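
A rough conceptual picture of that residual path is shown below. This is only a sketch; `ae` and `unet` are placeholder names and do not necessarily match this repository's actual module layout.

```python
# Conceptual sketch of the gray-to-RGB residual path.
# `ae` and `unet` are hypothetical placeholders, not the repository's real modules.
def gray_to_rgb_sketch(ae, unet, image_gray):
    z_gray = ae.encode(image_gray)   # latent of the grayscale input
    delta = unet(z_gray)             # UNet predicts the gray-to-color latent difference
    z_rgb = z_gray + delta           # residual path: add the predicted difference
    return ae.decode(z_rgb)          # decode to a full-color RGB image
```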


---

## **Code Example**

```python
import torch
import numpy as np

from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')  # the model expects a 3-channel image
mask_image = load_image(mask_url).resize((512, 512))
mask = (np.array(mask_image) > 128) * 1
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))

# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb', 
    subfolder='gray-inpaint', 
    trust_remote_code=True, 
)

# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb', 
    subfolder='gray2rgb', 
    trust_remote_code=True, 
)

# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')

# Optionally enable xformers memory-efficient attention
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model accepts a PIL.Image, a List[PIL.Image], or a preprocessed
        # tensor of shape (B, 3, H, W); the input image must have 3 channels.
        # A `mask` argument can also be passed explicitly: Tensor of shape (B, 1, 512, 512).
        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```
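
As noted in the comments above, the inpainting model also accepts an explicit `mask` tensor of shape `(B, 1, 512, 512)`. The snippet below is a sketch, not part of the official example: it builds such a tensor from the `mask` array computed earlier and saves the results, hedging on the return type of `gray2rgb` since it is not documented here.

```python
# Optional follow-up (a sketch under the assumptions stated above).

# `mask` has shape (512, 512, 3); keep one channel and reshape to (B, 1, 512, 512).
mask_tensor = torch.from_numpy(mask[..., :1]).permute(2, 0, 1).unsqueeze(0).float().to('cuda')

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Pass the mask explicitly via the `mask` argument mentioned in the comments above.
        image_gray_restored_explicit = gray_inpaintor(
            image_gray_masked, mask=mask_tensor, num_inference_steps=250, seed=10
        )[0].convert('L')

# Save the outputs. If gray2rgb returns a list like the inpainting model does,
# take the first element before saving.
rgb_result = image_restored[0] if isinstance(image_restored, (list, tuple)) else image_restored
image_gray_restored.save('gray-inpaint-restored.png')
rgb_result.save('rgb-restored.png')
```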