---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
library_name: diffusers
tags:
- inpaint
- colorization
- stable-diffusion
---
# **Example Outputs**
| **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|------------------------------|------------------------------|------------------------------|
| ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |
---
# **Stable Diffusion 2-Based Gray-Inpainting to RGB**
1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting diffusion process based on an autoencoder (AE) instead of a variational autoencoder (VAE). It contains a mask detector, so restoration works even without mask information (or you can pass the mask explicitly).
2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image by adding a residual path in the AE: the internal UNet directly predicts the difference between the gray and color images' latents (see the conceptual sketch below).
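
Conceptually, the gray-to-RGB stage can be pictured as the sketch below. This is a minimal illustration of the description above, not the model's actual implementation; `encode`, `unet_residual`, and `decode` are placeholder names.

```python
# Conceptual sketch (placeholder names, not the model's actual API):
# the AE encodes the grayscale image, the internal UNet predicts the
# latent-space difference between gray and color, and the decoder
# reconstructs the RGB image from the corrected latent.
def gray_to_rgb_sketch(image_gray, encode, unet_residual, decode):
    z_gray = encode(image_gray)      # grayscale latent
    delta = unet_residual(z_gray)    # predicted (color - gray) latent residual
    z_rgb = z_gray + delta           # residual path in the AE
    return decode(z_rgb)             # full-color RGB image
```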
---
## **Code Example**
```python
import torch
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')  # image must be 3-channel
mask_image = load_image(mask_url).resize((512, 512))
mask = (np.array(mask_image) > 128) * 1
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))
# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
subfolder='gray-inpaint',
trust_remote_code=True,
)
# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
subfolder='gray2rgb',
trust_remote_code=True,
)
# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')
# Enable memory-efficient attention
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()
with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model's input image should be one of PIL.Image, List[PIL.Image], or a preprocessed tensor (B, 3, H, W). Images must be 3-channel.
        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')  # you can pass the 'mask' arg explicitly; mask: Tensor (B, 1, 512, 512)
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```
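
If the mask is already known, it can be passed explicitly instead of relying on the built-in mask detector, as noted in the comment above. The sketch below reuses the `mask` array and models prepared in the example above and assumes the `mask` keyword accepts a (B, 1, 512, 512) tensor; the exact behavior may differ.

```python
import numpy as np
import torch

# `mask`, `image_gray_masked`, `gray_inpaintor`, and `gray2rgb` come from the example above.
# Build a (1, 1, 512, 512) float tensor from the thresholded mask
# (load_image returns an RGB image, so take a single channel).
mask_tensor = torch.from_numpy(np.ascontiguousarray(mask[..., 0])).float()[None, None].to('cuda')

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Pass the mask explicitly instead of using the internal mask detector.
        image_gray_restored = gray_inpaintor(
            image_gray_masked,
            mask=mask_tensor,
            num_inference_steps=250,
            seed=10,
        )[0].convert('L')
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```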