---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
library_name: diffusers
tags:
- inpaint
- colorization
- stable-diffusion
---
# **Example Outputs**

| **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|------------------------------|------------------------------|------------------------------|
| ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |

---

# **Stable Diffusion 2-Based Gray-Inpainting to RGB**

This pipeline restores damaged grayscale images in two stages: it inpaints the missing regions, then colorizes the result to RGB. Both stages use models based on the Stable Diffusion 2 architecture:

1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process based on an autoencoder (AE) instead of a variational autoencoder (VAE).

2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or the inpainted output) into a full-color RGB image. It adds a residual path to the AE, and its internal UNet directly predicts the difference between the latents of the grayscale and color images.


---

## **Code Example**

```python
import torch
import numpy as np

from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB') # image must be 3 channel
mask_image = load_image(mask_url).resize((512, 512))
mask = (np.array(mask_image) > 128) * 1  # 1 = region to inpaint, 0 = keep
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))  # zero out the region to inpaint
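
# Optional, a minimal sketch (hypothetical helper, not part of the model's API):
# the inpainting model also accepts an explicit `mask` Tensor of shape (B, 1, 512, 512).
# This assumes 1 marks the pixels to fill, as in `mask` above, and a float32 dtype.
def to_mask_tensor(mask_array):
    # mask_array: (H, W) or (H, W, 3) array of 0/1 values
    if mask_array.ndim == 3:
        mask_array = mask_array[..., 0]  # mask channels are assumed identical; keep one
    # add batch and channel axes -> (1, 1, H, W)
    return torch.from_numpy(mask_array.astype(np.float32))[None, None]

# e.g. gray_inpaintor(image_gray_masked, mask=to_mask_tensor(mask), ...)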

# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb', 
    subfolder='gray-inpaint', 
    trust_remote_code=True, 
)

# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb', 
    subfolder='gray2rgb', 
    trust_remote_code=True, 
)

# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')

# Enable memory-efficient attention
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model accepts a PIL.Image, a List[PIL.Image], or a preprocessed tensor of shape (B, 3, H, W).
        # Inputs must have 3 channels, so grayscale images are converted back to 'RGB' first.
        # The inpainting model also accepts an explicit `mask` argument: a Tensor of shape (B, 1, 512, 512).
        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```