---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
---

# **Example Outputs**

| **Step** | **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|----------------------------------|------------------------------------|--------------------------------------|-------------------------------------|
| **Image** | ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |

---

# **Stable Diffusion 2-Based Gray-Inpainting to RGB**

This pipeline restores masked grayscale images by inpainting the missing regions and then converting the result to RGB. It uses two models based on the Stable Diffusion 2 architecture:

1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process based on an **autoencoder (AE)** instead of a variational autoencoder (VAE). This simplifies the model while retaining high-quality reconstruction for the inpainted areas.

2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image by introducing a **residual path in the autoencoder (AE)**. Instead of running a diffusion process, the model directly predicts the latent representation of the color image, which makes the conversion efficient and accurate.

---

## **Pipeline Workflow**

1. **Load Grayscale and Mask Images**:
   - The grayscale input is converted to 3 channels (`RGB` mode), which both models require.
   - A binary mask identifies the areas to restore or inpaint.

2. **Apply Gray-Inpainting**:
   - The inpainting model takes the masked grayscale image and restores the missing regions over `num_inference_steps` inference steps.

3. **Convert to RGB**:
   - The restored grayscale image is then processed by the gray-to-RGB model to produce a full-color output.

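
The input preparation in step 1 of the workflow can be illustrated on a toy array, without downloading any images or models. This is a minimal sketch using only NumPy; the helper names `to_three_channels`, `binarize_mask`, and `apply_mask` are hypothetical and not part of this repository. The threshold of 128 and the convention that bright mask pixels mark the region to restore are taken from the usage code in this card.

```python
import numpy as np


# NOTE: these helpers are illustrative only; they are not part of the model repository.
def to_three_channels(gray: np.ndarray) -> np.ndarray:
    """Repeat a (H, W) grayscale array along a new last axis to get (H, W, 3)."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    return np.repeat(gray[..., None], 3, axis=-1)


def binarize_mask(mask: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Return 1 where the mask is brighter than `threshold` (pixels to restore), else 0."""
    return (mask > threshold).astype(np.uint8)


def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the pixels selected by a binary (H, W) mask in an (H, W, 3) image."""
    return (image * (1 - mask[..., None])).astype(np.uint8)


# Toy 4x4 example: a flat gray image with a bright 2x2 square in the mask.
gray = np.full((4, 4), 200, dtype=np.uint8)
raw_mask = np.zeros((4, 4), dtype=np.uint8)
raw_mask[1:3, 1:3] = 255

image_gray = to_three_channels(gray)              # shape (4, 4, 3)
mask = binarize_mask(raw_mask)                    # 1 inside the square, 0 elsewhere
image_gray_masked = apply_mask(image_gray, mask)  # square blacked out, rest unchanged
```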

---

## **Code Example**

```python
import torch
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

# The image must have 3 channels, so convert to grayscale ('L') and back to 'RGB'.
image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')
mask_image = load_image(mask_url).resize((512, 512))

# Binarize the mask (1 = region to inpaint) and black out that region in the grayscale image.
mask = (np.array(mask_image) > 128) * 1
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))

# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray-inpaint',
    trust_remote_code=True,
)

# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray2rgb',
    trust_remote_code=True,
)

# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')

# Enable memory-efficient attention (optional, requires xformers)
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model accepts a PIL.Image, a List[PIL.Image], or a preprocessed tensor (B, 3, H, W).
        # Images must have 3 channels.
        # You can also pass the `mask` argument explicitly: a Tensor of shape (B, 1, 512, 512).
        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```