jwengr committed
Commit 8ba54e5
1 Parent(s): 6c0dae3

Update README.md

Files changed (1)
  1. README.md +19 -15

README.md CHANGED
@@ -15,8 +15,10 @@ pipeline_tag: image-to-image
 
 This model pipeline demonstrates an advanced workflow for restoring grayscale images, performing inpainting, and converting them to RGB. The pipeline leverages two models based on the Stable Diffusion 2 architecture:
 
-1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process.
-2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image.
+1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process based on an **autoencoder (AE)** instead of a variational autoencoder (VAE). This simplifies the model while retaining high-quality reconstruction for the inpainted areas.
+
+2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image by introducing a **residual path in the autoencoder (AE)**. Instead of utilizing a diffusion process, the model directly predicts the latent representation of the color image, enabling efficient and accurate conversion.
+
 
 ---
 
@@ -39,41 +41,43 @@ This model pipeline demonstrates an advanced workflow for restoring grayscale im
 ```python
 import torch
 import numpy as np
+
 from PIL import Image
 from diffusers.utils import load_image
-from transformers import AutoModel
+from transformers import AutoConfig, AutoModel, ModelCard
 
-# Load and preprocess images
 img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
 mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
 
-image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')  # Ensure 3-channel input
+image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')  # image must be 3-channel
 mask_image = load_image(mask_url).resize((512, 512))
-mask = (np.array(mask_image) > 128) * 1
-image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))
+mask = (np.array(mask_image)>128)*1
+image_gray_masked = Image.fromarray(((1-mask) * np.array(image_gray)).astype(np.uint8))
 
-# Load models
+# Load the gray-inpaint model
 gray_inpaintor = AutoModel.from_pretrained(
     'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
     subfolder='gray-inpaint',
-    trust_remote_code=True
+    trust_remote_code=True,
 )
+
+# Load the gray2rgb model
 gray2rgb = AutoModel.from_pretrained(
     'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
     subfolder='gray2rgb',
-    trust_remote_code=True
+    trust_remote_code=True,
 )
 
-# Move models to GPU
+# Move models to GPU
 gray_inpaintor.to('cuda')
 gray2rgb.to('cuda')
 
-# Memory-efficient attention (optional)
+# Enable memory-efficient attention
 # gray2rgb.unet.enable_xformers_memory_efficient_attention()
 # gray_inpaintor.unet.enable_xformers_memory_efficient_attention()
 
-# Perform image restoration and conversion
-with torch.autocast('cuda', dtype=torch.bfloat16):
+with torch.autocast('cuda', dtype=torch.bfloat16):
     with torch.no_grad():
-        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
+        # Each model's input may be a PIL.Image, a List[PIL.Image], or a preprocessed tensor (B, 3, H, W); images must be 3-channel.
+        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')  # you can pass the 'mask' arg explicitly; mask: Tensor (B, 1, 512, 512)
         image_restored = gray2rgb(image_gray_restored.convert('RGB'))
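Note: the inline comment on the inpainting call says an explicit `mask` argument can be passed as a tensor of shape (B, 1, 512, 512). Below is a minimal sketch of building that tensor from the PIL mask already loaded above; the `mask=` keyword name, dtype, and normalization are assumptions taken from that comment, not a documented signature.

```python
import numpy as np
import torch

# Build an explicit mask tensor from the PIL mask loaded earlier.
# The (1, 1, 512, 512) shape follows the README comment; the dtype and
# normalization the model expects are assumptions.
mask_np = np.array(mask_image.convert('L')) > 128             # (512, 512) bool
mask_tensor = torch.from_numpy(mask_np)[None, None].float()   # (1, 1, 512, 512)

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        image_gray_restored = gray_inpaintor(
            image_gray_masked,
            mask=mask_tensor.to('cuda'),   # assumed keyword, per the inline comment
            num_inference_steps=250,
            seed=10,
        )[0].convert('L')
```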
 
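The updated model description above mentions a plain autoencoder (no variational sampling) and a residual path for the gray-to-RGB conversion. Here is a hypothetical sketch of that residual-path idea, for intuition only; none of these class or method names come from the actual repository.

```python
import torch
import torch.nn as nn

class ResidualGray2RGB(nn.Module):
    """Illustrative sketch: a plain AE with a residual path for gray -> RGB.

    The encoder predicts the color image's latent in a single forward pass
    (no diffusion); the residual connection means the decoder only has to
    model the color offset from the grayscale input.
    """

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # e.g. a conv stack: (B, 3, H, W) -> latent
        self.decoder = decoder  # latent -> (B, 3, H, W) color residual

    def forward(self, gray_3ch: torch.Tensor) -> torch.Tensor:
        z = self.encoder(gray_3ch)                 # direct latent prediction
        residual = self.decoder(z)                 # predicted color offset
        return (gray_3ch + residual).clamp(0, 1)   # residual path to RGB
```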