---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
---
# **Example Outputs**
| **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|------------------------------|------------------------------|------------------------------|
| ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |
---
# **Stable Diffusion 2-Based Gray-Inpainting to RGB**
This pipeline restores masked grayscale images via inpainting and converts them to full-color RGB. It chains two models built on the Stable Diffusion 2 architecture:
1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image through a masked inpainting process built on an **autoencoder (AE)** rather than a variational autoencoder (VAE). This simplifies the model while retaining high-quality reconstruction in the inpainted areas.
2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image by adding a **residual path to the autoencoder (AE)**. Rather than running a diffusion process, the model directly predicts the latent representation of the color image in a single pass, enabling efficient and accurate conversion (see the sketch below).
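For intuition, here is a minimal, illustrative sketch of the residual-path idea. The class and attribute names are hypothetical and do not correspond to the released model's internals:
```python
import torch
import torch.nn as nn

class ResidualLatentPredictor(nn.Module):
    """Hypothetical sketch: predict the color latent directly from the
    grayscale latent as identity + learned residual, with no diffusion loop."""
    def __init__(self, residual_head: nn.Module):
        super().__init__()
        self.residual_head = residual_head  # small network predicting a latent correction

    def forward(self, gray_latent: torch.Tensor) -> torch.Tensor:
        # A single forward pass replaces iterative denoising.
        return gray_latent + self.residual_head(gray_latent)

# Example instantiation with a trivial residual head (illustrative only):
predictor = ResidualLatentPredictor(nn.Conv2d(4, 4, kernel_size=3, padding=1))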
---
## **Pipeline Workflow**
1. **Load Grayscale and Mask Images**:
- Grayscale image input is preprocessed to ensure it has 3 channels (`RGB` format).
   - A binary mask marks the regions to restore or inpaint (see the mask-preparation sketch after this list).
2. **Apply Gray-Inpainting**:
   - The inpainting model takes the masked grayscale image and restores the missing regions over `num_inference_steps` denoising steps.
3. **Convert to RGB**:
- The restored grayscale image is then processed by the gray-to-RGB model to produce a full-color output.
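As referenced in step 1, here is a minimal sketch of the mask preparation. It assumes a white-on-black mask at a placeholder path `mask.png`; the resulting shape matches the optional `mask` tensor mentioned in the code example below:
```python
import numpy as np
import torch
from PIL import Image

# Hypothetical mask preparation: white pixels (> 128) mark regions to inpaint.
mask_pil = Image.open('mask.png').resize((512, 512)).convert('L')  # placeholder path
mask = torch.from_numpy((np.array(mask_pil) > 128).astype(np.float32))
mask = mask[None, None]  # (1, 1, 512, 512), i.e. (B, 1, H, W)
```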
---
## **Code Example**
```python
import torch
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
# Prepare a 3-channel grayscale input and its binary mask
image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')  # input must be 3-channel
mask_image = load_image(mask_url).resize((512, 512))
mask = (np.array(mask_image) > 128) * 1
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))
# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray-inpaint',
    trust_remote_code=True,
)
# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray2rgb',
    trust_remote_code=True,
)
# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')
# Enable memory-efficient attention
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()
with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model accepts a PIL.Image, List[PIL.Image], or a preprocessed
        # tensor (B, 3, H, W); the input image must have 3 channels.
        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
        # You can also pass the `mask` argument explicitly: Tensor (B, 1, 512, 512).
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```
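To persist the outputs, you can save them as usual (assuming `gray2rgb` returns a single `PIL.Image`; if it returns a list like the inpainting model, index it with `[0]` first):
```python
# If gray2rgb returns a list (like the inpainting model), use image_restored[0].
image_gray_restored.save('restored_gray.png')
image_restored.save('restored_rgb.png')
```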