MGIE

This repository contains the UNet and LLaVA model checkpoints from the paper "Guiding Instruction-based Image Editing via Multimodal Large Language Models".

For a detailed usage example, refer to this notebook and the official repository. Additionally, this notebook is a memory-optimized version of the original one. It decouples the MGIE inference pipeline into three broad steps:

  1. Compute all the embeddings in a batched manner with the LLaVA model and the edit head.
  2. Pop the LLaVA model and the edit head off memory to free up VRAM.
  3. Load the InstructPix2Pix pipeline and perform the editing.
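
The staging described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `load_encoder`, `embed`, `load_editor`, and `edit` are hypothetical callables standing in for the LLaVA model with its edit head and for the InstructPix2Pix pipeline. The sketch assumes the embeddings returned by `embed` no longer hold references to the encoder; in practice they would likely be moved to the CPU first.

```python
import gc


def release_cached_vram():
    """Collect garbage and, if PyTorch with CUDA is present, free cached VRAM."""
    gc.collect()
    try:
        import torch  # optional: the sketch still runs without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def staged_inference(load_encoder, embed, load_editor, edit, inputs):
    """Run embedding and editing as separate stages so only one model is resident.

    All four callables are hypothetical placeholders supplied by the caller.
    """
    # Step 1: compute every embedding in one batched call.
    encoder = load_encoder()
    embeddings = embed(encoder, inputs)

    # Step 2: drop the encoder before the editing pipeline is loaded.
    del encoder
    release_cached_vram()

    # Step 3: load the editing pipeline and edit each input with its embedding.
    editor = load_editor()
    return [edit(editor, x, e) for x, e in zip(inputs, embeddings)]
```

The key design choice is that the second model is loaded only after the first has been released, which lowers peak VRAM usage at the cost of keeping all embeddings around between the two stages.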

💡 MGIE requires additional setup steps that must be completed before running inference. Most importantly, you need to merge the LLaVA weight deltas with the original LLaMA parameters. Please refer to the repository for the full instructions.

Processing ultra high-resolution images

Since the InstructPix2Pix pipeline doesn't do any internal processing to resize the input images, you might run into out-of-memory (OOM) errors when processing ultra high-resolution images like this one.

It's therefore recommended to resize them while preserving their aspect ratio. The following utility function can be used for this:

from diffusers.utils import load_image
from PIL import Image

def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
    # Load the image
    img = load_image(img_url).convert("RGB")

    # Get the current width and height of the image
    width, height = img.size

    # Calculate the new dimensions based on the aspect ratio
    if base_width is not None:
        # Calculate new height based on the base_width to maintain aspect ratio
        w_percent = base_width / float(width)
        h_size = int(float(height) * w_percent)
        new_size = (base_width, h_size)
    elif base_height is not None:
        # Calculate new width based on the base_height to maintain aspect ratio
        h_percent = base_height / float(height)
        w_size = int(float(width) * h_percent)
        new_size = (w_size, base_height)
    else:
        raise ValueError("Either base_width or base_height must be provided")

    # Resize the image (Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
    resized_img = img.resize(new_size, Image.LANCZOS)
    return resized_img
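
The function above needs the caller to decide whether to constrain the width or the height. As a complementary sketch, the hypothetical helper below (`fit_within` is not part of MGIE or diffusers) computes a target size that caps the longer side at `max_side` while keeping the aspect ratio, and leaves images that are already small enough untouched. The value of `max_side` is an assumption to tune against your available VRAM.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) with the longer side capped at max_side.

    The aspect ratio is preserved up to rounding. Images whose longer side
    is already within max_side are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        return width, height

    scale = max_side / longest
    # Guard against a dimension rounding down to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The returned tuple can be passed directly to PIL's `Image.resize`, for example `img.resize(fit_within(*img.size, 1024), Image.LANCZOS)`.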

Citation

@inproceedings{fu2024mgie,
  author = {Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan},
  title = {{Guiding Instruction-based Image Editing via Multimodal Large Language Models}}, 
  booktitle = {International Conference on Learning Representations (ICLR)}, 
  year = {2024} 
}