---
license: mit
base_model:
- Shitao/OmniGen-v1
pipeline_tag: text-to-image
tags:
- image-to-image
---

This repo contains bitsandbytes 4-bit NF4 float16 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1). These weights are intended for Google Colab users and anyone whose GPU does not support bfloat16. If your GPU does support bfloat16, prefer the [bf16-bnb-4bit](https://huggingface.co/gryan/OmniGen-v1-bnb-4bit) model, as it produces higher-quality images.

For information about OmniGen itself, see the [original model card](https://huggingface.co/Shitao/OmniGen-v1).

Related quantizations:

- 8-bit weights: [gryan/OmniGen-v1-bnb-8bit](https://huggingface.co/gryan/OmniGen-v1-bnb-8bit)
- 4-bit (bf16, nf4) weights: [gryan/OmniGen-v1-bnb-4bit](https://huggingface.co/gryan/OmniGen-v1-bnb-4bit)

## Usage

Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started.

> [!IMPORTANT]
> NOTE: This feature is not officially supported yet. You'll need to install the repo from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).

```python
import torch

from OmniGen import OmniGenPipeline, OmniGen

# pass the quantized model into the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-fp16-bnb-4bit', dtype=torch.float16)
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, use a placeholder of the form <|image_*|> to represent each image.
# You can pass multiple images in input_images; ensure each image has its own
# placeholder. For example, for the list input_images [img1_path, img2_path],
# the prompt needs two placeholders: <|image_1|>, <|image_2|>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <|image_1|>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")  # save output PIL image
```

## Image Samples

Side-by-side sample outputs comparing the original FP16 model with this 4-bit model:

- Text Only: FP16 vs. 4-bit
- Single Image: FP16 vs. 4-bit
- Double Image: FP16 vs. 4-bit
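To see why 4-bit weights matter on free Colab GPUs, here is a back-of-the-envelope memory estimate. The parameter count is an assumption (OmniGen-v1 builds on Phi-3-mini, roughly 3.8B parameters), and the 4-bit figure ignores the small overhead of quantization constants:

```python
# Rough weight-memory sketch for OmniGen-v1's transformer.
# ASSUMPTION: ~3.8e9 parameters; the exact count may differ.
params = 3.8e9
fp16_gb = params * 2 / 1024**3    # fp16: 2 bytes per weight
nf4_gb = params * 0.5 / 1024**3   # NF4: 4 bits (0.5 bytes) per weight

print(f"fp16: ~{fp16_gb:.1f} GB, NF4 4-bit: ~{nf4_gb:.1f} GB")
```

At roughly a quarter of the fp16 footprint, the quantized weights leave far more headroom for activations and the VAE on a 16 GB T4.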