gpustack committed on commit 4ecda88 (verified, parent: e74a9cd)

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ inpaint-examples-min.png filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,116 @@
---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- inpainting
inference: false
---

# stable-diffusion-xl-inpainting-1.0-GGUF

!!! Experimental: supported only by [gpustack/llama-box v0.0.98+](https://github.com/gpustack/llama-box) !!!

**Model creator**: [Diffusers](https://huggingface.co/diffusers)<br/>
**Original model**: [stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1)<br/>
**GGUF quantization**: based on stable-diffusion.cpp [ac54e](https://github.com/leejet/stable-diffusion.cpp/commit/ac54e0076052a196b7df961eb1f792c9ff4d7f22) as patched by llama-box.<br/>

| Quantization | OpenAI CLIP ViT-L/14 Quantization | OpenCLIP ViT-G/14 Quantization | VAE Quantization |
| --- | --- | --- | --- |
| FP16 | FP16 | FP16 | FP16 |
| Q8_0 | FP16 | FP16 | FP16 |
| Q4_1 | FP16 | FP16 | FP16 |
| Q4_0 | FP16 | FP16 | FP16 |

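As a rough guide to what these quantization levels mean for file size, the per-weight storage of each format can be compared against FP16. The bits-per-weight figures below come from the standard ggml block layouts (e.g. Q4_0 stores 32 weights in an 18-byte block) and are an assumption not stated in this card; the actual files shrink less than the ratio suggests because the text encoders and VAE stay FP16, per the table above.

```python
# Approximate bits per weight for common GGML block formats
# (from the ggml block layouts: Q4_0 = 18 bytes / 32 weights,
#  Q4_1 = 20 bytes / 32, Q8_0 = 34 bytes / 32).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_1": 5.0, "Q4_0": 4.5}

def estimated_ratio(quant: str, baseline: str = "FP16") -> float:
    """Estimated size of a quantized tensor relative to the baseline format."""
    return BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT[baseline]

print(f"Q4_0 ≈ {estimated_ratio('Q4_0'):.1%} of FP16 size")  # Q4_0 ≈ 28.1% of FP16 size
```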
# SD-XL Inpainting 0.1 Model Card

![inpaint-example](inpaint-examples-min.png)

SD-XL Inpainting 0.1 is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input, with the extra capability of inpainting pictures by using a mask.

SD-XL Inpainting 0.1 was initialized with the `stable-diffusion-xl-base-1.0` weights. The model was trained for 40k steps at resolution 1024x1024, with 5% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.

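The extra input channels described above amount to a channel-wise concatenation at the latent resolution. The sketch below is shape-only and uses NumPy in place of torch; the variable names and the channel ordering are illustrative, not the actual diffusers internals.

```python
import numpy as np

# Shape-only sketch of the inpainting UNet's 9-channel input:
# 4 latent channels for the noisy image, 4 for the VAE-encoded
# masked image, and 1 for the downsampled binary mask.
batch, h, w = 1, 128, 128  # latent resolution for a 1024x1024 image (1024 / 8)

noisy_latents = np.zeros((batch, 4, h, w), dtype=np.float32)
masked_image_latents = np.zeros((batch, 4, h, w), dtype=np.float32)
mask = np.ones((batch, 1, h, w), dtype=np.float32)  # 1 = region to repaint

# Ordering here is illustrative; the real pipeline defines the exact layout.
unet_input = np.concatenate([noisy_latents, mask, masked_image_latents], axis=1)
print(unet_input.shape)  # (1, 9, 128, 128)
```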

## How to use

```py
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well for us
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]
```

**How it works:**

`image` | `mask_image`
:-------------------------:|:-------------------------:
<img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="300"/> | <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="300"/>

`prompt` | `Output`
:-------------------------:|:-------------------------:
<span style="position: relative;bottom: 150px;">a tiger sitting on a park bench</span> | <img src="https://huggingface.co/datasets/valhalla/images/resolve/main/tiger.png" alt="drawing" width="300"/>

## Model Description

- **Developed by:** The Diffusers team
- **Model type:** Diffusion-based text-to-image generative model
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).

## Uses

### Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore out of scope for this model's abilities.

## Limitations and Bias

### Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult compositional tasks, such as rendering an image corresponding to "A red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
- When the strength parameter is set to 1 (i.e. starting inpainting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version.

### Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
inpaint-examples-min.png ADDED
stable-diffusion-xl-inpainting-1.0-FP16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49381c731b1cfef58340a64d647a7be45938f805b4493f8e354efa6358bbf676
+ size 6937989920
stable-diffusion-xl-inpainting-1.0-Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:608375abc7d6abce2aca0512719e701c3ce8ddee306dcb1a67b4cfcc1911e386
+ size 3772771520
stable-diffusion-xl-inpainting-1.0-Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1eba9873f3f353be1621a12e7b564f27e062fd280eb8ec93970ec00868701bac
+ size 3910386528
stable-diffusion-xl-inpainting-1.0-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a08b13f91933e06625a7026c787408100e2116326aeda961c9a428e61c2d9c0
+ size 4873718560
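Each `.gguf` entry above is a Git LFS pointer that records the real file's SHA-256 digest (`oid sha256:`) and byte size. After downloading a model file, you can check it against the pointer with a streaming hash so multi-GB files never have to fit in memory. This is a generic sketch; `demo.bin` is a tiny stand-in for the actual model file.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Stand-in file; for a real download, compare the digest against the
# oid recorded in the corresponding LFS pointer above.
with open("demo.bin", "wb") as f:
    f.write(b"hello")
print(sha256_of_file("demo.bin"))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```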