Lmxyy committed on
Commit e5cad8a · verified · 1 Parent(s): 6ba653b

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -38,7 +38,7 @@ library_name: diffusers
</div>

![teaser](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-depth-dev/resolve/main/demo.jpg)
- SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On 12B FLUX.1-dev, it achieves 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it offers 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it demonstrates significantly superior visual quality over other W4A4 or even W4A8 baselines. "E2E" means the end-to-end latency including the text encoder and VAE decoder.
+ `svdq-int4-flux.1-depth-dev` is an INT4-quantized version of [`FLUX.1-Depth-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev). It offers approximately 4× memory savings while also running 2–3× faster than the original BF16 model.

## Method
#### Quantization Method -- SVDQuant
@@ -62,7 +62,14 @@ Overview of SVDQuant. Stage1: Originally, both the activation ***X*** and weight

### Diffusers

- Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Then you can run the model with
+ Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Also, install some ControlNet dependencies:
+
+ ```shell
+ pip install git+https://github.com/asomoza/image_gen_aux.git
+ pip install controlnet_aux mediapipe
+ ```
+
+ Then you can run the model with

```python
import torch
@@ -90,7 +97,6 @@ image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")
-
```

### Comfy UI
@@ -100,7 +106,7 @@ Work in progress. Stay tuned!
## Limitations

  - The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.
- - You may observe some slight differences from the BF16 models in details.
+ - You may observe some slight differences from the BF16 models in detail.

### Citation
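The hunk headers above reference the README's "Quantization Method -- SVDQuant" section, which this commit does not touch. As background only, a rough sketch of the idea as described in the SVDQuant paper (not part of this diff): after smoothing migrates activation outliers into the weights, the weight is split by SVD into a 16-bit low-rank branch plus a 4-bit residual,

$$
\hat{W} = L_1 L_2 + R, \qquad \hat{X}\hat{W} \approx \hat{X} L_1 L_2 + Q(\hat{X})\,Q(R),
$$

where the low-rank factors $L_1 L_2$ keep the dominant singular components that absorb the outliers, and both the residual $R$ and the activations are quantized to 4 bits.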
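The hunks above show only the first and last lines of the README's Python example; the middle falls outside the diff context. A minimal sketch of what a complete run might look like, assuming the `NunchakuFluxTransformer2dModel` import path from the nunchaku repository and diffusers' `FluxControlPipeline`; the prompt, input image URL, and depth checkpoint are illustrative placeholders, while the `pipe(...)` arguments and output filename come verbatim from the diff:

```python
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor  # from the image_gen_aux install above
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4 SVDQuant transformer, then hand it to the standard BF16 pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-depth-dev")
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder prompt and conditioning image; any RGB image works as the depth source.
prompt = "A robot made of exotic candies and chocolates of different kinds."
control_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png"
)

# Turn the input image into a depth map to condition the generation
# (the depth-anything checkpoint here is an assumption, not from the diff).
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

# These arguments and the output filename appear verbatim in the diff above.
image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")
```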