Lmxyy committed on
Commit e5cad8a · verified · 1 Parent(s): 6ba653b

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -38,7 +38,7 @@ library_name: diffusers
</div>

![teaser](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-depth-dev/resolve/main/demo.jpg)
- SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On 12B FLUX.1-dev, it achieves 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it offers 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it demonstrates significantly superior visual quality over other W4A4 or even W4A8 baselines. "E2E" means the end-to-end latency including the text encoder and VAE decoder.
+ `svdq-int4-flux.1-depth-dev` is an INT4-quantized version of [`FLUX.1-Depth-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev). It offers approximately 4× memory savings while also running 2–3× faster than the original BF16 model.

## Method
#### Quantization Method -- SVDQuant
@@ -62,7 +62,14 @@ Overview of SVDQuant. Stage1: Originally, both the activation ***X*** and weight

### Diffusers

- Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Then you can run the model with
+ Please follow the instructions in [mit-han-lab/nunchaku](https://github.com/mit-han-lab/nunchaku) to set up the environment. Also, install some ControlNet dependencies:
+
+ ```shell
+ pip install git+https://github.com/asomoza/image_gen_aux.git
+ pip install controlnet_aux mediapipe
+ ```
+
+ Then you can run the model with

```python
import torch
@@ -90,7 +97,6 @@ image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")
-
```

### Comfy UI
@@ -100,7 +106,7 @@ Work in progress. Stay tuned!
## Limitations

  - The model is only runnable on NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this [issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.
- - You may observe some slight differences from the BF16 models in details.
+ - You may observe some slight differences from the BF16 models in detail.

### Citation
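The hunk headers above reference the README's "Quantization Method -- SVDQuant" section, which this commit does not touch. As background only, a rough sketch of the idea as described in the SVDQuant paper (not part of this diff): after smoothing migrates activation outliers into the weights, the weight is split by SVD into a 16-bit low-rank branch plus a 4-bit residual,

$$
\hat{W} = L_1 L_2 + R, \qquad \hat{X}\hat{W} \approx \hat{X} L_1 L_2 + Q(\hat{X})\,Q(R),
$$

where the low-rank factors $L_1 L_2$ keep the dominant singular components that absorb the outliers, and both the residual $R$ and the activations are quantized to 4 bits.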
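The hunks above show only the first and last lines of the README's Python example; the middle falls outside the diff context. A minimal sketch of what a complete run might look like, assuming the `NunchakuFluxTransformer2dModel` import path from the nunchaku repository and diffusers' `FluxControlPipeline`; the prompt, input image URL, and depth checkpoint are illustrative placeholders, while the `pipe(...)` arguments and output filename come verbatim from the diff:

```python
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor  # from the image_gen_aux install above
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4 SVDQuant transformer, then hand it to the standard BF16 pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-depth-dev")
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder prompt and conditioning image; any RGB image works as the depth source.
prompt = "A robot made of exotic candies and chocolates of different kinds."
control_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png"
)

# Turn the input image into a depth map to condition the generation
# (the depth-anything checkpoint here is an assumption, not from the diff).
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

# These arguments and the output filename appear verbatim in the diff above.
image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
).images[0]
image.save("flux.1-depth-dev.png")
```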