Quantization settings

  • vae.: torch.bfloat16. No quantization.
  • text_encoder.layers.:
    • Int8 with Optimum Quanto
    • Target layers: ["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]
  • diffusion_model.:
    • Int8 with Optimum Quanto
    • Target layers: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]
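
A minimal sketch of how such a configuration can be applied with Optimum Quanto on top of diffusers. The `include` patterns and pipeline attribute names are assumptions reconstructed from the layer lists above, not the author's exact script:

```python
import torch
from diffusers import CogView4Pipeline
from optimum.quanto import quantize, freeze, qint8

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# Text encoder: int8 weights on the attention/MLP projections listed above.
quantize(
    pipe.text_encoder,
    weights=qint8,
    include=["*q_proj", "*k_proj", "*v_proj", "*o_proj",
             "*mlp.down_proj", "*mlp.gate_up_proj"],
)
freeze(pipe.text_encoder)

# Denoiser: int8 weights on its attention/feed-forward projections.
quantize(
    pipe.transformer,
    weights=qint8,
    include=["*to_q", "*to_k", "*to_v", "*to_out.0",
             "*ff.net.0.proj", "*ff.net.2"],
)
freeze(pipe.transformer)

# The VAE is left untouched in torch.bfloat16 (no quantization).
pipe.to("cuda")
```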

VRAM consumption

  • Text encoder (text_encoder.): about 11 GB
  • Denoiser (diffusion_model.): about 10 GB
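
These figures are for the quantized components resident on the GPU without offloading. If the total still does not fit, diffusers can park idle components in system RAM; a sketch using the standard offloading helper (actual savings depend on hardware):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)
# Keep only the component currently running (text encoder, denoiser, or VAE)
# on the GPU; the others wait in system RAM between forward passes.
pipe.enable_model_cpu_offload()

# After a generation, peak VRAM usage can be inspected with:
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```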

Samples

| torch.bfloat16 | Quanto Int8 |
| --- | --- |
| VRAM 40 GB (without offloading) | VRAM 28 GB (without offloading) |
Generation parameters
  • prompt: """ A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4" """
  • negative_prompt: "blurry, low quality, horror"
  • height: 1152
  • width: 1152
  • cfg_scale: 3.5
  • num_inference_steps: 20
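
Assuming the quantized pipeline `pipe` from the sketch above, a minimal diffusers call reproducing these parameters (the `cfg_scale` listed here is passed as diffusers' `guidance_scale`):

```python
image = pipe(
    prompt='A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4"',
    negative_prompt="blurry, low quality, horror",
    height=1152,
    width=1152,
    guidance_scale=3.5,        # cfg_scale above
    num_inference_steps=20,
).images[0]
image.save("cogview4_sample.png")
```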

Model tree for p1atdev/CogView4-6B-quanto_int8

  • Base model: THUDM/glm-4-9b
  • Finetuned: THUDM/CogView4-6B
  • Quantized: this model