Quantization settings

  • vae.: torch.bfloat16. No quantization.
  • text_encoder.layers.:
    • Int8 with Optimum Quanto
    • Target layers: ["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]
  • diffusion_model.:
    • Int8 with Optimum Quanto
    • Target layers: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]
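
A minimal sketch of how such a configuration can be applied with Optimum Quanto on top of diffusers. The `include` patterns and pipeline attribute names are assumptions reconstructed from the layer lists above, not the author's exact script:

```python
import torch
from diffusers import CogView4Pipeline
from optimum.quanto import quantize, freeze, qint8

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# Text encoder: int8 weights on the attention/MLP projections listed above.
quantize(
    pipe.text_encoder,
    weights=qint8,
    include=["*q_proj", "*k_proj", "*v_proj", "*o_proj",
             "*mlp.down_proj", "*mlp.gate_up_proj"],
)
freeze(pipe.text_encoder)

# Denoiser: int8 weights on its attention/feed-forward projections.
quantize(
    pipe.transformer,
    weights=qint8,
    include=["*to_q", "*to_k", "*to_v", "*to_out.0",
             "*ff.net.0.proj", "*ff.net.2"],
)
freeze(pipe.transformer)

# The VAE is left untouched in torch.bfloat16 (no quantization).
pipe.to("cuda")
```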

VRAM consumption

  • Text encoder (text_encoder.): about 11 GB
  • Denoiser (diffusion_model.): about 10 GB
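
These figures are for the quantized components resident on the GPU without offloading. If the total still does not fit, diffusers can park idle components in system RAM; a sketch using the standard offloading helper (actual savings depend on hardware):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)
# Keep only the component currently running (text encoder, denoiser, or VAE)
# on the GPU; the others wait in system RAM between forward passes.
pipe.enable_model_cpu_offload()

# After a generation, peak VRAM usage can be inspected with:
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```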

Samples

| torch.bfloat16 | Quanto Int8 |
| --- | --- |
| VRAM 40 GB (without offloading) | VRAM 28 GB (without offloading) |
Generation parameters
  • prompt: """ A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4" """
  • negative_prompt: "blurry, low quality, horror"
  • height: 1152
  • width: 1152
  • cfg_scale: 3.5
  • num_inference_steps: 20
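
Assuming the quantized pipeline `pipe` from the sketch above, a minimal diffusers call reproducing these parameters (the `cfg_scale` listed here is passed as diffusers' `guidance_scale`):

```python
image = pipe(
    prompt='A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4"',
    negative_prompt="blurry, low quality, horror",
    height=1152,
    width=1152,
    guidance_scale=3.5,        # cfg_scale above
    num_inference_steps=20,
).images[0]
image.save("cogview4_sample.png")
```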

Model tree for p1atdev/CogView4-6B-quanto_int8

  • Base model: THUDM/glm-4-9b
  • Finetuned: THUDM/CogView4-6B
  • Quantized: this model