luow-amd haoyang-amd committed on
Commit 043616f · verified · 1 parent: c98d746

Update README.md (#11)


- Update README.md (530aa354f48dd13bd6e5795e1cd98ee7516cb9b9)


Co-authored-by: haoyanli <[email protected]>

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -24,16 +24,18 @@ python3 quantize_quark.py \
  --output_dir c4ai-command-r-plus-FP8-KV \
  --quant_scheme w_fp8_a_fp8 \
  --kv_cache_dtype fp8 \
- --num_calib_data 128 \
- --model_export quark_safetensors
+ --num_calib_data 128 \
+ --model_export quark_safetensors \
+ --no_weight_matrix_merge
  # If model size is too large for single GPU, please use multi GPU instead.
  python3 quantize_quark.py \
  --model_dir $MODEL_DIR \
  --output_dir c4ai-command-r-plus-FP8-KV \
  --quant_scheme w_fp8_a_fp8 \
  --kv_cache_dtype fp8 \
- --num_calib_data 128 \
+ --num_calib_data 128 \
  --model_export quark_safetensors \
+ --no_weight_matrix_merge \
  --multi_gpu
  ```
  ## Deployment
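After this change, both README invocations pass `--no_weight_matrix_merge`, and they differ only by `--multi_gpu`. The following is a minimal sketch that assembles the post-commit argument lists in Python, using only the flags visible in the diff. The helper name `build_quark_cmd` is hypothetical, reading `MODEL_DIR` from the environment is an assumption, and the single-GPU command's `--model_dir` line falls outside the hunk, so it is assumed to match the multi-GPU command.

```python
from __future__ import annotations

import os
import shlex


def build_quark_cmd(model_dir: str, multi_gpu: bool = False) -> list[str]:
    """Hypothetical helper: rebuild the post-commit quantize_quark.py call."""
    cmd = [
        "python3", "quantize_quark.py",
        "--model_dir", model_dir,  # assumed identical for the single-GPU command
        "--output_dir", "c4ai-command-r-plus-FP8-KV",
        "--quant_scheme", "w_fp8_a_fp8",
        "--kv_cache_dtype", "fp8",
        "--num_calib_data", "128",
        "--model_export", "quark_safetensors",
        "--no_weight_matrix_merge",  # flag added to both commands by this commit
    ]
    if multi_gpu:
        # Per the README, only needed when the model is too large for one GPU.
        cmd.append("--multi_gpu")
    return cmd


if __name__ == "__main__":
    # Assumption: MODEL_DIR is exported in the environment, as in the README.
    model_dir = os.environ.get("MODEL_DIR", "/path/to/model")
    print(shlex.join(build_quark_cmd(model_dir)))
    print(shlex.join(build_quark_cmd(model_dir, multi_gpu=True)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the command without any shell quoting of `$MODEL_DIR`.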