fix typo #6
opened by luow-amd

README.md CHANGED
````diff
@@ -34,7 +34,7 @@ python3 quantize_quark.py \
     --multi_gpu
 ```
 ## Deployment
-Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(
+Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).
 
 ## Evaluation
 Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
````
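For context on the Deployment section the PR touches, a minimal sketch of what serving a Quark-exported FP8 checkpoint through vLLM might look like. The export directory path and the `quantization` setting are assumptions, not taken from this repository; check the Quark and vLLM documentation for the exact values your versions expect.

```python
# A minimal sketch of serving a Quark-exported FP8 checkpoint with vLLM.
# The model path and quantization flag below are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./quark_fp8_export",  # hypothetical path to the Quark export directory
    quantization="fp8",          # assumption: FP8 weights recognized by vLLM
)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```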
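The Evaluation section notes that PPL is computed in quantize_quark.py; that file is authoritative. As a rough illustration only, a minimal sliding-window perplexity sketch assuming a Hugging Face causal LM, where the model id, dataset, and window/stride are placeholder assumptions:

```python
# A minimal sliding-window perplexity (PPL) sketch, assuming a Hugging Face
# causal LM. Model id, dataset, and window/stride are placeholder assumptions;
# the authoritative algorithm lives in quantize_quark.py.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder: substitute the model under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

window, stride = 2048, 2048  # non-overlapping windows for simplicity
nlls, n_tokens = [], 0
for start in range(0, ids.size(1), stride):
    chunk = ids[:, start : start + window]
    if chunk.size(1) < 2:  # need at least one next-token prediction
        break
    with torch.no_grad():
        out = model(chunk, labels=chunk)  # HF shifts labels internally; .loss is mean NLL
    n = chunk.size(1) - 1                 # tokens actually predicted in this chunk
    nlls.append(out.loss * n)
    n_tokens += n

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)  # PPL = exp(total NLL / #tokens)
print(f"perplexity: {ppl.item():.4f}")
```

Comparing this value for the model before and after quantization gives the accuracy-loss measure the README describes.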