amd/Meta-Llama-3.1-8B-Instruct-FP8-KV


luow-amd committed on Sep 9

Commit 6a6e7f7 (parent: 1056c77)

fix typo (#6)

- fix typo (aa94c394879ec9c7145e482353b90372fc1f63e1)

Files changed (1)

README.md +1 -1


@@ -34,7 +34,7 @@ python3 quantize_quark.py \
         --multi_gpu
 ```
 ## Deployment
-Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vllm-compatible).
+Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).
 ## Evaluation
 Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
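The unchanged README context refers to perplexity (PPL) as Quark's accuracy metric. The exact algorithm lives in quantize_quark.py and is not reproduced on this page. As a minimal sketch, assuming you already have per-token natural-log probabilities from the model, perplexity is the exponential of the mean negative log-likelihood per token. The function name `perplexity` and the sample values below are hypothetical and used only for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Hypothetical helper: exp(-mean(log p)) over per-token natural-log probabilities."""
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood (cross-entropy in nats per token).
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities for the same text, scored by a
# baseline model and by its quantized counterpart (illustrative values only).
baseline = [math.log(p) for p in (0.5, 0.25, 0.5)]
quantized = [math.log(p) for p in (0.4, 0.25, 0.4)]

ppl_base = perplexity(baseline)
ppl_quant = perplexity(quantized)
# A higher PPL after quantization indicates some accuracy loss.
print(f"baseline PPL={ppl_base:.4f}, quantized PPL={ppl_quant:.4f}")
```

Comparing the two values on the same evaluation text is what "accuracy loss before and after quantization" amounts to under this metric: lower per-token probabilities raise the perplexity.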