amd/Meta-Llama-3.1-8B-Instruct-FP8-KV

luow-amd committed on Sep 9

Commit 1056c77 (1 parent: 9b9a87f)

add deployment description (#5)

Files changed (1): README.md

@@ -2,6 +2,7 @@
 license: llama3.1
 ---
 # Meta-Llama-3.1-8B-Instruct-FP8-KV
   This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
 - ## Quantization Strategy
   - ***Quantized Layers***: All linear layers excluding "lm_head"
@@ -32,9 +33,12 @@ python3 quantize_quark.py \
         --model_export quark_safetensors \
         --multi_gpu
 ```
 ## Evaluation
 Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
 #### Evaluation scores
 <table>
@@ -57,6 +61,8 @@ Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss
 </table>
 #### License
 Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.

 license: llama3.1
 ---
 # Meta-Llama-3.1-8B-Instruct-FP8-KV
+- ## Introduction
   This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
 - ## Quantization Strategy
   - ***Quantized Layers***: All linear layers excluding "lm_head"
         --model_export quark_safetensors \
         --multi_gpu
 ```
+## Deployment
+Quark has its own export format, which allows FP8-quantized models to be deployed efficiently with the vLLM backend (vLLM-compatible).
 ## Evaluation
 Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
+The quantization evaluation is run in pseudo-quantization mode, so the results may differ slightly from the actual quantized inference accuracy. These results are provided for reference only.
 #### Evaluation scores
 <table>
 </table>
 #### License
 Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
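
Reviewer note on the Evaluation text: the README points to quantize_quark.py for the exact PPL algorithm, which is not shown in this diff. As a rough illustration only, the sketch below uses the standard textbook definition of perplexity (the exponential of the negative mean per-token log-probability); it is an assumption that Quark's script follows the same definition. The function names `perplexity` and `relative_ppl_change` and all numbers are hypothetical and are not taken from the model card.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Textbook perplexity: exp(-mean(natural-log probability per token)).

    Assumption: this is the generic definition, not necessarily the exact
    procedure implemented in quantize_quark.py.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


def relative_ppl_change(ppl_baseline: float, ppl_quantized: float) -> float:
    """Fractional PPL change after quantization (positive means worse)."""
    if ppl_baseline <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (ppl_quantized - ppl_baseline) / ppl_baseline


if __name__ == "__main__":
    # Made-up log-probabilities, for illustration only.
    baseline = perplexity([math.log(0.25)] * 4)
    quantized = perplexity([math.log(0.20)] * 4)
    print(f"baseline PPL:  {baseline:.3f}")
    print(f"quantized PPL: {quantized:.3f}")
    print(f"relative change: {relative_ppl_change(baseline, quantized):+.1%}")
```

Comparing the two PPL values as a relative change is one simple way to read "accuracy loss before and after quantization"; since the reported numbers come from pseudo-quantization mode, the same comparison on real FP8 inference may differ slightly.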