luow-amd committed on
Commit
1056c77
1 Parent(s): 9b9a87f

add deployment description (#5)

- add deployment description (73104279b05b135fa5945a0b65bf688b1eb806f6)

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -2,6 +2,7 @@
2
  license: llama3.1
3
  ---
4
  # Meta-Llama-3.1-8B-Instruct-FP8-KV
 
5
  This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset.
6
  - ## Quantization Stragegy
7
  - ***Quantized Layers***:All linear layers excluding "lm_head"
@@ -32,9 +33,12 @@ python3 quantize_quark.py \
32
  --model_export quark_safetensors \
33
  --multi_gpu
34
  ```
 
 
 
35
  ## Evaluation
36
  Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
37
-
38
 
39
  #### Evaluation scores
40
  <table>
@@ -57,6 +61,8 @@ Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss
57
 
58
  </table>
59
 
 
 
60
  #### License
61
  Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
62
 
 
2
  license: llama3.1
3
  ---
4
  # Meta-Llama-3.1-8B-Instruct-FP8-KV
5
+ - ## Introduction
6
  This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset.
7
  - ## Quantization Stragegy
8
  - ***Quantized Layers***:All linear layers excluding "lm_head"
 
33
  --model_export quark_safetensors \
34
  --multi_gpu
35
  ```
36
+ ## Deployment
37
+ Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vllm-compatible).
38
+
39
  ## Evaluation
40
  Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
41
+ The quantization evaluation results are conducted in pseudo-quantization mode, which may slightly differ from the actual quantized inference accuracy. These results are provided for reference only.
42
 
43
  #### Evaluation scores
44
  <table>
 
61
 
62
  </table>
63
 
64
+
65
+
66
  #### License
67
  Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
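
The Evaluation section touched by this commit names perplexity (PPL) as the accuracy metric and points to quantize_quark.py for the specific algorithm, which is not shown in the diff. As a rough illustration only, the textbook definition (the exponential of the mean negative log-likelihood per token) can be sketched as below. The function name `perplexity` and its input format are assumptions made for this sketch and are not taken from Quark.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Textbook perplexity from per-token natural-log probabilities.

    Hypothetical helper for illustration; quantize_quark.py may compute
    PPL differently (for example with windowed evaluation).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of the mean negative log-likelihood.
    return math.exp(mean_nll)
```

Under this definition, a model that assigns probability 0.25 to every token has a perplexity of 4, and a lower value after quantization would indicate less accuracy loss.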
68
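
The Deployment paragraph added in this commit says the exported FP8 model can be served with the vLLM backend but gives no usage. A minimal sketch of what that might look like follows. It assumes that vLLM is installed, that the checkpoint is published under the Hub ID `amd/Meta-Llama-3.1-8B-Instruct-FP8-KV` (a guess from the model name in the README), and that an FP8 KV cache is wanted. The helper names `build_engine_kwargs` and `generate` are hypothetical, and the README does not confirm these settings.

```python
from typing import Any, Dict

# Assumption: Hub repository ID guessed from the model name in the README.
MODEL_ID = "amd/Meta-Llama-3.1-8B-Instruct-FP8-KV"


def build_engine_kwargs(model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Collect keyword arguments for vllm.LLM (hypothetical helper)."""
    return {
        "model": model_id,
        # Assumption: keep the KV cache in FP8, as the "-KV" suffix suggests.
        "kv_cache_dtype": "fp8",
    }


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Run one greedy completion through vLLM and return the text."""
    # Imported lazily so the configuration can be inspected without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs())
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("Briefly explain FP8 quantization.")` would load the model and return a single completion. The quantization method is left unset here on the assumption that vLLM picks it up from the checkpoint's own configuration; if it does not, it would need to be passed explicitly.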