---
license: apache-2.0
base_model: hpcai-tech/grok-1
---

# Grok-1-FP8-KV
- ## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
- ## Quantization Strategy
- ***Quantized Layers***: All linear layers excluding "lm_head", "*gate"
- ***Weight***: FP8 symmetric per-tensor
- ***Activation***: FP8 symmetric per-tensor
- ***KV Cache***: FP8 symmetric per-tensor
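The per-tensor symmetric scheme listed above can be sketched as follows. This is an illustrative simulation, not Quark's implementation: it assumes the OCP FP8 E4M3 format (maximum representable magnitude 448) and omits rounding each scaled value to the FP8 grid, which a real kernel would perform.

```python
# Sketch of symmetric per-tensor FP8 fake-quantization (assumption: E4M3 format).
FP8_E4M3_MAX = 448.0  # largest magnitude representable in OCP FP8 E4M3

def fp8_per_tensor_scale(tensor):
    """One shared scale for the whole tensor: amax / fp8_max (symmetric)."""
    amax = max(abs(v) for v in tensor)
    return amax / FP8_E4M3_MAX

def fake_quantize(tensor):
    """Scale into the FP8 range, clip, then dequantize with the same scale.
    A real kernel would also round each value to the nearest FP8 number."""
    scale = fp8_per_tensor_scale(tensor)
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in tensor]
    return [v * scale for v in q], scale

weights = [0.02, -1.5, 3.0, -0.7]
dequantized, scale = fake_quantize(weights)
```

Because the scheme is symmetric, no zero-point is stored; only one scale per tensor (and per KV-cache tensor) accompanies the FP8 values.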
- ## Quick Start
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. Run the quantization script in the example folder using the following command line:
```sh
export MODEL_DIR="[local model checkpoint folder]"  # or: export MODEL_DIR=hpcai-tech/grok-1

python3 quantize_quark.py \
  --model_dir $MODEL_DIR \
  --output_dir Grok-1-FP8-KV \
  --quant_scheme w_fp8_a_fp8 \
  --kv_cache_dtype fp8 \
  --seq_len 2048 \
  --num_calib_data 128 \
  --model_export quark_safetensors \
  --multi_gpu \
  --no_weight_matrix_merge
```
## Deployment
Quark uses its own export format, which allows FP8-quantized models to be deployed efficiently through the vLLM backend (the exported checkpoint is vLLM-compatible).
## Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
The evaluation is conducted in pseudo-quantization mode, so the results may differ slightly from true quantized-inference accuracy; they are provided for reference only.
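As a rough illustration of the metric (not the exact procedure in quantize_quark.py): perplexity is the exponential of the mean per-token negative log-likelihood, so lower values indicate a better fit to the evaluation text.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1 (NLL 0) has PPL 1.
print(perplexity([0.0, 0.0, 0.0]))  # 1.0
```

The small gap in the table below (3.2410 vs. 3.2739 on wikitext2) therefore corresponds to a modest accuracy loss from FP8 quantization.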
#### Evaluation scores
<table>
  <tr>
    <td><strong>Benchmark</strong></td>
    <td><strong>hpcai-tech/Grok-1 (PyTorch Version)</strong></td>
    <td><strong>Grok-1-FP8-KV (this model)</strong></td>
  </tr>
  <tr>
    <td>Perplexity-wikitext2</td>
    <td>3.2410</td>
    <td>3.2739</td>
  </tr>
</table>

#### License
Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.