Commit 05d0727 by JustinLin610 (1 parent: f077bf1)

Update README.md

README.md CHANGED
@@ -168,10 +168,10 @@ response, history = model.chat(tokenizer, "你好", history=None)
We illustrate the zero-shot performance of both BF16 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:

+| Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
+|--------------|:----:|:-----------:|:-----:|:---------:|
+| BF16         | 55.8 | 59.7        | 50.3  | 37.2      |
+| Int4         | 55.1 | 59.2        | 49.7  | 35.4      |
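
As a quick check on the claim that quantization causes no significant degradation, the minimal sketch below computes the absolute and relative drop from BF16 to Int4 for each column of the table above. The scores are copied from that table; the helper `int4_drop` is a hypothetical name used only for this illustration.

```python
# Zero-shot scores copied from the BF16 vs. Int4 table above.
BF16 = {"MMLU": 55.8, "CEval (val)": 59.7, "GSM8K": 50.3, "Humaneval": 37.2}
INT4 = {"MMLU": 55.1, "CEval (val)": 59.2, "GSM8K": 49.7, "Humaneval": 35.4}


def int4_drop(
    bf16: dict[str, float], int4: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: (absolute drop in points, relative drop in %) per benchmark."""
    drops = {}
    for name, base in bf16.items():
        absolute = base - int4[name]
        # Round to hide floating-point noise such as 0.6999999999999957.
        drops[name] = (round(absolute, 2), round(100 * absolute / base, 2))
    return drops


if __name__ == "__main__":
    for name, (absolute, relative) in int4_drop(BF16, INT4).items():
        print(f"{name:12s} -{absolute:.1f} pts ({relative:.2f}% relative)")
```

With these numbers the relative drop stays below about 1.3% on MMLU, CEval (val) and GSM8K, and is largest on Humaneval (about 4.8%).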

### 推理速度 (Inference Speed)

@@ -179,10 +179,10 @@ We illustrate the zero-shot performance of both BF16 and Int4 models on the benc
We measured the average inference speed of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization level, respectively.

+| Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
+|--------------|:-------------------:|:-------------------:|
+| BF16         |        30.53        |        28.51        |
+| Int4         |        45.60        |        33.83        |
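
The same arithmetic applied to the speed table gives the Int4 speedup over BF16 at each generation length. This is a sketch: the helper `speedup` is hypothetical, and since the table states no unit, reading the values as tokens per second is an assumption.

```python
# Average generation speed copied from the table above.
# Assumption: the values are tokens per second (the table gives no unit).
SPEED = {
    "BF16": {2048: 30.53, 8192: 28.51},
    "Int4": {2048: 45.60, 8192: 33.83},
}


def speedup(table, fast="Int4", base="BF16"):
    """Hypothetical helper: ratio of `fast` to `base` speed per generation length."""
    return {n: round(table[fast][n] / table[base][n], 2) for n in table[base]}


if __name__ == "__main__":
    for n, ratio in speedup(SPEED).items():
        print(f"{n} tokens: Int4 runs at {ratio:.2f}x the BF16 speed")
```

For these values the ratio works out to roughly 1.49 at 2048 tokens and 1.19 at 8192 tokens, so the Int4 advantage narrows as the generated sequence grows.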
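
Specifically, we record the performance of generating 8192 tokens conditioned on a context of length 1. The evaluation runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.4. The inference speed is the average speed over the 8192 generated tokens.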
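
One way to read the measurement described above: start from a length-1 context, time the generation of a fixed number of tokens, divide tokens by wall-clock seconds, and average over a few runs. The sketch below assumes that reading; `generate` is a hypothetical stand-in for the model call, and `fake_generate` is a dummy so the code runs without a GPU. When timing a real model on a GPU, the clock should be read only after queued device work has finished (in PyTorch, for example, via `torch.cuda.synchronize()`).

```python
import time
from statistics import mean
from typing import Callable


def average_tokens_per_second(
    generate: Callable[[int], int], num_tokens: int, repeats: int = 3
) -> float:
    """Time `generate(num_tokens)` `repeats` times and return the mean tokens/s.

    `generate` is a hypothetical stand-in for a model call that starts from a
    length-1 context and returns the number of tokens it actually produced.
    """
    speeds = []
    for _ in range(repeats):
        start = time.perf_counter()
        produced = generate(num_tokens)
        # Guard against a zero reading from a very fast stub.
        elapsed = max(time.perf_counter() - start, 1e-9)
        speeds.append(produced / elapsed)
    return mean(speeds)


def fake_generate(num_tokens: int) -> int:
    """Dummy generator: sleeps at least 1 ms per token so the example runs anywhere."""
    for _ in range(num_tokens):
        time.sleep(0.001)
    return num_tokens


if __name__ == "__main__":
    print(f"{average_tokens_per_second(fake_generate, 100):.1f} tokens/s")
```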