JustinLin610 committed on
Commit 601593b · 1 Parent(s): 9d3e5eb

Update README.md

Files changed (1)
  1. README.md +45 -45
README.md CHANGED
@@ -159,10 +159,10 @@ response, history = model.chat(tokenizer, "你好", history=None)
 
  We illustrate the zero-shot performance of both BF16 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:
 
- | Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
- | ------------- | :--------: | :----------: | :----: | :--------: |
- | BF16 | 64.6 | 69.8 | 61.0 | 43.9 |
- | Int4 | 63.3 | 69.0 | 59.8 | 45.7 |
 
  ### 推理速度 (Inference Speed)
 
@@ -170,10 +170,10 @@ We illustrate the zero-shot performance of both BF16 and Int4 models on the benc
 
  We measured the average inference speed of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization level, respectively.
 
- | Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
- | ------------- | :------------------:| :------------------:|
- | BF16 | 30.70 | 21.73 |
- | Int4 | 37.11 | 26.11 |
 
  具体而言,我们记录在长度为1的上下文的条件下生成8192个token的性能。评测运行于单张A100-SXM4-80G GPU,使用PyTorch 2.0.1和CUDA 11.4。推理速度是生成8192个token的速度均值。
 
@@ -185,10 +185,10 @@ In detail, the setting of profiling is generating 8192 new tokens with 1 context
 
  We also profile the peak GPU memory usage for encoding 2048 tokens as context (and generating single token) and generating 8192 tokens (with single token as context) under BF16 or Int4 quantization level, respectively. The results are shown below.
 
- | Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
- | ------------------ | :---------------------------------: | :-----------------------------------: |
- | BF16 | 30.15GB | 38.94GB |
- | Int4 | 13.00GB | 21.79GB |
 
  上述性能测算使用[此脚本](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py)完成。
 
@@ -202,12 +202,12 @@ The above speed and memory profiling are conducted using [this script](https://q
  The details of the model architecture of Qwen-14B-Chat are listed as follows
 
  | Hyperparameter | Value |
- | :------------- | :----: |
- | n_layers | 40 |
- | n_heads | 40 |
- | d_model | 5120 |
  | vocab size | 151851 |
- | sequence length | 2048 |
 
  在位置编码、FFN激活函数和normalization的实现方式上,我们也采用了目前最流行的做法,
  即RoPE相对位置编码、SwiGLU激活函数、RMSNorm(可选安装flash-attention加速)。
@@ -242,7 +242,7 @@ Note: Due to rounding errors caused by hardware and framework, differences in re
  We demonstrate the 0-shot & 5-shot accuracy of Qwen-14B-Chat on C-Eval validation set
 
  | Model | Avg. Acc. |
- |:--------------------------------:| :-------: |
  | LLaMA2-7B-Chat | 31.9 |
  | LLaMA2-13B-Chat | 36.2 |
  | LLaMA2-70B-Chat | 44.3 |
@@ -284,7 +284,7 @@ The 0-shot & 5-shot accuracy of Qwen-14B-Chat on MMLU is provided below.
  The performance of Qwen-14B-Chat still on the top between other human-aligned models with comparable size.
 
  | Model | Avg. Acc. |
- |:--------------------------------:| :-------: |
  | ChatGLM2-6B-Chat | 46.0 |
  | LLaMA2-7B-Chat | 46.2 |
  | InternLM-7B-Chat | 51.1 |
@@ -304,18 +304,18 @@ Qwen-14B-Chat在[HumanEval](https://github.com/openai/human-eval)的zero-shot Pa
 
  The zero-shot Pass@1 of Qwen-14B-Chat on [HumanEval](https://github.com/openai/human-eval) is demonstrated below
 
- | Model | Pass@1 |
- |:-----------------------:| :-------: |
- | ChatGLM2-6B-Chat | 11.0 |
- | LLaMA2-7B-Chat | 12.2 |
- | InternLM-7B-Chat | 14.6 |
- | Baichuan2-7B-Chat | 13.4 |
- | LLaMA2-13B-Chat | 18.9 |
- | Baichuan2-13B-Chat | 17.7 |
- | LLaMA2-70B-Chat | 32.3 |
- | Qwen-7B-Chat (original) | 24.4 |
- | **Qwen-7B-Chat** | 37.2 |
- | **Qwen-14B-Chat** | **43.9** |
 
  ### 数学评测(Mathematics Evaluation)
 
@@ -323,20 +323,20 @@ The zero-shot Pass@1 of Qwen-14B-Chat on [HumanEval](https://github.com/openai/h
 
  The accuracy of Qwen-14B-Chat on GSM8K is shown below
 
- | Model | Acc. |
- |:--------------------------------:| :-------: |
- | LLaMA2-7B-Chat | 26.3 |
- | ChatGLM2-6B-Chat | 28.8 |
- | Baichuan2-7B-Chat | 32.8 |
- | InternLM-7B-Chat | 33.0 |
- | LLaMA2-13B-Chat | 37.1 |
- | Baichuan2-13B-Chat | 55.3 |
- | LLaMA2-70B-Chat | 59.3 |
- | Qwen-7B-Chat (original) (0-shot) | 41.1 |
- | **Qwen-7B-Chat (0-shot)** | 50.3 |
- | **Qwen-7B-Chat (8-shot)** | 54.1 |
- | **Qwen-14B-Chat (0-shot)** | **60.1** |
- | **Qwen-14B-Chat (8-shot)** | 59.3 |
 
  ### 长序列评测(Long-Context Understanding)
 
 
@@ -159,10 +159,10 @@ response, history = model.chat(tokenizer, "你好", history=None)
 
  We illustrate the zero-shot performance of both BF16 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:
 
+ | Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
+ |--------------|:----:|:-----------:|:-----:|:---------:|
+ | BF16 | 64.6 | 69.8 | 61.0 | 43.9 |
+ | Int4 | 63.3 | 69.0 | 59.8 | 45.7 |
 
  ### 推理速度 (Inference Speed)
 
 
@@ -170,10 +170,10 @@ We illustrate the zero-shot performance of both BF16 and Int4 models on the benc
 
  We measured the average inference speed of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization level, respectively.
 
+ | Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
+ |--------------|:-------------------:|:-------------------:|
+ | BF16 | 30.70 | 21.73 |
+ | Int4 | 37.11 | 26.11 |
 
  具体而言,我们记录在长度为1的上下文的条件下生成8192个token的性能。评测运行于单张A100-SXM4-80G GPU,使用PyTorch 2.0.1和CUDA 11.4。推理速度是生成8192个token的速度均值。
 
 
@@ -185,10 +185,10 @@ In detail, the setting of profiling is generating 8192 new tokens with 1 context
 
  We also profile the peak GPU memory usage for encoding 2048 tokens as context (and generating single token) and generating 8192 tokens (with single token as context) under BF16 or Int4 quantization level, respectively. The results are shown below.
 
+ | Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
+ |--------------------|:-----------------------------------:|:-------------------------------------:|
+ | BF16 | 30.15GB | 38.94GB |
+ | Int4 | 13.00GB | 21.79GB |
 
  上述性能测算使用[此脚本](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py)完成。
 
 
@@ -202,12 +202,12 @@ The above speed and memory profiling are conducted using [this script](https://q
  The details of the model architecture of Qwen-14B-Chat are listed as follows
 
  | Hyperparameter | Value |
+ |:----------------|:------:|
+ | n_layers | 40 |
+ | n_heads | 40 |
+ | d_model | 5120 |
  | vocab size | 151851 |
+ | sequence length | 2048 |
 
  在位置编码、FFN激活函数和normalization的实现方式上,我们也采用了目前最流行的做法,
  即RoPE相对位置编码、SwiGLU激活函数、RMSNorm(可选安装flash-attention加速)。
 
@@ -242,7 +242,7 @@ Note: Due to rounding errors caused by hardware and framework, differences in re
  We demonstrate the 0-shot & 5-shot accuracy of Qwen-14B-Chat on C-Eval validation set
 
  | Model | Avg. Acc. |
+ |:--------------------------------:|:---------:|
  | LLaMA2-7B-Chat | 31.9 |
  | LLaMA2-13B-Chat | 36.2 |
  | LLaMA2-70B-Chat | 44.3 |
 
@@ -284,7 +284,7 @@ The 0-shot & 5-shot accuracy of Qwen-14B-Chat on MMLU is provided below.
  The performance of Qwen-14B-Chat still on the top between other human-aligned models with comparable size.
 
  | Model | Avg. Acc. |
+ |:--------------------------------:|:---------:|
  | ChatGLM2-6B-Chat | 46.0 |
  | LLaMA2-7B-Chat | 46.2 |
  | InternLM-7B-Chat | 51.1 |
 
@@ -304,18 +304,18 @@ Qwen-14B-Chat在[HumanEval](https://github.com/openai/human-eval)的zero-shot Pa
 
  The zero-shot Pass@1 of Qwen-14B-Chat on [HumanEval](https://github.com/openai/human-eval) is demonstrated below
 
+ | Model | Pass@1 |
+ |:-----------------------:|:--------:|
+ | ChatGLM2-6B-Chat | 11.0 |
+ | LLaMA2-7B-Chat | 12.2 |
+ | InternLM-7B-Chat | 14.6 |
+ | Baichuan2-7B-Chat | 13.4 |
+ | LLaMA2-13B-Chat | 18.9 |
+ | Baichuan2-13B-Chat | 17.7 |
+ | LLaMA2-70B-Chat | 32.3 |
+ | Qwen-7B-Chat (original) | 24.4 |
+ | **Qwen-7B-Chat** | 37.2 |
+ | **Qwen-14B-Chat** | **43.9** |
 
  ### 数学评测(Mathematics Evaluation)
 
 
@@ -323,20 +323,20 @@ The zero-shot Pass@1 of Qwen-14B-Chat on [HumanEval](https://github.com/openai/h
 
  The accuracy of Qwen-14B-Chat on GSM8K is shown below
 
+ | Model | Acc. |
+ |:--------------------------------:|:--------:|
+ | LLaMA2-7B-Chat | 26.3 |
+ | ChatGLM2-6B-Chat | 28.8 |
+ | Baichuan2-7B-Chat | 32.8 |
+ | InternLM-7B-Chat | 33.0 |
+ | LLaMA2-13B-Chat | 37.1 |
+ | Baichuan2-13B-Chat | 55.3 |
+ | LLaMA2-70B-Chat | 59.3 |
+ | Qwen-7B-Chat (original) (0-shot) | 41.1 |
+ | **Qwen-7B-Chat (0-shot)** | 50.3 |
+ | **Qwen-7B-Chat (8-shot)** | 54.1 |
+ | **Qwen-14B-Chat (0-shot)** | **60.1** |
+ | **Qwen-14B-Chat (8-shot)** | 59.3 |
 
  ### 长序列评测(Long-Context Understanding)
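
The hunks in this commit only touch table delimiter rows and cell padding; no reported number changes. A minimal sketch of that kind of normalization follows, assuming the intent was to pad every column to its widest cell and size each delimiter cell to match while keeping its alignment colons. `split_row` and `normalize_table` are hypothetical helpers written for illustration; they are not part of the Qwen repository, and the commit may well have been produced by an editor plugin or formatter instead.

```python
import re

# A delimiter cell is dashes with an optional colon on either side.
DELIM_CELL = re.compile(r":?-+:?")


def split_row(line: str) -> list[str]:
    """Split a pipe-delimited Markdown table row into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def normalize_table(lines: list[str]) -> list[str]:
    """Pad each column to its widest cell and rebuild the delimiter row."""
    rows = [split_row(line) for line in lines]
    header, delim, body = rows[0], rows[1], rows[2:]
    if any(len(row) != len(header) for row in rows):
        raise ValueError("rows have different column counts")
    if not all(DELIM_CELL.fullmatch(cell) for cell in delim):
        raise ValueError("second line is not a delimiter row")

    # Column width is taken from the header and body cells only.
    widths = [max(len(row[i]) for row in [header, *body])
              for i in range(len(header))]

    def fmt(cells: list[str]) -> str:
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    delim_cells = []
    for cell, width in zip(delim, widths):
        left = ":" if cell.startswith(":") else ""
        right = ":" if cell.endswith(":") else ""
        # Content cells render as " text ", hence the +2.
        dashes = max(width + 2 - len(left) - len(right), 1)
        delim_cells.append(left + "-" * dashes + right)

    return [fmt(header), "|" + "|".join(delim_cells) + "|",
            *(fmt(row) for row in body)]


if __name__ == "__main__":
    table = [
        "| Hyperparameter | Value |",
        "| :------------- | :----: |",
        "| n_layers | 40 |",
        "| vocab size | 151851 |",
        "| sequence length | 2048 |",
    ]
    print("\n".join(normalize_table(table)))
```

Padding only changes how the Markdown source looks; renderers treat both forms of the delimiter row the same way, which is consistent with the diff reporting an equal number of added and removed lines (+45 -45).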