renillhuang committed on
Commit
94f9d42
1 Parent(s): 054e6fd

readme: Update inference code


Signed-off-by: eric <[email protected]>

README.md CHANGED
@@ -40,9 +40,11 @@ tags:
40
  - [📖 Model Introduction](#model-introduction)
41
  - [🔗 Model Download](#model-download)
42
  - [🔖 Model Benchmark](#model-benchmark)
 
43
  - [📜 Declarations & License](#declarations-license)
44
  - [🥇 Company Introduction](#company-introduction)
45
 
 
46
  <a name="model-introduction"></a><br>
47
  # 1. Model Introduction
48
 
@@ -52,9 +54,9 @@ tags:
52
  - The model demonstrates excellent performance in comprehensive evaluations compared to other base models of the same parameter scale.
53
  - It has strong multilingual capabilities, leading by a significant margin on Japanese and Korean test sets and also outperforming comparable models on the Arabic, German, French, and Spanish test sets.
54
  - Model Hyper-Parameters
55
- - The architecture of the OrionMOE 8*7B models closely resembles that of Mixtral 8*7B, with specific details shown in the table below.
56
 
57
- |Configuration |OrionMOE 8*7B|
58
  |-------------------|-------------|
59
  |Hidden Size | 4096 |
60
  |# Layers | 32 |
@@ -75,7 +77,7 @@ tags:
75
  - Model pretrain data distribution
76
  - The training dataset is primarily composed of English, Chinese, and other languages, accounting for 50%, 25%, and 12% of the data, respectively. Additionally, code makes up 9%, while mathematical text accounts for 4%. The distribution by topics is detailed in the table below.
77
  <div align="center">
78
- <img src="./assets/imgs/data_src_dist.png" alt="logo" width="80%" />
79
  </div>
80
 
81
 
@@ -84,8 +86,8 @@ tags:
84
 
85
  Model release and download links are provided in the table below:
86
 
87
- | Model Name | HuggingFace Download Links | ModelScope Download Links |
88
- |----------------------|-----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
89
  | ⚾Orion-MOE8x7B-Base | [Orion-MOE8x7B-Base](https://huggingface.co/OrionStarAI/Orion-MOE8x7B-Base) | [Orion-MOE8x7B-Base](https://modelscope.cn/models/OrionStarAI/Orion-MOE8x7B-Base/summary) |
90
 
91
 
@@ -94,7 +96,7 @@ Model release and download links are provided in the table below:
94
 
95
  ## 3.1. Base Model Orion-MOE8x7B-Base Benchmarks
96
  ### 3.1.1. LLM evaluation results on examination and professional knowledge
97
- |TestSet|Mixtral 8*7B|Qwen1.5-32b|Qwen2.5-32b|Orion 14B|Orion 8*7B|
98
  | ----------- | ----- | ----- | ----- | ----- | ----- |
99
  |CEval | 54.09 | 83.50 | 87.74 | 72.80 | 89.74 |
100
  |CMMLU | 53.21 | 82.30 | 89.01 | 70.57 | 89.16 |
@@ -146,10 +148,8 @@ Model release and download links are provided in the table below:
146
  ### 3.1.5. Leakage Detection Benchmark
147
  When the pre-training data of a large language model contains content from a specific dataset, the model’s performance on that dataset may be artificially enhanced, leading to inaccurate performance evaluations. To address this issue, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions have proposed a simple and effective method for detecting data leakage. This method leverages the interchangeable nature of multiple-choice options by shuffling the options in the original dataset to generate derived data. The log-probability distribution of the derived dataset is then computed using the model to detect whether the original dataset has been leaked.
148
 
149
- We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.
150
-
151
- More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.
152
-
153
  Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
154
 
155
  |Threshold 0.2|Qwen2.5 32B|Qwen1.5 32B|Orion 8x7B|Orion 14B|Mixtral 8x7B|
@@ -158,31 +158,29 @@ Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
158
  |CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
159
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
160
 
161
- ### 3.1.6. Inference speed[Todo]
162
- Based on 8x Nvidia RTX3090, in unit of tokens per second.
163
- |OrionLLM_V2.4.6.1 | 1para_out62 | 1para_out85 | 1para_out125 | 1para_out210 |
164
- |----|----|----|----|----|
165
- |OrionMOE | 33.03544296 | 33.43113606 | 33.53014102 | 33.58693529 |
166
- |Qwen32B | 26.46267188 | 26.72846906 | 26.80413838 | 27.03123611 |
167
- |Orion14B | 41.69121312 | 41.77423491 | 41.76050902 | 42.26096669 |
168
-
169
- |OrionLLM_V2.4.6.1 | 4para_out62 | 4para_out90 | 4para_out125 | 4para_out220 |
170
- |----|----|----|----|----|
171
- |OrionMOE | 29.45015743 | 30.4472947 | 31.03748516 | 31.45783599 |
172
- |Qwen32B | 23.60912215 | 24.30431956 | 24.86132023 | 25.16827535 |
173
- |Orion14B | 38.08240373 | 38.8572788 | 39.50040645 | 40.44875947 |
174
-
175
- |OrionLLM_V2.4.6.1 | 8para_out62 | 8para_out85 | 8para_out125 | 8para_out220 |
176
- |----|----|----|----|----|
177
- |OrionMOE | 25.71006327 | 27.13446743 | 28.89463226 | 29.70440167 |
178
- |Qwen32B | 21.15920951 | 21.92001035 | 23.13867947 | 23.5649106 |
179
- |Orion14B | 34.4151923 | 36.05635893 | 37.0874908 | 37.91705944 |
180
 
181
- <div align="center">
182
- <img src="./assets/imgs/inf_spd_en.png" alt="inf_speed" width="100%" />
183
- </div>
188
  <a name="model-inference"></a><br>
@@ -225,7 +223,21 @@ device, you can use something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
225
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MOE8x7B-Base --tokenizer OrionStarAI/Orion-MOE8x7B-Base --prompt hello
226
 
227
  ```
228
- ## 4.3. [Todo] vLLM inference code
229
 
230
 
231
  <a name="declarations-license"></a><br>
 
40
  - [📖 Model Introduction](#model-introduction)
41
  - [🔗 Model Download](#model-download)
42
  - [🔖 Model Benchmark](#model-benchmark)
43
+ - [📊 Model Inference](#model-inference)
44
  - [📜 Declarations & License](#declarations-license)
45
  - [🥇 Company Introduction](#company-introduction)
46
 
47
+
48
  <a name="model-introduction"></a><br>
49
  # 1. Model Introduction
50
 
 
54
  - The model demonstrates excellent performance in comprehensive evaluations compared to other base models of the same parameter scale.
55
  - It has strong multilingual capabilities, leading by a significant margin on Japanese and Korean test sets and also outperforming comparable models on the Arabic, German, French, and Spanish test sets.
56
  - Model Hyper-Parameters
57
+ - The architecture of the OrionMOE 8x7B models closely resembles that of Mixtral 8x7B, with specific details shown in the table below.
58
 
59
+ |Configuration |OrionMOE 8x7B|
60
  |-------------------|-------------|
61
  |Hidden Size | 4096 |
62
  |# Layers | 32 |
 
77
  - Model pretrain data distribution
78
  - The training dataset is primarily composed of English, Chinese, and other languages, accounting for 50%, 25%, and 12% of the data, respectively. Additionally, code makes up 9%, while mathematical text accounts for 4%. The distribution by topics is detailed in the table below.
79
  <div align="center">
80
+ <img src="./assets/imgs/data_src_dist.png" alt="data source distribution" width="70%" />
81
  </div>
82
 
83
 
 
86
 
87
  Model release and download links are provided in the table below:
88
 
89
+ | Model Name | HuggingFace Download Links | ModelScope Download Links |
90
+ |------------|----------------------------|---------------------------|
91
  | ⚾Orion-MOE8x7B-Base | [Orion-MOE8x7B-Base](https://huggingface.co/OrionStarAI/Orion-MOE8x7B-Base) | [Orion-MOE8x7B-Base](https://modelscope.cn/models/OrionStarAI/Orion-MOE8x7B-Base/summary) |
92
 
93
 
 
96
 
97
  ## 3.1. Base Model Orion-MOE8x7B-Base Benchmarks
98
  ### 3.1.1. LLM evaluation results on examination and professional knowledge
99
+ |TestSet|Mixtral 8x7B|Qwen1.5-32b|Qwen2.5-32b|Orion 14B|Orion 8x7B|
100
  | ----------- | ----- | ----- | ----- | ----- | ----- |
101
  |CEval | 54.09 | 83.50 | 87.74 | 72.80 | 89.74 |
102
  |CMMLU | 53.21 | 82.30 | 89.01 | 70.57 | 89.16 |
 
148
  ### 3.1.5. Leakage Detection Benchmark
149
  When the pre-training data of a large language model contains content from a specific dataset, the model’s performance on that dataset may be artificially enhanced, leading to inaccurate performance evaluations. To address this issue, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions have proposed a simple and effective method for detecting data leakage. This method leverages the interchangeable nature of multiple-choice options by shuffling the options in the original dataset to generate derived data. The log-probability distribution of the derived dataset is then computed using the model to detect whether the original dataset has been leaked.
150
 
151
+ We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.<br>
152
+ More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.<br>
 
 
153
  Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
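The shuffle-and-score idea can be sketched as follows. This is a simplified illustration only: the checkpoint name, the prompt format, and the permutation-rank statistic below are assumptions made for the example, not the exact procedure from the paper or the linked test code.

```python
import itertools

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "OrionStarAI/Orion-MOE8x7B-Base"   # any causal LM checkpoint works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

def sequence_logprob(text):
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits[:, :-1]
    logps = torch.log_softmax(logits.float(), dim=-1)
    token_logps = logps.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps.sum().item()

def original_order_rank(question, options):
    """Rank (1 = highest log-prob) of the published option order among all
    shuffled permutations. Ranks that are consistently 1 across a dataset
    suggest the model has seen the questions in their original form."""
    scores = {}
    for perm in itertools.permutations(options):
        lines = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm)]
        scores[perm] = sequence_logprob(question + "\n" + "\n".join(lines))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(tuple(options)) + 1
```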
154
 
155
  |Threshold 0.2|Qwen2.5 32B|Qwen1.5 32B|Orion 8x7B|Orion 14B|Mixtral 8x7B|
 
158
  |CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
159
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
160
 
161
+ ### 3.1.6. Inference speed
162
+ We set up an inference server on 8x Nvidia RTX3090 GPUs and measured client-side throughput in tokens per second.
163
 
164
+ |OrionLLM_V2.4.6.1|1para_out62|1para_out85|1para_out125|1para_out210|
165
+ |---------|-------|-------|-------|-------|
166
+ |OrionMOE | 33.04 | 33.43 | 33.53 | 33.59 |
167
+ |Qwen32B | 26.46 | 26.73 | 26.80 | 27.03 |
168
+
169
+ |OrionLLM_V2.4.6.1|4para_out62|4para_out90|4para_out125|4para_out220|
170
+ |---------|-------|-------|-------|-------|
171
+ |OrionMOE | 29.45 | 30.45 | 31.04 | 31.46 |
172
+ |Qwen32B | 23.61 | 24.30 | 24.86 | 25.17 |
173
 
174
+ |OrionLLM_V2.4.6.1|8para_out62|8para_out85|8para_out125|8para_out220|
175
+ |---------|-------|-------|-------|-------|
176
+ |OrionMOE | 25.71 | 27.13 | 28.89 | 29.70 |
177
+ |Qwen32B | 21.16 | 21.92 | 23.14 | 23.56 |
178
 
179
+ We found that inference speed varies with the number of concurrent requests and the output length. To facilitate horizontal comparison, we ran multiple sets of tests, each labeled \<n>para_out\<m>: for example, "4para_out220" denotes the speed measured with 4 concurrent client requests and an average output length of 220 tokens.
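For reference, a number such as "4para_out220" could be reproduced with a small client along these lines. The sketch assumes the OpenAI-compatible vLLM service from section 4.3 below is running on port 9999 and that its responses include a token-usage field; the prompt and helper names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://0.0.0.0:9999/v1/chat/completions"
N_PARALLEL = 4      # the "4para" part of the label
MAX_TOKENS = 220    # target output length, the "out220" part
PROMPT = "Write a short introduction to large language models."

def one_request(_):
    payload = {
        "model": "orion-moe",
        "temperature": 0.2,
        "stream": False,
        "max_tokens": MAX_TOKENS,
        "messages": [{"role": "user", "content": PROMPT}],
    }
    resp = requests.post(URL, json=payload, timeout=600).json()
    return resp["usage"]["completion_tokens"]   # assumes the server reports usage

start = time.time()
with ThreadPoolExecutor(max_workers=N_PARALLEL) as pool:
    total_tokens = sum(pool.map(one_request, range(N_PARALLEL)))
elapsed = time.time() - start
print(f"{N_PARALLEL}para_out{MAX_TOKENS}: {total_tokens / elapsed:.2f} tokens/s")
```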
180
+
181
+ <div align="center">
182
+ <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="100%" />
183
+ </div>
184
 
185
 
186
  <a name="model-inference"></a><br>
 
223
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MOE8x7B-Base --tokenizer OrionStarAI/Orion-MOE8x7B-Base --prompt hello
224
 
225
  ```
226
+ ## 4.3. vLLM Inference Service
227
+ Download the project (https://github.com/OrionStarAI/vllm_server) and follow its instructions to build the vLLM service Docker image.
228
+ ```shell
229
+ git clone git@github.com:OrionStarAI/vllm_server.git
230
+ cd vllm_server
231
+ docker build -t vllm_server:0.0.0.0 -f Dockerfile .
232
+ ```
233
+ Start the Docker service
234
+ ```shell
235
+ docker run --gpus all -it -p 9999:9999 -v $(pwd)/logs:/workspace/logs:rw -v $HOME/Downloads:/workspace/models -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -e MODEL_DIR=Orion-MOE8x7B-Base -e MODEL_NAME=orion-moe vllm_server:0.0.0.0
236
+ ```
237
+ Run inference
238
+ ```shell
239
+ curl http://0.0.0.0:9999/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "orion-moe","temperature": 0.2,"stream": false, "messages": [{"role": "user", "content":"Which company developed you as an AI agent?"}]}'
240
+ ```
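The same request can also be issued from Python, e.g. with the `requests` package. This is a minimal sketch; the response field access assumes the standard OpenAI-compatible schema.

```python
import requests

payload = {
    "model": "orion-moe",
    "temperature": 0.2,
    "stream": False,
    "messages": [{"role": "user", "content": "Which company developed you as an AI agent?"}],
}
resp = requests.post("http://0.0.0.0:9999/v1/chat/completions", json=payload, timeout=300)
# Assumes an OpenAI-compatible response body: choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])
```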
241
 
242
 
243
  <a name="declarations-license"></a><br>
README_zh.md CHANGED
@@ -31,6 +31,7 @@
31
 - [📖 Model Introduction](#zh_model-introduction)
32
 - [🔗 Model Download](#zh_model-download)
33
 - [🔖 Model Benchmark](#zh_model-benchmark)

34
 - [📜 Declarations & License](#zh_declarations-license)
35
 - [🥇 Company Introduction](#zh_company-introduction)
36
 
@@ -47,7 +48,7 @@
47
 - Orion-MOE8x7B-Base model hyper-parameters
48
 - The Orion-MOE8x7B-Base architecture is close to that of Mixtral 8x7B; see the table below for hyper-parameter details
49
 
50
- |Configuration |OrionMOE 8*7B|
51
  |-------------------|-------------|
52
  |Hidden Size | 4096 |
53
  |# Layers | 32 |
@@ -68,7 +69,7 @@
68
 - Orion-MOE8x7B-Base training data composition
69
 - By language, the pretraining data consists mainly of English, Chinese, and other languages, accounting for 50%, 25%, and 12% respectively; by category, code makes up 9% and mathematical text 4%. The distribution is shown in the figure below.
70
  <div align="center">
71
- <img src="./assets/imgs/data_src_dist.png" alt="logo" width="80%" />
72
  </div>
73
 
74
 
@@ -77,10 +78,9 @@
77
 
78
 Model release and download links are provided in the table below:
79
 
80
- | Model Name | HuggingFace Download Links | ModelScope Download Links |
81
- |---------------------|-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
82
- | ⚾ Base Model | [Orion-MOE8x7B-Base](https://huggingface.co/OrionStarAI/Orion-MOE8x7B-Base) | [Orion-MOE8x7B-Base](https://modelscope.cn/models/OrionStarAI/Orion-MOE8x7B-Base/summary) |
83
-
84
 
85
 
86
  <a name="zh_model-benchmark"></a><br>
@@ -89,7 +89,7 @@
89
 ## 3.1. Base Model Orion-MOE8x7B-Base Evaluation
90
 
91
 ### 3.1.1. Base model benchmark comparison
92
- |TestSet|Mixtral 8*7B|Qwen1.5-32b|Qwen2.5-32b|Orion 14B|Orion 8*7B|
93
  | ----------- | ----- | ----- | ----- | ----- | ----- |
94
  |CEval | 54.09 | 83.50 | 87.74 | 72.80 | 89.74 |
95
  |CMMLU | 53.21 | 82.30 | 89.01 | 70.57 | 89.16 |
@@ -110,8 +110,6 @@
110
  |GSM8K | 47.50 | 77.40 | 80.36 | 52.01 | 59.82 |
111
  |MATH | 28.40 | 36.10 | 48.88 | 7.84 | 23.68 |
112
 
113
-
114
-
115
 ### 3.1.2. Minor languages: Japanese
116
  | Model | JSQuAD | JCommonSenseQA | JNLI | MARC-ja | JAQKET v2 | PAWS-ja | avg |
117
  |--------------|-------|-------|-------|-------|-------|-------|-------|
@@ -121,7 +119,6 @@
121
  |Orion-14B-Base| 74.22 | 88.20 | 72.85 | 94.06 | 66.20 | 49.90 | 74.24 |
122
  |Orion 8x7B | 91.77 | 90.43 | 90.46 | 96.40 | 81.19 | 47.35 | 82.93 |
123
 
124
-
125
 ### 3.1.3. Minor languages: Korean
126
  |Model | HAE-RAE | KoBEST BoolQ | KoBEST COPA | KoBEST HellaSwag | KoBEST SentiNeg | KoBEST WiC | PAWS-ko | avg |
127
  |--------------|-------|-------|-------|-------|-------|-------|-------|-------|
@@ -131,8 +128,6 @@
131
  |Orion-14B-Base| 69.66 | 80.63 | 77.10 | 58.20 | 92.44 | 51.19 | 44.55 | 67.68 |
132
  |Orion 8x7B | 65.17 | 85.40 | 80.40 | 56.00 | 96.98 | 73.57 | 46.35 | 71.98 |
133
 
134
-
135
-
136
 ### 3.1.4. Minor languages: Arabic, German, French, Spanish
137
  | Lang | ar | | de | | fr | | es | |
138
  |----|----|----|----|----|----|----|----|----|
@@ -143,14 +138,11 @@
143
  |Orion-14B-Base| 69.66 | 80.63 | 77.10 | 58.20 | 92.44 | 51.19 | 44.55 | 67.68 |
144
  |Orion 8x7B | 65.17 | 85.40 | 80.40 | 56.00 | 96.98 | 73.57 | 46.35 | 71.98 |
145
 
146
-
147
 ### 3.1.5. Leakage detection results
148
 When the pre-training data of a large language model contains content from a specific dataset, the model's performance on that dataset may be artificially inflated, leading to inaccurate performance evaluations. To address this, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions proposed a simple and effective data leakage detection method. The method exploits the interchangeability of multiple-choice options: the options in the original dataset are shuffled to generate derived data, and the model is then used to compute the log-probability distribution over the derived dataset to detect whether the original dataset was leaked.
149
 
150
- We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval
151
-
152
- More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.
153
-
154
 Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
155
 
156
  |Threshold 0.2|Qwen2.5 32B|Qwen1.5 32B|Orion 8x7B|Orion 14B|Mixtral 8x7B|
@@ -159,32 +151,32 @@
159
  |CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
160
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
161
 
 
 
162
 
163
- ### 3.1.6. Inference speed [Todo: remove the 14B results, add more description of the results]
164
- Based on 8x Nvidia RTX3090, in tokens per second
165
- |OrionLLM_V2.4.6.1 | 1para_out62 | 1para_out85 | 1para_out125 | 1para_out210 |
166
- |----|----|----|----|----|
167
- |OrionMOE | 33.03544296 | 33.43113606 | 33.53014102 | 33.58693529 |
168
- |Qwen32B | 26.46267188 | 26.72846906 | 26.80413838 | 27.03123611 |
169
- |Orion14B | 41.69121312 | 41.77423491 | 41.76050902 | 42.26096669 |
170
 
171
- |OrionLLM_V2.4.6.1 | 4para_out62 | 4para_out90 | 4para_out125 | 4para_out220 |
172
- |----|----|----|----|----|
173
- |OrionMOE | 29.45015743 | 30.4472947 | 31.03748516 | 31.45783599 |
174
- |Qwen32B | 23.60912215 | 24.30431956 | 24.86132023 | 25.16827535 |
175
- |Orion14B | 38.08240373 | 38.8572788 | 39.50040645 | 40.44875947 |
176
 
177
- |OrionLLM_V2.4.6.1 | 8para_out62 | 8para_out85 | 8para_out125 | 8para_out220 |
178
- |----|----|----|----|----|
179
- |OrionMOE | 25.71006327 | 27.13446743 | 28.89463226 | 29.70440167 |
180
- |Qwen32B | 21.15920951 | 21.92001035 | 23.13867947 | 23.5649106 |
181
- |Orion14B | 34.4151923 | 36.05635893 | 37.0874908 | 37.91705944 |
 
182
 
183
  <div align="center">
184
- <img src="./assets/imgs/inf_spd_zh.png" alt="inf_speed" width="100%" />
185
  </div>
186
 
187
 
 
188
 # 4. Model Inference
189
 
190
 The model weights, source code, and configuration required for inference have been published on Hugging Face; the download links are in the table at the beginning of this document. Here we demonstrate several inference methods. The program will automatically download from
@@ -211,11 +203,9 @@ response = model.chat(tokenizer, messages, streaming=False)
211
  print(response)
212
 
213
  ```
214
-
215
 In the two code snippets above, the model is loaded with `device_map='auto'`
216
 , which uses all available GPUs. To restrict the devices used, set something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` (which uses GPUs 0 through 7).
217
 
218
-
219
 ## 4.2. Direct script inference
220
 
221
  ```shell
@@ -224,7 +214,21 @@ print(response)
224
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MOE8x7B-Base --tokenizer OrionStarAI/Orion-MOE8x7B-Base --prompt 你好,你叫什么名字
225
 
226
  ```
227
- ## 4.3. [Todo] vLLM inference code
228
 
229
 
230
  <a name="zh_declarations-license"></a><br>
 
31
 - [📖 Model Introduction](#zh_model-introduction)
32
 - [🔗 Model Download](#zh_model-download)
33
 - [🔖 Model Benchmark](#zh_model-benchmark)
34
+ - [📊 Model Inference](#zh_model-inference)
35
 - [📜 Declarations & License](#zh_declarations-license)
36
 - [🥇 Company Introduction](#zh_company-introduction)
37
 
 
48
 - Orion-MOE8x7B-Base model hyper-parameters
49
 - The Orion-MOE8x7B-Base architecture is close to that of Mixtral 8x7B; see the table below for hyper-parameter details
50
 
51
+ |Configuration |OrionMOE 8x7B|
52
  |-------------------|-------------|
53
  |Hidden Size | 4096 |
54
  |# Layers | 32 |
 
69
 - Orion-MOE8x7B-Base training data composition
70
 - By language, the pretraining data consists mainly of English, Chinese, and other languages, accounting for 50%, 25%, and 12% respectively; by category, code makes up 9% and mathematical text 4%. The distribution is shown in the figure below.
71
  <div align="center">
72
+ <img src="./assets/imgs/data_src_dist.png" alt="data source distribution" width="70%" />
73
  </div>
74
 
75
 
 
78
 
79
 Model release and download links are provided in the table below:
80
 
81
+ | Model Name | HuggingFace Download Links | ModelScope Download Links |
82
+ |---------|-------------------|-------------------|
83
+ | ⚾ Base Model | [Orion-MOE8x7B-Base](https://huggingface.co/OrionStarAI/Orion-MOE8x7B-Base) | [Orion-MOE8x7B-Base](https://modelscope.cn/models/OrionStarAI/Orion-MOE8x7B-Base/summary) |
 
84
 
85
 
86
  <a name="zh_model-benchmark"></a><br>
 
89
 ## 3.1. Base Model Orion-MOE8x7B-Base Evaluation
90
 
91
 ### 3.1.1. Base model benchmark comparison
92
+ |TestSet|Mixtral 8x7B|Qwen1.5-32b|Qwen2.5-32b|Orion 14B|Orion 8x7B|
93
  | ----------- | ----- | ----- | ----- | ----- | ----- |
94
  |CEval | 54.09 | 83.50 | 87.74 | 72.80 | 89.74 |
95
  |CMMLU | 53.21 | 82.30 | 89.01 | 70.57 | 89.16 |
 
110
  |GSM8K | 47.50 | 77.40 | 80.36 | 52.01 | 59.82 |
111
  |MATH | 28.40 | 36.10 | 48.88 | 7.84 | 23.68 |
112
 
 
 
113
 ### 3.1.2. Minor languages: Japanese
114
  | Model | JSQuAD | JCommonSenseQA | JNLI | MARC-ja | JAQKET v2 | PAWS-ja | avg |
115
  |--------------|-------|-------|-------|-------|-------|-------|-------|
 
119
  |Orion-14B-Base| 74.22 | 88.20 | 72.85 | 94.06 | 66.20 | 49.90 | 74.24 |
120
  |Orion 8x7B | 91.77 | 90.43 | 90.46 | 96.40 | 81.19 | 47.35 | 82.93 |
121
 
 
122
 ### 3.1.3. Minor languages: Korean
123
  |Model | HAE-RAE | KoBEST BoolQ | KoBEST COPA | KoBEST HellaSwag | KoBEST SentiNeg | KoBEST WiC | PAWS-ko | avg |
124
  |--------------|-------|-------|-------|-------|-------|-------|-------|-------|
 
128
  |Orion-14B-Base| 69.66 | 80.63 | 77.10 | 58.20 | 92.44 | 51.19 | 44.55 | 67.68 |
129
  |Orion 8x7B | 65.17 | 85.40 | 80.40 | 56.00 | 96.98 | 73.57 | 46.35 | 71.98 |
130
 
 
 
131
  ### 3.1.4. 小语种: 阿拉伯语,德语,法语,西班牙语
132
  | Lang | ar | | de | | fr | | es | |
133
  |----|----|----|----|----|----|----|----|----|
 
138
  |Orion-14B-Base| 69.66 | 80.63 | 77.10 | 58.20 | 92.44 | 51.19 | 44.55 | 67.68 |
139
  |Orion 8x7B | 65.17 | 85.40 | 80.40 | 56.00 | 96.98 | 73.57 | 46.35 | 71.98 |
140
 
 
141
 ### 3.1.5. Leakage detection results
142
 When the pre-training data of a large language model contains content from a specific dataset, the model's performance on that dataset may be artificially inflated, leading to inaccurate performance evaluations. To address this, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions proposed a simple and effective data leakage detection method. The method exploits the interchangeability of multiple-choice options: the options in the original dataset are shuffled to generate derived data, and the model is then used to compute the log-probability distribution over the derived dataset to detect whether the original dataset was leaked.
143
 
144
+ We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.<br>
145
+ More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.<br>
 
 
146
 Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
147
 
148
  |Threshold 0.2|Qwen2.5 32B|Qwen1.5 32B|Orion 8x7B|Orion 14B|Mixtral 8x7B|
 
151
  |CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
152
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
153
 
154
+ ### 3.1.6. Inference speed
155
+ We set up an inference server on 8x Nvidia RTX3090 GPUs and measured client-side throughput in tokens per second.
156
 
157
+ |OrionLLM_V2.4.6.1|1para_out62|1para_out85|1para_out125|1para_out210|
158
+ |---------|-------|-------|-------|-------|
159
+ |OrionMOE | 33.04 | 33.43 | 33.53 | 33.59 |
160
+ |Qwen32B | 26.46 | 26.73 | 26.80 | 27.03 |
 
 
 
161
 
162
+ |OrionLLM_V2.4.6.1|4para_out62|4para_out90|4para_out125|4para_out220|
163
+ |---------|-------|-------|-------|-------|
164
+ |OrionMOE | 29.45 | 30.45 | 31.04 | 31.46 |
165
+ |Qwen32 | 23.61 | 24.30 | 24.86 | 25.17 |
 
166
 
167
+ |OrionLLM_V2.4.6.1|8para_out62|8para_out85|8para_out125|8para_out220|
168
+ |---------|-------|-------|-------|-------|
169
+ |OrionMOE | 25.71 | 27.13 | 28.89 | 29.70 |
170
+ |Qwen32 | 21.16 | 21.92 | 23.14 | 23.56 |
171
+
172
+ We found that inference speed varies with the number of concurrent requests and the model's output length. To facilitate horizontal comparison, we ran multiple groups of tests, each labeled \<n>para_out\<m>, where n is the number of concurrent client requests and m is the average number of output tokens per request. For example, 4para_out220 denotes the inference speed with 4 concurrent client requests and an average output of 220 tokens.
173
 
174
  <div align="center">
175
+ <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="100%" />
176
  </div>
177
 
178
 
179
+ <a name="zh_model-inference"></a><br>
180
 # 4. Model Inference
181
 
182
 The model weights, source code, and configuration required for inference have been published on Hugging Face; the download links are in the table at the beginning of this document. Here we demonstrate several inference methods. The program will automatically download from
 
203
  print(response)
204
 
205
  ```
 
206
 In the two code snippets above, the model is loaded with `device_map='auto'`
207
 , which uses all available GPUs. To restrict the devices used, set something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` (which uses GPUs 0 through 7).
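For illustration, a minimal loading sketch with `device_map='auto'` might look like the following; the tokenizer and generation calls are assumptions for the example and may differ from the chat-style helpers used in the snippets above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "OrionStarAI/Orion-MOE8x7B-Base"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",            # shard the MoE weights across all visible GPUs
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```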
208
 
 
209
 ## 4.2. Direct script inference
210
 
211
  ```shell
 
214
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MOE8x7B-Base --tokenizer OrionStarAI/Orion-MOE8x7B-Base --prompt 你好,你叫什么名字
215
 
216
  ```
217
+ ## 4.3. vLLM Inference Service
218
+ Download the project (https://github.com/OrionStarAI/vllm_server) and follow its instructions to build the vLLM-based inference service Docker image.
219
+ ```shell
220
+ git clone git@github.com:OrionStarAI/vllm_server.git
221
+ cd vllm_server
222
+ docker build -t vllm_server:0.0.0.0 -f Dockerfile .
223
+ ```
224
+ Start the Docker service
225
+ ```shell
226
+ docker run --gpus all -it -p 9999:9999 -v $(pwd)/logs:/workspace/logs:rw -v $HOME/Downloads:/workspace/models -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -e MODEL_DIR=Orion-MOE8x7B-Base -e MODEL_NAME=orion-moe vllm_server:0.0.0.0
227
+ ```
228
+ Run inference
229
+ ```shell
230
+ curl http://0.0.0.0:9999/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "orion-moe","temperature": 0.2,"stream": false, "messages": [{"role": "user", "content":"Which company developed you as an AI agent?"}]}'
231
+ ```
232
 
233
 
234
  <a name="zh_declarations-license"></a><br>
assets/imgs/inf_spd.png ADDED
assets/imgs/inf_spd_en.png DELETED
Binary file (140 kB)
 
assets/imgs/inf_spd_zh.png DELETED
Binary file (56.7 kB)