sperfu committed on
Commit
a339536
1 Parent(s): 2c3fffe

Update README.md

Files changed (1)
  1. README.md +404 -49
README.md CHANGED
@@ -1,122 +1,239 @@
1
  <div align="center">
2
- <img src="https://raw.githubusercontent.com/AI4Bread/.github/main/goumang_logoall.png" width="600"/>
3
- <br /><br />
4
5
 
6
- 🔍 Explore our models on
7
- [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤗%20Huggingface)](https://huggingface.co/AI4Bread/GouMang/)
8
-
9
 
10
  </div>
11
 
12
- # GouMang Agriculture Large Language Model
13
 
14
- ## This is the official repository of GouMang Agriculture Large Language Model.
15
 
16
- ## We used Xtuner Framework to train and finetune the model.
17
 
 
18
 
19
- ## 🎉 News
20
 
21
- - **\[2024/06\]** [GouMang_7B](https://huggingface.co/AI4Bread/GouMang) is released! Click [here](https://huggingface.co/AI4Bread/GouMang) for details!
22
- - **\[2024/06\]** Support [Llama 3](xtuner/configs/llama) models!
23
 
24
- ## Usage
25
 
 
26
 
27
- ## DEMO
28
 
29
- Install the dependencies required for the web demo
30
 
31
- lmdeploy 没有安装,我们接下来手动安装一下,建议安装最新的稳定版。
32
- 如果是在 InternStudio 开发环境,需要先运行下面的命令,否则会报错。
 
33
34
 
35
  ```bash
36
- # 解决 ModuleNotFoundError: No module named 'packaging' 问题
37
- pip install packaging
38
- # 使用 flash_attn 的预编译包解决安装过慢问题
39
- pip install /root/share/wheels/flash_attn-2.4.2+cu118torch2.0cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
40
  ```
41
42
  ```bash
43
- pip install 'lmdeploy[all]==v0.4.2'
44
  ```
45
- 由于默认安装的是 runtime 依赖包,但是我们这里还需要部署和量化,所以,这里选择 `[all]`。
46
 
 
47
 
48
- ### Model convert
49
 
50
- Convert lmdeploy TurboMind
51
 
52
  ```bash
53
- # 转换模型(FastTransformer格式) TurboMind
54
- lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
55
  ```
56
 
57
- 这里我们使用我们训练好的提供的模型文件,就在用户根目录执行,如下所示。
58
 
59
  ```bash
60
- lmdeploy convert internlm2-chat-7b /root/autodl-tmp/agri_intern/GouMang --tokenizer-path ./GouMang/tokenizer.json
 
61
  ```
62
 
63
- 执行完成后将会在当前目录生成一个 `workspace` 的文件夹。这里面包含的就是 TurboMind Triton “模型推理”需要到的文件。
 
 
 
 
 
64
 
65
- ### Chat Locally
66
 
67
  ```bash
68
  lmdeploy chat turbomind ./workspace
69
  ```
70
 
71
- ### TurboMind Inference + API Service
72
 
73
- 在上面的部分我们尝试了直接用命令行启动 Client,接下来我们尝试如何运用 lmdepoy 进行服务化。
74
 
75
- ”模型推理/服务“目前提供了 Turbomind TritonServer 两种服务化方式。此时,Server TurboMind TritonServerAPI Server 可以提供对外的 API 服务。我们推荐使用 TurboMind,TritonServer 使用方式详见《附录1》。
76
 
77
- 首先,通过下面命令启动服务。
78
 
79
 
80
  ```bash
81
  # ApiServer+Turbomind api_server => AsyncEngine => TurboMind
82
  lmdeploy serve api_server ./workspace \
83
- --server_name 0.0.0.0 \
84
  --server-port 23333 \
85
- --instance_num 64 \
86
  --tp 1
87
  ```
88
 
89
- 上面的参数中 `server_name` `server_port` 分别表示服务地址和端口,`tp` 参数我们之前已经提到过了,表示 Tensor 并行。还剩下一个 `instance_num` 参数,表示实例数,可以理解成 Batch 的大小。执行后如下图所示。
90
 
91
- ### 2.4 网页 Demo 演示
92
 
93
- 这一部分主要是将 Gradio 作为前端 Demo 演示。在上一节的基础上,我们不执行后面的 `api_client` 或 `triton_client`,而是执行 `gradio`。
94
 
95
- > 由于 Gradio 需要本地访问展示界面,因此也需要通过 ssh 将数据转发到本地。命令如下:
 
 
 
 
 
96
  >
97
- > ssh -CNg -L 6006:127.0.0.1:6006 [email protected] -p <你的 ssh 端口号>
98
 
99
- #### 2.4.1 TurboMind 服务作为后端
100
 
101
- API Server 的启动和上一节一样,这里直接启动作为前端的 Gradio
102
 
103
  ```bash
104
- # Gradio+ApiServer。必须先开启 Server,此时 Gradio Client
105
  lmdeploy serve gradio http://0.0.0.0:23333 --server-port 6006
106
  ```
107
 
108
- #### 2.4.2 TurboMind 推理作为后端
109
 
110
- 当然,Gradio 也可以直接和 TurboMind 连接,如下所示。
111
 
112
  ```bash
113
  # Gradio+Turbomind(local)
114
  lmdeploy serve gradio ./workspace
115
  ```
116
 
117
- 可以直接启动 Gradio,此时没有 API ServerTurboMind 直接与 Gradio 通信。
118
 
 
119
 
 
120
 
121
  ```bash
122
  pip install streamlit==1.24.0
@@ -124,15 +241,253 @@ pip install streamlit==1.24.0
124
 
125
  Download the [GouMang](https://huggingface.co/AI4Bread/GouMang) project model (please Star if you like it)
126
 
 
 
 
 
 
127
 
128
  Replace the model path in `web_demo.py` with the path where the downloaded parameters of `GouMang` are stored
129
 
130
  Run the `web_demo.py` file in the directory, and after entering the following command, [**check this tutorial 5.2 for local port configuration**](https://github.com/InternLM/tutorial/blob/main/helloworld/hello_world.md#52-%E9%85%8D%E7%BD%AE%E6%9C%AC%E5%9C%B0%E7%AB%AF%E5%8F%A3),to map the port to your local machine. Enter `http://127.0.0.1:6006` in your local browser.
131
 
132
  ```
133
- streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006
134
  ```
135
 
136
  Note: The model will load only after you open the `http://127.0.0.1:6006` page in your browser.
137
- Once the model is loaded, you can start conversing with GouMang.
138
 
 
 
1
+ ---
2
+ pipeline_tag: text-generation
3
+ license: other
4
+ ---
5
+ # XiXiLM
6
+
7
  <div align="center">
 
 
8
 
9
+ <img src="https://github.com/AI4Bread/GouMang/blob/main/assets/goumang_logoallnew.png?raw=true" width="600"/>
10
+ <div>&nbsp;</div>
11
+ <div align="center">
12
+ <!-- <b><font size="5">XiXiLM</font></b> -->
13
+ <sup>
14
+ <a href="http://www.ai4bread.com">
15
+ </a>
16
+ </sup>
17
+ <div>&nbsp;</div>
18
+ </div>
19
 
20
+
21
+ [💻Github Repo](https://github.com/AI4Bread/GouMang) • [🤔Reporting Issues](https://github.com/AI4Bread/GouMang/issues) • [📜Technical Report](https://github.com/AI4Bread)
 
22
 
23
  </div>
24
 
25
+ <p align="center">
26
+ 👋 Join us on <a href="https://github.com/AI4Bread/GouMang" target="_blank">GitHub</a>
27
+ </p>
28
+
29
+
30
+
31
+ ## Introduction
32
 
33
+ XiXiLM (GouMang LLM) has open-sourced a 7-billion-parameter base model and a chat model tailored for agricultural scenarios. The model has the following characteristics:
34
 
35
+ 1. **High Professionalism**: XiXiLM focuses on the agricultural field, providing professional and accurate answers especially in areas such as tuber crop cultivation, pest and disease control, and soil management.
36
 
37
+ 2. **Academic Support**: The model is based on the latest agricultural research findings, capable of providing academic-level answers to help researchers and agricultural practitioners gain a deeper understanding of agricultural issues.
38
 
39
+ 3. **Multilingual Support**: Supports both Chinese and English languages, making it convenient for users both domestically and internationally.
40
 
41
+ 4. **Free Commercial Use**: The model weights are fully open, supporting not only academic research but also allowing **free** commercial usage. Users can use the model in commercial projects for free, lowering the usage threshold.
 
42
 
43
+ 5. **Efficient Training**: Employs advanced training algorithms and techniques, enabling the model to respond quickly to user inquiries and provide efficient Q&A services.
44
 
45
+ 6. **Continuous Optimization**: The model will be continuously optimized based on user feedback and the latest research findings, constantly improving the quality and coverage of its answers.
46
 
47
+ ## XiXiLM-Qwen-14B
48
+
49
+
50
+ **Limitations:** Although we have made efforts to ensure the safety of the model during the training process and to
51
+ encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected
52
+ outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination,
53
+ or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the
54
+ dissemination of harmful information.
55
+
56
+ ### Import from Transformers
57
+
58
+ To load the XiXiLM model using Transformers, use the following code:
59
+
60
+ ```python
61
+ import torch
62
+ from transformers import AutoTokenizer, AutoModelForCausalLM
63
+ tokenizer = AutoTokenizer.from_pretrained("AI4Bread/XiXi_Qwen_base_14b", trust_remote_code=True)
64
+ # Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error.
65
+ model = AutoModelForCausalLM.from_pretrained("AI4Bread/XiXi_Qwen_base_14b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
66
+ model = model.eval()
67
+ response, history = model.chat(tokenizer, "你好", history=[])
68
+ print(response)
69
+ # Hello! How can I help you today?
70
+ response, history = model.chat(tokenizer, "马铃薯育种有什么注意事项?需要注意什么呢?", history=history)
71
+ print(response)
72
+ ```
73
 
74
+ The responses can be streamed using `stream_chat`:
75
 
76
+ ```python
77
+ import torch
78
+ from transformers import AutoModelForCausalLM, AutoTokenizer
79
 
80
+ model_path = "AI4Bread/XiXi_Qwen_base_14b"
81
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
82
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
83
+
84
+ model = model.eval()
85
+ length = 0
86
+ for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
87
+     print(response[length:], flush=True, end="")
88
+     length = len(response)
89
+ ```
90
+
91
+
92
+ ## Deployment
93
+
94
+ ### LMDeploy
95
+
96
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
97
 
98
  ```bash
99
+ pip install lmdeploy
 
 
 
100
  ```
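+
+ Before setting up a server, you can also run a quick offline test with LMDeploy's `pipeline` API. The snippet below is only a minimal sketch: it assumes LMDeploy supports the underlying architecture, and the model id is a placeholder for wherever the XiXiLM weights actually live.
+
+ ```python
+ from lmdeploy import pipeline
+
+ # Placeholder model id -- point this at the XiXiLM checkpoint you actually use.
+ pipe = pipeline("AI4Bread/XiXi_Qwen_base_14b")
+ responses = pipe(["马铃薯种植的时候有哪些注意事项?"])
+ print(responses[0].text)
+ ```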
101
 
102
+ You can launch an OpenAI-compatible server with the following command:
103
+
104
+ ```bash
105
+ lmdeploy serve api_server internlm/internlm2-chat-7b --model-name internlm2-chat-7b --server-port 23333
106
+ ```
107
+
108
+ Then you can send a chat request to the server:
109
+
110
  ```bash
111
+ curl http://localhost:23333/v1/chat/completions \
112
+ -H "Content-Type: application/json" \
113
+ -d '{
114
+ "model": "internlm2-chat-7b",
115
+ "messages": [
116
+ {"role": "system", "content": "你是一个专业的农业专家"},
117
+ {"role": "user", "content": "马铃薯种植的时候有哪些注意事项?"}
118
+ ]
119
+ }'
120
  ```
 
121
 
122
+ The output will look like this:
123
 
124
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a3c4cbbb04840e3ce7e2c/NPdRr5Y5l5E0m0URCVZ1f.png)
125
 
126
+
127
+
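+ The same endpoint can also be called from Python with the official `openai` client (v1 interface). A minimal sketch, assuming the api_server above is running locally on port 23333:
+
+ ```python
+ from openai import OpenAI
+
+ # The local server does not check the API key, but the client requires one to be set.
+ client = OpenAI(api_key="none", base_url="http://localhost:23333/v1")
+ completion = client.chat.completions.create(
+     model="internlm2-chat-7b",
+     messages=[
+         {"role": "system", "content": "你是一个专业的农业专家"},
+         {"role": "user", "content": "马铃薯种植的时候有哪些注意事项?"},
+     ],
+ )
+ print(completion.choices[0].message.content)
+ ```
+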
128
+ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/)
129
+
130
+ ### vLLM
131
+
132
+ Launch an OpenAI-compatible server with `vLLM>=0.3.2`:
133
+
134
+ ```bash
135
+ pip install vllm
136
+ ```
137
+
138
+ ```bash
139
+ python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-7b --served-model-name internlm2-chat-7b --trust-remote-code
140
+ ```
141
+
142
+ Then you can send a chat request to the server:
143
 
144
  ```bash
145
+ curl http://localhost:8000/v1/chat/completions \
146
+ -H "Content-Type: application/json" \
147
+ -d '{
148
+ "model": "internlm2-chat-7b",
149
+ "messages": [
150
+ {"role": "system", "content": "You are a professional agriculture expert."},
151
+ {"role": "user", "content": "Introduce potato farming to me."}
152
+ ]
153
+ }'
154
  ```
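+
+ If you only need offline batch generation rather than a server, vLLM's Python `LLM` class can be used directly. A minimal sketch, assuming the checkpoint requires `trust_remote_code` and that a raw prompt (without a chat template) is acceptable for a quick test:
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Placeholder model id -- replace with the checkpoint you actually serve.
+ llm = LLM(model="internlm/internlm2-chat-7b", trust_remote_code=True)
+ params = SamplingParams(temperature=0.7, max_tokens=256)
+ outputs = llm.generate(["Introduce potato farming to me."], params)
+ print(outputs[0].outputs[0].text)
+ ```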
155
 
156
+ Find more details in the [vLLM documentation](https://docs.vllm.ai/en/latest/index.html)
157
+
158
+ ## Using a Locally Trained Model
159
+
160
+ ### First: Convert the Model to TurboMind Format
161
+
162
+ Here, we will use our pre-trained model file and execute the conversion in the user's root directory, as shown below.
163
 
164
  ```bash
165
+ # Converting Model to TurboMind (FastTransformer Format)
166
+ lmdeploy convert internlm2-chat-7b /root/autodl-tmp/agri_intern/XiXiLM --tokenizer-path ./GouMang/tokenizer.json
167
  ```
168
 
169
+ After execution, a workspace folder will be generated in the current directory.
170
+ This folder contains the files required for TurboMind and Triton model inference, as shown below:
171
+
172
+
173
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a3c4cbbb04840e3ce7e2c/CqdwhshIL8xxjog_WD_St.png)
174
+
175
 
176
+ ### Second: Chat Locally
177
 
178
  ```bash
179
  lmdeploy chat turbomind ./workspace
180
  ```
181
 
182
+ ### Third (Optional): TurboMind Inference + API Service
183
 
184
+ In the previous section, we tried starting the client directly from the command line. Now we will use lmdeploy to deploy the model as a service.
185
 
186
+ The "Model Inference/Service" currently offers two service deployment methods: TurboMind and TritonServer. In this case, the Server is either TurboMind or TritonServer, and the API Server can provide external API services. We recommend using TurboMind.
187
 
188
+ First, start the service with the following command:
189
 
190
 
191
  ```bash
192
  # ApiServer+Turbomind api_server => AsyncEngine => TurboMind
193
  lmdeploy serve api_server ./workspace \
194
+ --server-name 0.0.0.0 \
195
  --server-port 23333 \
 
196
  --tp 1
197
  ```
198
 
199
+ In the command above, `--server-name` and `--server-port` set the service address and port, respectively, and `--tp`, as mentioned earlier, sets the degree of tensor parallelism.
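+
+ Before wiring up a front end, you can confirm the server responds. A minimal sketch using `requests` against the OpenAI-compatible `/v1/models` route (port 23333 as configured above):
+
+ ```python
+ import requests
+
+ # A 200 response listing the served model(s) means the api_server is up.
+ resp = requests.get("http://0.0.0.0:23333/v1/models")
+ print(resp.status_code)
+ print(resp.json())
+ ```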
200
 
201
+ After this, users can start the Web Service as described in [TurboMind Service as the Backend](#--turbomind-service-as-the-backend).
202
 
203
+ ## Web Service Startup Method 1:
204
 
205
+ ### Starting the Service with Gradio
206
+
207
+ This section demonstrates using Gradio as a front-end demo.
208
+
209
+ > Since Gradio requires local access to display the interface,
210
+ > you also need to forward the data to your local machine via SSH. The command is as follows:
211
  >
212
+ > ssh -CNg -L 6006:127.0.0.1:6006 [email protected] -p <your ssh port>
213
 
214
+ #### --TurboMind Service as the Backend
215
 
216
+ The API Server is started the same way as in the previous section. Here, we directly start Gradio as the front-end.
217
 
218
  ```bash
219
+ # Gradio+ApiServer. The Server must be started first, and Gradio acts as the Client
220
  lmdeploy serve gradio http://0.0.0.0:23333 --server-port 6006
221
  ```
222
 
223
+ #### --Another Way (Recommended)
224
 
225
+ Of course, Gradio can also connect directly to TurboMind, as shown below.
226
 
227
  ```bash
228
  # Gradio+Turbomind(local)
229
  lmdeploy serve gradio ./workspace
230
  ```
231
 
232
+ You can start Gradio directly. In this case, there is no API Server, and TurboMind communicates directly with Gradio.
233
 
234
+ ## Web Service Startup Method 2:
235
 
236
+ ### Starting the Service with Streamlit
237
 
238
  ```bash
239
  pip install streamlit==1.24.0
 
241
 
242
  Download the [GouMang](https://huggingface.co/AI4Bread/GouMang) project model (please Star if you like it)
243
 
244
+ ```bash
245
+ git clone https://github.com/AI4Bread/GouMang.git
246
+ cd GouMang
247
+ ```
248
+
249
 
250
  Replace the model path in `web_demo.py` with the path where the downloaded parameters of `GouMang` are stored
251
 
252
  Run `web_demo.py` from that directory with the command below. Then [**check section 5.2 of this tutorial for local port configuration**](https://github.com/InternLM/tutorial/blob/main/helloworld/hello_world.md#52-%E9%85%8D%E7%BD%AE%E6%9C%AC%E5%9C%B0%E7%AB%AF%E5%8F%A3) to map the port to your local machine, and open `http://127.0.0.1:6006` in your local browser.
253
 
254
  ```
255
+ streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
256
  ```
257
 
258
  Note: The model will load only after you open the `http://127.0.0.1:6006` page in your browser.
259
+ Once the model is loaded, you can start conversing with GouMang like this.
260
+
261
+
262
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a3c4cbbb04840e3ce7e2c/VcuSpAKrRGY1HP1mwLGI6.png)
263
+
264
+
265
+ ## Open Source License
266
+
267
+ The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the <a href="https://wj.qq.com/s2/14897739/e871/" target="_blank">application form (in Chinese)</a>. For other questions or collaborations, please contact <[email protected]>.
268
+
269
+ ## Citation
270
+
271
+
272
+
273
+ ## 简介
274
+
275
+ XiXiLM ,即西西大模型(又名:句芒大模型),开源了面向农业问答的大模型。模型具有以下特点:
276
+
277
+ 1. **专业性强**:XiXiLM 专注于农业领域,特别是薯类作物的种植、病虫害防治、土壤管理等方面,提供专业、精准的解答。
278
+
279
+ 2. **学术化支持**:模型基于最新的农业研究成果,能够提供学术化的回答,帮助研究人员和农业从业者深入理解农业问题。
280
+
281
+ 3. **多语言支持**:支持中文和英文两种语言,方便国内外用户使用。
282
+
283
+ 4. **免费商业使用**:模型权重完全开放,不仅支持学术研究,还允许**申请**商业使用。用户可以在商业项目中免费使用该模型,降低了使用门槛。
284
+
285
+ 5. **高效训练**:采用先进的训练算法和技术,使得模型能够快速响应用户提问,提供高效的问答服务。
286
+
287
+ 6. **持续优化**:模型会根据用户反馈和最新研究成果进行持续优化,不断提升问答质量和覆盖面。
288
+
289
+
290
+ ## XiXiLM-Qwen-14B
291
+
292
+
293
+ **局限性:** 尽管在训练过程中我们非常注重模型的安全性,尽力促使模型输出符合伦理和法律要求的文本,但受限于模型大小以及概率生成范式,模型可能会产生各种不符合预期的输出,例如回复内容包含偏见、歧视等有害内容,请勿传播这些内容。由于传播不良信息导致的任何后果,本项目不承担责任。
294
+
295
+ ### 通过 Transformers 加载
296
+
297
+ 通过以下的代码加载 XiXiLM 模型
298
+
299
+ ```python
300
+ import torch
301
+ from transformers import AutoTokenizer, AutoModelForCausalLM
302
+ tokenizer = AutoTokenizer.from_pretrained("AI4Bread/XiXi_Qwen_base_14b", trust_remote_code=True)
303
+ # Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error.
304
+ model = AutoModelForCausalLM.from_pretrained("AI4Bread/XiXi_Qwen_base_14b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
305
+ model = model.eval()
306
+ response, history = model.chat(tokenizer, "你好", history=[])
307
+ print(response)
308
+ # Hello! How can I help you today?
309
+ response, history = model.chat(tokenizer, "马铃薯育种有什么注意事项?需要注意什么呢?", history=history)
310
+ print(response)
311
+ ```
312
+
313
+ 如果想进行流式生成,则可以使用 `stream_chat` 接口:
314
+
315
+ ```python
316
+ import torch
317
+ from transformers import AutoModelForCausalLM, AutoTokenizer
318
+
319
+ model_path = "AI4Bread/XiXi_Qwen_base_14b"
320
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
321
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
322
+
323
+ model = model.eval()
324
+ length = 0
325
+ for response, history in model.stream_chat(tokenizer, "马铃薯育种有什么注意事项?需要注意什么呢?", history=[]):
326
+     print(response[length:], flush=True, end="")
327
+     length = len(response)
328
+ ```
329
+
330
+ ## 部署
331
+
332
+ ### LMDeploy
333
+
334
+ LMDeploy 由 MMDeploy 和 MMRazor 团队联合开发,是涵盖了 LLM 任务的全套轻量化、部署和服务解决方案。
335
+
336
+ ```bash
337
+ pip install lmdeploy
338
+ ```
339
+
340
+ 你可以使用以下命令启动兼容 OpenAI API 的服务:
341
+
342
+ ```bash
343
+ lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333
344
+ ```
345
+
346
+ 然后你可以向服务端发起一个聊天请求:
347
+
348
+ ```bash
349
+ curl http://localhost:23333/v1/chat/completions \
350
+ -H "Content-Type: application/json" \
351
+ -d '{
352
+ "model": "internlm2-chat-7b",
353
+ "messages": [
354
+ {"role": "system", "content": "你是一个专业的农业专家"},
355
+ {"role": "user", "content": "马铃薯种植的时候有哪些注意事项?"}
356
+ ]
357
+ }'
358
+ ```
359
+
360
+ 更多信息请查看 [LMDeploy 文档](https://lmdeploy.readthedocs.io/en/latest/)
361
+
362
+ ### vLLM
363
+
364
+ 使用`vLLM>=0.3.2`启动兼容 OpenAI API 的服务:
365
+
366
+ ```bash
367
+ pip install vllm
368
+ ```
369
+
370
+ ```bash
371
+ python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-7b --trust-remote-code
372
+ ```
373
+
374
+ 然后你可以向服务端发起一个聊天请求:
375
+
376
+ ```bash
377
+ curl http://localhost:8000/v1/chat/completions \
378
+ -H "Content-Type: application/json" \
379
+ -d '{
380
+ "model": "internlm2-chat-7b",
381
+ "messages": [
382
+ {"role": "system", "content": "你是一个专业的农业专家."},
383
+ {"role": "user", "content": "请给我介绍一下马铃薯育种."}
384
+ ]
385
+ }'
386
+ ```
387
+
388
+ 更多信息请查看 [vLLM 文档](https://docs.vllm.ai/en/latest/index.html)
389
+
390
+ ## 使用本地训练模型
391
+
392
+ ### 第一步:转换为 lmdeploy TurboMind 格式
393
+
394
+ 这里,我们将使用预训练的模型文件,并在用户的根目录下执行转换,如下所示。
395
+
396
+ ```bash
397
+ # 将模型转换为 TurboMind (FastTransformer 格式)
398
+ lmdeploy convert internlm2-chat-7b /root/autodl-tmp/agri_intern/XiXiLM --tokenizer-path ./GouMang/tokenizer.json
399
+ ```
400
+
401
+ 执行完毕后,当前目录下将生成一个 workspace 文件夹。
402
+ 这个文件夹包含 TurboMind 和 Triton “模型推理”所需的文件,如下所示:
403
+
404
+
405
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a3c4cbbb04840e3ce7e2c/CqdwhshIL8xxjog_WD_St.png)
406
+
407
+
408
+ ### 第二步:本地聊天
409
+
410
+ ```bash
411
+ lmdeploy chat turbomind ./workspace
412
+ ```
413
+
414
+ ### 第三步(可选):TurboMind 推理 + API 服务
415
+
416
+ 在前一部分中,我们尝试通过命令行直接启动客户端。现在,我们将尝试使用 lmdeploy 进行服务部署。
417
+
418
+ “模型推理/服务”目前提供两种服务部署方式:TurboMind 和 TritonServer。在这种情况下,服务器可以是 TurboMind 或 TritonServer,而 API 服务器可以提供外部 API 服务。我们推荐使用 TurboMind。
419
+
420
+ 首先,使用以下命令启动服务:
421
+
422
+ ```bash
423
+ # ApiServer+Turbomind api_server => AsyncEngine => TurboMind
424
+ lmdeploy serve api_server ./workspace \
425
+ --server-name 0.0.0.0 \
426
+ --server-port 23333 \
427
+ --tp 1
428
+ ```
429
+
430
+ 在上述参数中,server_name 和 server_port 分别表示服务地址和端口。tp 参数如前所述代表 Tensor 并行性。
431
+
432
+ 之后,用户可以按照[TurboMind Service as the Backend](#--turbomind-service-as-the-backend) 中描述的启动 Web 服务。
433
+
434
+
435
+
436
+ ## 网页服务启动方式1:
437
+
438
+ ### Gradio 方式启动服务
439
+
440
+ 这一部分主要是将 Gradio 作为前端 Demo 演示。在上一节的基础上,我们不执行后面的 `api_client` 或 `triton_client`,而是执行 `gradio`。
441
+ 请参考[LMDeploy](#lmdeploy)部分获取详细信息。
442
+
443
+ > 由于 Gradio 需要本地访问展示界面,因此也需要通过 ssh 将数据转发到本地。命令如下:
444
+ >
445
+ > ssh -CNg -L 6006:127.0.0.1:6006 [email protected] -p <你的 ssh 端口号>
446
+
447
+ #### --TurboMind 服务作为后端
448
+
449
+ 直接启动作为前端的 Gradio。
450
+
451
+ ```bash
452
+ # Gradio+ApiServer。必须先开启 Server,此时 Gradio 为 Client
453
+ lmdeploy serve gradio http://0.0.0.0:23333 --server-port 6006
454
+ ```
455
+
456
+ #### --其他方式(推荐!!!)
457
+
458
+ 当然,Gradio 也可以直接和 TurboMind 连接,如下所示。
459
+
460
+ ```bash
461
+ # Gradio+Turbomind(local)
462
+ lmdeploy serve gradio ./workspace
463
+ ```
464
+
465
+ 可以直接启动 Gradio,此时没有 API Server,TurboMind 直接与 Gradio 通信。
466
+
467
+ ## 网页服务启动方式2:
468
+
469
+ ### Streamlit 方式启动服务:
470
+
471
+ 下载 [GouMang](https://huggingface.co/AI4Bread/GouMang) 项目模型(如果喜欢请给个 Star)
472
+
473
+ ```bash
474
+ git clone https://github.com/AI4Bread/GouMang.git
475
+ cd GouMang
476
+ ```
477
+
478
+ 将 `web_demo.py` 中的模型路径替换为下载的 `GouMang` 参数存储路径
479
+
480
+ 在目录中运行 `web_demo.py` 文件,并在输入以下命令后,[**查看本教程 5.2 以配置本地端口**](https://github.com/InternLM/tutorial/blob/main/helloworld/hello_world.md#52-%E9%85%8D%E7%BD%AE%E6%9C%AC%E5%9C%B0%E7%AB%AF%E5%8F%A3),将端口映射到本地。在本地浏览器中输入 `http://127.0.0.1:6006`。
481
+
482
+ ```
483
+ streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
484
+ ```
485
+
486
+ 注意:只有在浏览器中打开 `http://127.0.0.1:6006` 页面后,模型才会加载。
487
+ 模型加载完成后,您就可以开始与 西西(句芒) 进行对话了。
488
+
489
+ ## 开源许可证
490
+
491
+ 本仓库的代码依照 Apache-2.0 协议开源。模型权重对学术研究完全开放,也可申请免费的商业使用授权(<a href="https://wj.qq.com/s2/14897739/e871/" target="_blank">申请表(中文)</a>)。其他问题与合作请联系 <[email protected]>。
492
 
493
+ ## 引用