Asher committed
Commit c864579 · 1 Parent(s): e568489

doc update: add china docker mirror for vllm.

Files changed (2):
  1. README.md +18 -3
  2. README_CN.md +18 -3
README.md CHANGED
@@ -221,15 +221,30 @@ trtllm-serve \
 
 ### vLLM
 
-#### Docker Image
-We provide a pre-built Docker image containing vLLM 0.8.5 with full support for this model. The official vllm release is currently under development, **note: cuda 12.8 is require for this docker**.
+#### Inference from Docker Image
+We provide a pre-built Docker image containing vLLM 0.8.5 with full support for this model. The official vLLM release is currently under development. **Note: CUDA 12.4 is required for this Docker image.**
 
-- To get started:
+- To get started, download the Docker image:
+
+**From Docker Hub:**
 ```
 docker pull hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
 ```
 
+**From the China mirror (thanks to [CNB](https://cnb.cool/ "CNB.cool")):**
+
+First, pull the image from CNB:
+```
+docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b/hunyuan-infer-vllm-cuda12.4:v1
+```
+
+Then, retag the image to match the name used in the scripts below:
+```
+docker tag docker.cnb.cool/tencent/hunyuan/hunyuan-a13b/hunyuan-infer-vllm-cuda12.4:v1 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
+```
+
 - Download the model files:
   - Huggingface: downloaded automatically by vLLM.
   - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct`
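The mirror pull-and-retag flow this diff adds can be sketched as a small script. The image names are copied from the commit; the `run` dry-run wrapper is my own convenience (not part of the upstream docs) so the commands can be previewed before touching the Docker daemon:

```shell
# Image names taken from the diff above (CNB mirror and Docker Hub).
MIRROR="docker.cnb.cool/tencent/hunyuan/hunyuan-a13b/hunyuan-infer-vllm-cuda12.4:v1"
LOCAL="hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1"

# run: print the command in dry-run mode (the default); set DRYRUN=0 to execute.
run() {
  if [ "${DRYRUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run docker pull "$MIRROR"          # fetch the image from the China mirror
run docker tag "$MIRROR" "$LOCAL"  # alias it under the Docker Hub name
```

With `DRYRUN=0` the script performs the actual pull and retag; after `docker tag`, both names refer to the same image ID, so later scripts that reference `hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1` work unchanged.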
README_CN.md CHANGED
@@ -178,16 +178,31 @@ print(response)
 
 ## vLLM Deployment
 
-### Docker Image
+### Inference from Docker Image
 
 We provide a Docker image based on the official vLLM 0.8.5 release for quick deployment and testing. **Note: this image requires CUDA 12.4.**
 
-- Quick start:
+- First, download the Docker image:
 
+**Download from Docker Hub:**
 ```
 docker pull hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
 ```
 
+**China mirror:**
+
+For faster downloads, you can also pull the image from CNB. Thanks to [CNB Cloud Native Build](https://cnb.cool/) for the support:
+
+1. Pull the image:
+```
+docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b/hunyuan-infer-vllm-cuda12.4:v1
+```
+
+2. Retag the image (optional, to better match the names used in the scripts below):
+```
+docker tag docker.cnb.cool/tencent/hunyuan/hunyuan-a13b/hunyuan-infer-vllm-cuda12.4:v1 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
+```
+
 - Download the model files:
   - Huggingface: downloaded automatically by vLLM.
   - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct`
@@ -238,7 +253,7 @@ docker run --rm --ipc=host \
 
 ### Model Context Length Support
 
-The Hunyuan A13B model supports a context length of up to **256K tokens (i.e., 262,144 positions)**. However, due to the GPU memory limits of most hardware setups, the default `config.json` caps the context length at **32K tokens** to avoid out-of-memory (OOM) errors.
+The Hunyuan A13B model supports a context length of up to **256K tokens (262,144 tokens)**. However, due to the GPU memory limits of most hardware setups, the default `config.json` caps the context length at **32K tokens** to avoid out-of-memory (OOM) errors.
 
 #### Extending the Context Length to 256K