update doc (#2)
update doc (948e645df6573cbfa9a4d03b628e14866ab55a61)
Co-authored-by: asher <[email protected]>
README.md
CHANGED
@@ -168,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
     --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
     -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
-    --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
+    --tensor-parallel-size 2 --quantization gptq_marlin --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code

 ```

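Once the container above is up and the model has finished loading, this command exposes vLLM's OpenAI-compatible API on port 8000. A minimal smoke test (a sketch, assuming the server is reachable on localhost and the served model id is the value passed to `--model`; the prompt is illustrative):

```bash
# Send a chat completion request to the vLLM OpenAI-compatible server
# started by the docker command above. The model id mirrors the --model
# argument; the prompt and max_tokens are only examples.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```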
@@ -177,14 +177,17 @@ model downloaded by modelscope:
 docker run --privileged --user root --net=host --ipc=host \
     -v ~/.cache/modelscope:/root/.cache/modelscope \
     --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
-    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
+    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --quantization gptq_marlin --tensor-parallel-size 2 --port 8000 \
     --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
 ```

+### TensorRT-LLM
+
+Support for INT4 quantization on TensorRT-LLM for this model is in progress and will be available in a future update.
+
 ### SGLang

-Support for INT4 quantization on sglang is in progress and will be available in a future update.
+Support for INT4 quantization on sglang for this model is in progress and will be available in a future update.

 ## Contact Us

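In the ModelScope variant, `--model` points at a local directory, and unless `--served-model-name` is set vLLM serves the model under that path as its id. A quick way to confirm the exact id to use in request bodies (a sketch, assuming the second container is up on port 8000):

```bash
# List the models exposed by the OpenAI-compatible server started from
# the ModelScope command; the returned "id" field is the value to pass
# as "model" in completion requests.
curl http://localhost:8000/v1/models
```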