manaestras and asherszhang committed
Commit d01b83b · verified · 1 Parent(s): 8a81bdc

update doc (#2)

- update doc (948e645df6573cbfa9a4d03b628e14866ab55a61)


Co-authored-by: asher <[email protected]>

Files changed (1)
  1. README.md +6 -3
README.md CHANGED

@@ -168,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
  \
  -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
- --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
+ --tensor-parallel-size 2 --quantization gptq_marlin --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code

 ```

@@ -177,14 +177,17 @@ model downloaded by modelscope:
 docker run --privileged --user root --net=host --ipc=host \
  -v ~/.cache/modelscope:/root/.cache/modelscope \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
- -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
+ -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --quantization gptq_marlin --tensor-parallel-size 2 --port 8000 \
  --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
 ```

+### TensorRT-LLM
+
+Support for INT4 quantization on TensorRT-LLM for this model is in progress and will be available in a future update.

 ### SGLang

-Support for INT4 quantization on sglang is in progress and will be available in a future update.
+Support for INT4 quantization on sglang for this model is in progress and will be available in a future update.

 ## Contact Us
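For reference, once either of the vLLM containers in the diff above is running, the OpenAI-compatible server it starts can be smoke-tested with a request like the one below. This is a minimal sketch and not part of the commit: it assumes the default host and port from the commands above (localhost:8000) and that the served model name matches the value passed via --model.

```
# Minimal smoke test against the vLLM OpenAI-compatible server started above.
# Assumes it is reachable at localhost:8000 and was launched with
# --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4; adjust the "model" field
# if you served the local modelscope path instead.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```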