manaestras and asherszhang committed
Commit d01b83b · verified · 1 Parent(s): 8a81bdc

update doc (#2)

- update doc (948e645df6573cbfa9a4d03b628e14866ab55a61)


Co-authored-by: asher <[email protected]>

Files changed (1)
  1. README.md +6 -3
README.md CHANGED

@@ -168,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
  \
  -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
- --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
+ --tensor-parallel-size 2 --quantization gptq_marlin --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code

 ```

@@ -177,14 +177,17 @@ model downloaded by modelscope:
 docker run --privileged --user root --net=host --ipc=host \
  -v ~/.cache/modelscope:/root/.cache/modelscope \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
- -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
+ -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --quantization gptq_marlin --tensor-parallel-size 2 --port 8000 \
  --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
 ```

+### TensorRT-LLM
+
+Support for INT4 quantization on TensorRT-LLM for this model is in progress and will be available in a future update.

 ### SGLang

-Support for INT4 quantization on sglang is in progress and will be available in a future update.
+Support for INT4 quantization on sglang for this model is in progress and will be available in a future update.

 ## Contact Us
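For reference, once either of the vLLM containers in the diff above is running, the OpenAI-compatible server it starts can be smoke-tested with a request like the one below. This is a minimal sketch and not part of the commit: it assumes the default host and port from the commands above (localhost:8000) and that the served model name matches the value passed via --model.

```
# Minimal smoke test against the vLLM OpenAI-compatible server started above.
# Assumes it is reachable at localhost:8000 and was launched with
# --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4; adjust the "model" field
# if you served the local modelscope path instead.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```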