Can anyone run this model with the SGLang framework?
I tried to run this model with SGLang, but it is extremely slow. Does anyone have a good configuration for running this model with SGLang?
Try running this with vLLM; it is much faster.
You can try this command with sglang==0.4.2:

```
python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 --model-path models/DeepSeek-R1-AWQ --tp 8 --enable-p2p-check --trust-remote-code --dtype float16 --mem-fraction-static 0.95 --served-model-name deepseek-r1-awq --disable-cuda-graph
```
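Once the server is up, it exposes an OpenAI-compatible API. Here is a minimal client sketch, assuming the default `/v1/chat/completions` route on port 30000 and the `deepseek-r1-awq` served model name from the launch command above (the helper function name is just for illustration):

```python
import json
import urllib.request

def build_chat_payload(model, prompt, max_tokens=512):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # DeepSeek-R1's model card recommends temperatures around 0.5-0.7
        "temperature": 0.6,
    }

payload = build_chat_payload("deepseek-r1-awq", "What is 2 + 2?")
req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",  # host/port from the launch command
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is actually running:
# with urllib.request.urlopen(req) as resp:
#     body = json.loads(resp.read())
#     print(body["choices"][0]["message"]["content"])
```

If you get empty `content` fields back, checking the raw response body this way can help tell whether the server returned nothing or the client-side parsing dropped the reasoning output.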
However, the results are not as expected: I get empty content for some queries, and the thinking output is incomplete.
I managed to run this model with SGLang:

```
python3 -m sglang.launch_server --model-path /home/service/var/models/deepseek-r1-huggingface/DeepSeek-R1-AWQ/ --trust-remote-code --tp 8 --mem-fraction-static 0.8 --dtype float16 --host 0.0.0.0 --port 9000 --disable-radix --disable-custom-all-reduce --log-requests --cuda-graph-max-bs 16 --max-total-tokens 65536
```

It runs on my 8*H20 at about 30 t/s. However, I have the same issue as
@Eric10
. I tried tweaking some parameters but finally gave up. I am now trying the Q4 GGUF model instead.