Can anyone run this model with the SGLang framework?

#13
by muziyongshixin - opened

I tried to run this model with SGLang, but it is extremely slow. Does anyone have good settings for running this model with SGLang?

Cognitive Computations org

Try running this with vLLM instead; it is much faster.
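For anyone who wants to try the vLLM route, a minimal launch sketch might look like the following. The model path, port, and memory fraction are assumptions you should adjust for your own setup; the flags themselves are standard vLLM options.

```shell
# Sketch: serve the AWQ checkpoint with vLLM's OpenAI-compatible server.
# Assumes 8 GPUs and a local copy of the weights at ./models/DeepSeek-R1-AWQ.
vllm serve ./models/DeepSeek-R1-AWQ \
  --tensor-parallel-size 8 \
  --dtype float16 \
  --quantization awq \
  --trust-remote-code \
  --served-model-name deepseek-r1-awq \
  --host 0.0.0.0 --port 30000 \
  --gpu-memory-utilization 0.95
```

Once it is up, the server speaks the OpenAI API, so any OpenAI-compatible client can point at `http://<host>:30000/v1`.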

You can try this command with sglang==0.4.2: `python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 --model-path models/DeepSeek-R1-AWQ --tp 8 --enable-p2p-check --trust-remote-code --dtype float16 --mem-fraction-static 0.95 --served-model-name deepseek-r1-awq --disable-cuda-graph`.
But the results are not as expected: I get empty content for some queries, and the `<think>` reasoning output is incomplete.
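One thing worth checking on the client side: DeepSeek-R1 wraps its chain of thought in `<think>...</think>` tags before the final answer, so "empty content" can mean the generation was cut off (e.g. by a low token limit) inside the reasoning block. A small, self-contained parser like the sketch below can distinguish a missing answer from a truncated one; the function name and error message are my own, not part of any library.

```python
import re

def split_reasoning(text: str):
    """Split a DeepSeek-R1 style completion into (reasoning, answer).

    Returns (None, text) when no <think> block is present, and raises
    when an opening <think> tag has no matching close, which usually
    means the generation stopped before the reasoning finished.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = text[match.end():].strip()
        return reasoning, answer
    if "<think>" in text:
        # Opening tag but no closing tag: likely truncated output,
        # so raising the max token limit is the first thing to try.
        raise ValueError("truncated <think> block; raise the token limit")
    return None, text.strip()
```

If this raises on your "empty" responses, the fix is likely a larger `max_new_tokens` (or `--max-total-tokens` on the server) rather than a different launch flag.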

I managed to run this model with SGLang:
`python3 -m sglang.launch_server --model-path /home/service/var/models/deepseek-r1-huggingface/DeepSeek-R1-AWQ/ --trust-remote-code --tp 8 --mem-fraction-static 0.8 --dtype float16 --host 0.0.0.0 --port 9000 --disable-radix --disable-custom-all-reduce --log-requests --cuda-graph-max-bs 16 --max-total-tokens 65536`
It runs on my 8×H20 at about 30 tokens/s. However, I have the same issue as @Eric10. I tried tweaking some parameters but finally gave up. I am now trying a Q4 GGUF model instead.
