Can anyone run this model with the SGLang framework?
I tried to run this model with SGLang, but it is extremely slow. Does anyone have a good configuration for running this model with SGLang?
Try running this with vLLM; it is much faster.
You can try this command with sglang==0.4.2:

```
python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 --model-path models/DeepSeek-R1-AWQ --tp 8 --enable-p2p-check --trust-remote-code --dtype float16 --mem-fraction-static 0.95 --served-model-name deepseek-r1-awq --disable-cuda-graph
```
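Once the server is up, it exposes an OpenAI-compatible API. Here is a minimal client sketch, assuming the default `/v1/chat/completions` route on port 30000 and the `deepseek-r1-awq` served model name from the launch command above (the helper function name is just for illustration):

```python
import json
import urllib.request

def build_chat_payload(model, prompt, max_tokens=512):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # DeepSeek-R1's model card recommends temperatures around 0.5-0.7
        "temperature": 0.6,
    }

payload = build_chat_payload("deepseek-r1-awq", "What is 2 + 2?")
req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",  # host/port from the launch command
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is actually running:
# with urllib.request.urlopen(req) as resp:
#     body = json.loads(resp.read())
#     print(body["choices"][0]["message"]["content"])
```

If you get empty `content` fields back, checking the raw response body this way can help tell whether the server returned nothing or the client-side parsing dropped the reasoning output.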
However, the results are not as expected: I get empty content for some queries, and the thinking output is incomplete.
I managed to run this model with SGLang:

```
python3 -m sglang.launch_server --model-path /home/service/var/models/deepseek-r1-huggingface/DeepSeek-R1-AWQ/ --trust-remote-code --tp 8 --mem-fraction-static 0.8 --dtype float16 --host 0.0.0.0 --port 9000 --disable-radix --disable-custom-all-reduce --log-requests --cuda-graph-max-bs 16 --max-total-tokens 65536
```

It runs on my 8*H20 at about 30 t/s. However, I have the same issue as
@Eric10
. I tried tweaking some parameters but finally gave up. I am now trying the Q4 GGUF model instead.