8×A100 out of memory
#19 opened by Jaren
VLLM_WORKER_MULTIPROC_METHOD=spawn python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 --port 12345 \
  --max-model-len 65536 --max-num-batched-tokens 65536 \
  --trust-remote-code --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.97 --dtype float16 \
  --served-model-name deepseek-reasoner \
  --model cognitivecomputations/DeepSeek-R1-AWQ
This runs out of memory on 8×A100.
Decrease --gpu-memory-utilization.
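A rough way to see why this can help: vLLM treats --gpu-memory-utilization as the fraction of each GPU's memory it may claim (the default is 0.9), so roughly total × (1 − fraction) per GPU stays outside its budget. At 0.97 that margin is small, and allocations vLLM does not fully account for (possibly CUDA graph capture or inter-GPU communication buffers; the thread includes no error log, so this is an assumption) may push a GPU over. The sketch assumes 80 GiB A100s, which the thread does not confirm, and headroom_gib is a hypothetical helper, not part of vLLM.

```shell
# Hypothetical helper (not part of vLLM): per-GPU memory, in GiB, left
# outside vLLM's budget for a given total size and utilization fraction.
headroom_gib() {
  awk -v total="$1" -v util="$2" 'BEGIN { printf "%.2f\n", total * (1 - util) }'
}

# Assumption: 80 GiB A100s; the thread does not say which variant was used.
GPU_GIB=80
for util in 0.97 0.95 0.90; do
  echo "util=$util headroom=$(headroom_gib "$GPU_GIB" "$util") GiB per GPU"
done
```

On 80 GiB cards this prints about 2.40, 4.00 and 8.00 GiB of headroom per GPU; on 40 GiB cards the margins halve. Lowering the fraction also shrinks the space left for the KV cache, so a long --max-model-len may then need to come down as well.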
v2ray changed discussion status to closed