Cannot host with vLLM?

#17
by tommywu052

When I try to host it as an API endpoint with

vllm serve "nvidia/NVLM-D-72B" --trust-remote-code

it throws the error:

Model architectures ['NVLM_D'] are not supported for now


NVIDIA org

We currently do not support vLLM but are actively working on integrating NVLM with vLLM. Our team is committed to delivering this support as soon as possible.

Thanks,
Boxin

It seems to be supported by this PR: https://github.com/vllm-project/vllm/pull/9045, but that has not been released yet.
Installing vLLM from the latest source might work.
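
If you want to try it before a release, a minimal from-source install sketch (this assumes a working CUDA build toolchain; compiling the kernels can take a while):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .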

Checking out the latest main branch and building the Docker image worked for me.
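
Roughly what I mean (the image tag and build target here are illustrative; check the Dockerfile in the repo for the current targets):

git clone https://github.com/vllm-project/vllm.git
cd vllm
DOCKER_BUILDKIT=1 docker build . --target vllm-openai -t vllm-nightly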

Thanks, guys. It works well after installing the latest nightly wheel:

pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl

and serving with:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16 --trust-remote-code
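
Once the server is up, you can query the OpenAI-compatible endpoint, e.g. with curl (port 8000 is vLLM's default; the prompt is just an example):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/NVLM-D-72B", "messages": [{"role": "user", "content": "Hello!"}]}'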


Can you please share the specs this was run on? I tried running it on Colab Pro with an A100 and it did not work.
Thanks in advance.

ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 195, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 41, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 552, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 107, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start

@Malini
You might need 4× A100s to get this running, or try --cpu-offload-gb.
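
For example, something like the following might fit on fewer GPUs by offloading part of the weights to CPU RAM (the offload size is a rough guess for a ~72B bf16 model on a single 80 GB A100, and expect it to be much slower):

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 1 --cpu-offload-gb 100 --enforce-eager --max-num-seqs 16 --trust-remote-code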
