Cannot host with vLLM?

#17
by tommywu052

When I try to host it as an API endpoint with

vllm serve "nvidia/NVLM-D-72B" --trust-remote-code

it throws the error:

Model architectures ['NVLM_D'] are not supported for now


NVIDIA org

We currently do not support vLLM but are actively working on integrating NVLM with vLLM. Our team is committed to delivering this support as soon as possible.

Thanks,
Boxin

It seems to be supported by this PR: https://github.com/vllm-project/vllm/pull/9045, but that has not been released yet.
Installing vLLM from the latest source might work.
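
If you want to try it before a release, a minimal from-source install sketch (this assumes a working CUDA build toolchain; compiling the kernels can take a while):

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .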

Checking out the latest main branch and building the Docker image worked for me.
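
Roughly what I mean (the image tag and build target here are illustrative; check the Dockerfile in the repo for the current targets):

git clone https://github.com/vllm-project/vllm.git
cd vllm
DOCKER_BUILDKIT=1 docker build . --target vllm-openai -t vllm-nightly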

Thanks, guys. It works well after installing the latest nightly wheel:

pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl

and serving with:

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 4 --enforce-eager --max-num-seqs 16 --trust-remote-code
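
Once the server is up, you can query the OpenAI-compatible endpoint, e.g. with curl (port 8000 is vLLM's default; the prompt is just an example):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/NVLM-D-72B", "messages": [{"role": "user", "content": "Hello!"}]}'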


Can you please share the specs this was run on? I tried running it on Colab Pro with an A100 and it did not work.
Thanks in advance.

ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 195, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/scripts.py", line 41, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 552, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 107, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start

@Malini
You might need 4× A100s to get this running, or try --cpu-offload-gb.
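
For example, something like the following might fit on fewer GPUs by offloading part of the weights to CPU RAM (the offload size is a rough guess for a ~72B bf16 model on a single 80 GB A100, and expect it to be much slower):

vllm serve nvidia/NVLM-D-72B --tensor-parallel-size 1 --cpu-offload-gb 100 --enforce-eager --max-num-seqs 16 --trust-remote-code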
