How to run the model OpenGVLab/InternVL2-40B-AWQ with vllm docker image?
#2 opened by andryevinnik
I used the following command:
docker run --log-opt max-size=10m --log-opt max-file=1 --rm -it --gpus '"device=0"' \
  -p 8080:8000 \
  --mount type=bind,source=/ssd_2/huggingface,target=/root/.cache/huggingface \
  vllm/vllm-openai:v0.5.4 \
  --model OpenGVLab/InternVL2-40B-AWQ --max-model-len 8192 --trust-remote-code
This fails with:
KeyError: 'model.layers.51.mlp.gate_up_proj.qweight'
Adding --dtype half to the command gives the same error.
Passing -q awq instead fails with:
ValueError: Cannot find the config file for awq
Please advise how to run this model.
Why not try LMDeploy? https://lmdeploy.readthedocs.io/en/latest/llm/api_server.html
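For reference, here is a minimal LMDeploy serving sketch, assuming lmdeploy is installed and the default TurboMind backend is used; the port is illustrative:

# Serve the AWQ checkpoint through LMDeploy's OpenAI-compatible server.
pip install lmdeploy
lmdeploy serve api_server OpenGVLab/InternVL2-40B-AWQ \
  --backend turbomind \
  --model-format awq \
  --server-port 23333

Once the server is up, it exposes an OpenAI-compatible API, so a request along these lines should work (the prompt is illustrative):

curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "OpenGVLab/InternVL2-40B-AWQ",
        "messages": [{"role": "user", "content": "Hello"}]
      }'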
Our AWQ model was generated with LMDeploy. vLLM recently added support for InternVL; please try again.
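If you prefer to stay on vLLM, a hedged retry could keep the original command but make the quantization explicit and use a newer image; the tag below is an assumption, so check the vLLM release notes for the first version that loads this checkpoint:

# Assumed retry on a newer vLLM image; the KeyError above suggests
# v0.5.4 cannot load this LMDeploy-produced AWQ checkpoint.
docker run --rm -it --gpus '"device=0"' -p 8080:8000 \
  --mount type=bind,source=/ssd_2/huggingface,target=/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model OpenGVLab/InternVL2-40B-AWQ \
  --quantization awq \
  --max-model-len 8192 \
  --trust-remote-code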
czczup changed discussion status to closed