can vllm launch this model?

#2
by chopin1998 - opened

Currently, it says:

ERROR 12-16 09:49:55 engine.py:366] raise ValueError(f"No supported config format found in {model}")
ERROR 12-16 09:49:55 engine.py:366] ValueError: No supported config format found in unsloth/Llama-3.3-70B-Instruct-GGUF

vllm version is 0.6.4.post1
transformers 4.47.0

You need a config.json file. Copy the config.json from the original 16-bit model repo on Hugging Face and it should work. I wouldn't recommend using vLLM for GGUF, though; use llama.cpp instead.
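
For anyone hitting the same error, a minimal sketch of that fix, assuming the config.json lives in the original 16-bit repo linked further down and that the GGUF weights are already downloaded to a local folder (the folder path here is a placeholder):

```python
# Sketch only: fetch config.json from the original 16-bit repo and place it
# next to the locally downloaded GGUF weights so vLLM finds a supported config.
import shutil

from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="unsloth/Llama-3.3-70B-Instruct",  # original 16-bit repo
    filename="config.json",
)

# Hypothetical local directory holding the GGUF file(s).
shutil.copy(config_path, "./Llama-3.3-70B-Instruct-GGUF/config.json")
```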

Hi, where exactly is the link to this file?
Thanks.

Why exactly would you not recommend vLLM for GGUF? I'm looking at serving a quantized model API, and vLLM seems to be the best option for that (https://blog.vllm.ai/2024/09/05/perf-update.html).

Or is this recommendation only for individual users?

Never mind, found the answer: (https://docs.vllm.ai/en/latest/features/quantization/gguf.html)
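
For reference, the flow in that GGUF doc boils down to roughly the sketch below: download a single-file GGUF quant and point vLLM at it, with the tokenizer taken from the original 16-bit repo. The quant filename is an assumption (check the actual filenames on the Hub), and the larger split quants may need to be merged into one file first.

```python
# Rough sketch of the vLLM GGUF flow from the linked docs; filenames are assumptions.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

gguf_path = hf_hub_download(
    repo_id="unsloth/Llama-3.3-70B-Instruct-GGUF",
    filename="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed single-file quant name
)

# vLLM reads the GGUF weights directly; the tokenizer comes from the 16-bit repo.
llm = LLM(model=gguf_path, tokenizer="unsloth/Llama-3.3-70B-Instruct")

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```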

Unsloth AI org

> Hi, where exactly is the link to this file?
> Thanks.

Whoops, apologies, I missed your message. It's located here: https://huggingface.co/unsloth/Llama-3.3-70B-Instruct/tree/main

Unsloth AI org

> Why exactly would you not recommend vLLM for GGUF? I'm looking at serving a quantized model API, and vLLM seems to be the best option for that (https://blog.vllm.ai/2024/09/05/perf-update.html).
>
> Or is this recommendation only for individual users?
>
> Never mind, found the answer: (https://docs.vllm.ai/en/latest/features/quantization/gguf.html)

Yeah, usually you'd rather use 4-bit / 8-bit versions in vLLM than GGUFs.
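
As a sketch of that alternative, you can point vLLM straight at a pre-quantized 4-bit checkpoint instead of a GGUF. The bnb-4bit repo name below is an assumption (swap in whichever 4-bit / 8-bit variant you actually serve), and `load_format="bitsandbytes"` matches the vLLM versions from around this thread:

```python
# Sketch: serving a pre-quantized bitsandbytes 4-bit checkpoint with vLLM
# instead of a GGUF. The repo id below is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(
    model="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",  # assumed 4-bit repo
    quantization="bitsandbytes",
    load_format="bitsandbytes",  # needed on vLLM versions from around this thread
)

print(llm.generate(["Hello!"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```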
