Can vLLM launch this model?
Currently, it says:
ERROR 12-16 09:49:55 engine.py:366] raise ValueError(f"No supported config format found in {model}")
ERROR 12-16 09:49:55 engine.py:366] ValueError: No supported config format found in unsloth/Llama-3.3-70B-Instruct-GGUF
vLLM version is 0.6.4.post1
transformers 4.47.0
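For reference, this is roughly the launch (just pointing vLLM at the repo ID via the Python API; the `vllm serve` CLI hits the same error):

```python
# Minimal repro sketch: the model argument is the GGUF repo ID shown in the
# traceback above. The repo only ships .gguf files (no config.json), so
# vLLM's config loader raises "No supported config format found in ...".
from vllm import LLM

llm = LLM(model="unsloth/Llama-3.3-70B-Instruct-GGUF")  # raises ValueError
```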
You need a config.json file. Copy the config.json from the original 16-bit model repo and it should work. I wouldn't recommend using vLLM for GGUF; use llama.cpp instead.
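If you do go the llama.cpp route, something like this with the llama-cpp-python bindings should work. The filename below is just a pattern guess, check what the repo actually ships; a 70B quant may also be split across multiple files:

```python
# Sketch of running the GGUF via llama-cpp-python instead of vLLM.
# The filename pattern is an assumption -- check the actual files in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Llama-3.3-70B-Instruct-GGUF",
    filename="*Q4_K_M*.gguf",   # glob for the quant you want (assumed name)
    n_gpu_layers=-1,            # offload all layers to GPU if they fit
    n_ctx=8192,                 # context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```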
Hi. Where is this file link exactly?
Thanks.
Why would you not recommend vLLM for GGUF exactly? I am looking at serving a quantized model API, and vLLM seems to be the best for this (https://blog.vllm.ai/2024/09/05/perf-update.html).
Or is this recommendation only for individual users?
Never mind, found the answer: (https://docs.vllm.ai/en/latest/features/quantization/gguf.html)
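From those docs, the workaround seems to be: download a single .gguf file locally and point vLLM at that file, borrowing the tokenizer (and config) from the original 16-bit repo. Rough sketch; the filename is a guess, and multi-part GGUFs have to be merged into one file first:

```python
# Sketch of serving a GGUF file with vLLM, per the GGUF docs linked above.
# Assumptions: the quant fits in a single .gguf file and the filename
# matches what the repo actually ships -- check before running.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

gguf_path = hf_hub_download(
    repo_id="unsloth/Llama-3.3-70B-Instruct-GGUF",
    filename="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed name
)

llm = LLM(
    model=gguf_path,                             # point at the .gguf file itself
    tokenizer="unsloth/Llama-3.3-70B-Instruct",  # tokenizer/config from the 16-bit repo
)

print(llm.generate(["Hello!"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```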
Whoops, my apologies, I missed your message. The config.json is located here: https://huggingface.co/unsloth/Llama-3.3-70B-Instruct/tree/main
Yeah, regarding GGUF: usually you'd rather use 4-bit / 8-bit quantized versions in vLLM than GGUFs.
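For example, something like an AWQ / GPTQ / bnb-4bit checkpoint served directly. The repo name below is just a placeholder; swap in whichever quantized build you actually use:

```python
# Sketch of serving a 4-bit AWQ checkpoint with vLLM instead of a GGUF.
# The repo ID is a placeholder, not a specific recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="someorg/Llama-3.3-70B-Instruct-AWQ",  # placeholder repo ID
    quantization="awq",
    tensor_parallel_size=2,                      # split across 2 GPUs; adjust to your hardware
)

print(llm.generate(["Hello!"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```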