Run model via Docker and vLLM
#1 opened by keysonya
Hello! Sorry for bothering you! I would like to ask for help running this model with vLLM inside a Docker container. I tried it and ran into the error `raise ValueError(f"No supported config format found in {model}")`. As far as I can tell, this could be solved by pointing vLLM directly at a tokenizer and/or by using gguf-split to merge all the shards into a single file so that vLLM can handle the GGUF format. What do you think about this?
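Concretely, this is roughly what I had in mind (just a sketch; the GGUF path and tokenizer repo below are placeholders, not the exact paths I used):

```python
# Rough sketch: load a single (merged) GGUF file in vLLM with an explicit tokenizer.
# The file path and HF repo name are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/model-q4_k_m.gguf",   # one GGUF file after merging shards with gguf-split
    tokenizer="org/original-model",      # tokenizer from the original (unquantized) HF repo
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```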
I will be really thankful for any help!
Support for GGUF in vLLM is experimental, so releases in this format are aimed primarily at the llama.cpp/ollama frameworks. For vLLM, if quantization is needed, you should use an AWQ quant. In the future, we plan to provide AWQ quants for our models alongside the GGUF releases.
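For reference, running an AWQ quant in vLLM would look roughly like this (a sketch only; the repo name is a placeholder until we publish AWQ versions):

```python
# Rough sketch: serve an AWQ-quantized checkpoint with vLLM.
# The repo name is a hypothetical placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-AWQ",     # placeholder for a future AWQ-quantized repo
    quantization="awq",        # use vLLM's AWQ kernels
)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```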