Run model via Docker and vLLM

#1
by keysonya - opened

Hello! Sorry for bothering you! I would like to ask for help with running the model on vLLM inside a Docker container. I tried it and ran into the error `raise ValueError(f"No supported config format found in {model}")`. As far as I can tell, this could be solved by setting the tokenizer directly and/or using gguf-split to merge all the files into one so that vLLM can load the GGUF format. What do you think about it?
I will be really thankful for any help!
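
For context, the workaround I have in mind looks roughly like this (the file names, paths, and base-repo name below are placeholders, and vLLM's GGUF support may differ between versions):

```bash
# Merge the split GGUF shards into a single file with llama.cpp's gguf-split tool
# (the binary may be named gguf-split in older llama.cpp builds)
./llama-gguf-split --merge model-00001-of-00003.gguf model-merged.gguf

# Run the OpenAI-compatible vLLM server in Docker, pointing --tokenizer at the
# original (non-GGUF) repo so the config/tokenizer are not read from the GGUF file
docker run --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  vllm/vllm-openai:latest \
  --model /models/model-merged.gguf \
  --tokenizer original-org/original-model
```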

Support for GGUF in vLLM is experimental, so releases in this format are aimed primarily at the llama.cpp/ollama frameworks. For vLLM, if you need quantization, you should use an AWQ quant. In the future, we plan to publish AWQ quants of our models alongside the GGUF ones.
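
Just as an illustration (the repo name below is a placeholder, since the AWQ quants are not published yet), serving an AWQ quant with the official vLLM Docker image would look roughly like this:

```bash
# Serve a hypothetical AWQ quant with the OpenAI-compatible vLLM server
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model your-org/your-model-AWQ \
  --quantization awq
```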
