Run model via Docker and vLLM

#1
by keysonya - opened

Hello! Sorry for bothering you! I would like to ask for help with running the model on vLLM inside a Docker container. I tried it and ran into the error `raise ValueError(f"No supported config format found in {model}")`. As far as I can tell, this could be solved by setting the tokenizer directly and/or using gguf-split to merge all the files into one so that vLLM can load the GGUF format. What do you think about it?
I will be really thankful for any help!
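
For context, the workaround I have in mind looks roughly like this (the file names, paths, and base-repo name below are placeholders, and vLLM's GGUF support may differ between versions):

```bash
# Merge the split GGUF shards into a single file with llama.cpp's gguf-split tool
# (the binary may be named gguf-split in older llama.cpp builds)
./llama-gguf-split --merge model-00001-of-00003.gguf model-merged.gguf

# Run the OpenAI-compatible vLLM server in Docker, pointing --tokenizer at the
# original (non-GGUF) repo so the config/tokenizer are not read from the GGUF file
docker run --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  vllm/vllm-openai:latest \
  --model /models/model-merged.gguf \
  --tokenizer original-org/original-model
```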

Support for GGUF in vLLM is experimental, so releases in this format are aimed primarily at the llama.cpp/ollama frameworks. For vLLM, if you need quantization, you should use an AWQ quant. In the future, we plan to publish AWQ quants of our models alongside the GGUF ones.
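
Just as an illustration (the repo name below is a placeholder, since the AWQ quants are not published yet), serving an AWQ quant with the official vLLM Docker image would look roughly like this:

```bash
# Serve a hypothetical AWQ quant with the OpenAI-compatible vLLM server
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model your-org/your-model-AWQ \
  --quantization awq
```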
