Error serving GGUF models on vllm
Hi. Thanks for the quantizations.
I am trying to run the model on vllm. The architecture is supported as per the vllm documentation.
I first downloaded the GGUF shards from DeepSeek-V2.5-Q6_K.
Then I merged them using the llama-gguf-split utility.
Finally, when I run vllm serve DeepSeek-V2.5-Q6_K.gguf --tokenizer deepseek-ai/DeepSeek-V2.5, I get the error ValueError: Architecture deepseek2 not supported.
The error seems to originate from transformers/modeling_gguf_pytorch_utils.py. Upon further investigation, the GGUF_SUPPORTED_ARCHITECTURES list there contains only ['llama', 'mistral', 'qwen2', 'qwen2moe', 'phi3'].
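A quick way to confirm this against your own install is shown below; the constant name matches what I found in that file, but it may move or change between transformers releases, so treat this as a sketch rather than a stable API:

```python
# Sketch: inspect which GGUF architectures the installed transformers build
# claims to support. The module path comes from the traceback above; the
# constant name GGUF_SUPPORTED_ARCHITECTURES is what I saw in the file and
# may differ in other versions, hence the getattr fallback.
from transformers import modeling_gguf_pytorch_utils as gguf_utils

supported = getattr(gguf_utils, "GGUF_SUPPORTED_ARCHITECTURES", None)
print(supported)  # e.g. ['llama', 'mistral', 'qwen2', 'qwen2moe', 'phi3']
# If 'deepseek2' is missing here, the GGUF cannot be dequantized by this
# transformers build, regardless of what vLLM itself supports.
```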
Do you know of any solution to this problem? Or do we have to wait until Hugging Face Transformers supports DeepSeek-V2 in GGUF format?
thanks
Have you solved the problem yet? Because I'm experiencing the same issue.
No, I didn't. I ended up using the paid API.
It's just not supported in vLLM, I'm pretty sure :(
I think it's not vLLM we need to wait for, but GGUF support for this architecture in Hugging Face Transformers.
Hmm, maybe. I don't know whose responsibility it is to add it, just that it's explicitly not in vLLM's supported list.