Error serving GGUF models on vllm
Hi. Thanks for the quantizations.
I am trying to run the model on vllm. The architecture is supported as per the vllm documentation.
I first downloaded the GGUF shards from DeepSeek-V2.5-Q6_K.
Then I merged them using the llama-gguf-split utility.
Finally, when I run vllm serve DeepSeek-V2.5-Q6_K.gguf --tokenizer deepseek-ai/DeepSeek-V2.5, I get the error ValueError: Architecture deepseek2 not supported.
The error seems to originate from transformers/modeling_gguf_pytorch_utils.py. Upon further investigation, the GGUF_SUPPORTED_ARCHITECTURES list there contains only ['llama', 'mistral', 'qwen2', 'qwen2moe', 'phi3'].
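A quick way to confirm this against your own install is shown below; the constant name matches what I found in that file, but it may move or change between transformers releases, so treat this as a sketch rather than a stable API:

```python
# Sketch: inspect which GGUF architectures the installed transformers build
# claims to support. The module path comes from the traceback above; the
# constant name GGUF_SUPPORTED_ARCHITECTURES is what I saw in the file and
# may differ in other versions, hence the getattr fallback.
from transformers import modeling_gguf_pytorch_utils as gguf_utils

supported = getattr(gguf_utils, "GGUF_SUPPORTED_ARCHITECTURES", None)
print(supported)  # e.g. ['llama', 'mistral', 'qwen2', 'qwen2moe', 'phi3']
# If 'deepseek2' is missing here, the GGUF cannot be dequantized by this
# transformers build, regardless of what vLLM itself supports.
```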
Do you know of any solution to this problem? Or do we have to wait until Hugging Face Transformers supports DeepSeek-V2 in GGUF format?
thanks
Have you solved the problem yet? Because I'm experiencing the same issue.
No, I didn't. I ended up using the paid API.
It's just not supported in vLLM, I'm pretty sure :(
I think it's not vLLM we need to wait for, but GGUF support for this architecture in Hugging Face Transformers.
Hmm, maybe. I don't know whose responsibility it is to add it, just that it's explicitly not in vLLM's supported list.