New quantize request
Hello,
Could you please create GGUF versions of the models below? I couldn't find them made by anyone else on HF.
thank you in advance!
deepseek-ai/deepseek-vl2
deepseek-ai/deepseek-vl2-small
deepseek-ai/deepseek-vl2-tiny
Thanks for the great job you guys are doing quantizing so many models and making them available to users who can only run those formats!
They are unfortunately not yet supported by llama.cpp. These models use DeepseekVLV2ForCausalLM, which simply is not implemented in llama.cpp and might never be, given the lack of interest in multimodal models among many llama.cpp developers, unless deepseek-ai releases a text-only DeepSeekMoE-27B model likely using a very similar architecture.

The models are not even supported by vLLM yet, so to my knowledge there is no other way to run them than on your GPU. Luckily, given the model sizes, that should be feasible on a high-end consumer GPU if you just load the model in 4-bit precision, even for the largest one. If you really need to run this model from RAM, here is some great news for you: https://github.com/vllm-project/vllm/pull/11578 - once that is merged you can use vLLM to do so - you could likely even do this now on the feature branch if you are willing to compile vLLM from source.
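For reference, here is a minimal sketch of what loading in 4-bit precision could look like. It assumes the checkpoint can be loaded through transformers with trust_remote_code=True and that bitsandbytes is installed; in practice you may also need DeepSeek's own deepseek_vl2 package and its processor for the vision inputs, so treat this as a starting point rather than a tested recipe:

```python
# Minimal sketch: load deepseek-vl2-tiny in 4-bit on a single GPU.
# Assumptions (not confirmed for this specific repo): the checkpoint loads
# via transformers with trust_remote_code=True, and bitsandbytes is installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_path = "deepseek-ai/deepseek-vl2-tiny"  # or -small / the full model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",                      # place layers on the available GPU(s)
)
model.eval()
```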
Hello,
thank you very much for your detailed and supportive answer, much appreciated!
I will check out the recommended options.
Have a great one, and thanks again for the great job your team does!