How to run this version with vLLM
Anybody got it running with vLLM?
Not supported at the moment but will soon :)
Can I convert this into GGUF, just like what you did with the R1 model?
Seems to be working on the latest version for me.
I would like to add to the information. The following command is not working for me:

vllm serve unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice
I served this model as follows (for me it is working with vLLM 0.8.1):

vllm serve unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit --load_format bitsandbytes --quantization bitsandbytes --gpu-memory-utilization 0.9 --max-model-len 9000
In my case, specifying --gpu-memory-utilization and --max-model-len was mandatory.
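Once the server is up, it exposes vLLM's OpenAI-compatible API. Here is a minimal sketch of querying it with the openai Python client, assuming the default host and port (localhost:8000) and that the server was started without an API key:

```python
# Minimal sketch of querying the vLLM server started with the command above.
# Assumes the default OpenAI-compatible endpoint at http://localhost:8000/v1
# and no --api-key set on the server (any placeholder key is then accepted).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # placeholder; server was started without --api-key
)

response = client.chat.completions.create(
    # The model name must match the path the server was launched with.
    model="unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit",
    messages=[
        {"role": "user", "content": "Summarize what vLLM does in one sentence."},
    ],
    max_tokens=128,  # stay well under the 9000-token limit set via --max-model-len
)

print(response.choices[0].message.content)
```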
Regards