How to run this version with vLLM

#2
by alecauduro - opened

Anybody got it running with vLLM?

Unsloth AI org

Anybody got it running with vLLM?

Not supported at the moment but will soon :)


Can I convert this into GGUF, just like you did with the R1 model?

@alecauduro

Seems to be working on the latest version for me.

I would like to add some information. The following command is not working:
vllm serve unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice

I served this model as follows (it works for me with vLLM 0.8.1):
vllm serve unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit --load_format bitsandbytes --quantization bitsandbytes --gpu-memory-utilization 0.9 --max-model-len 9000

For me it is mandatory to specify --gpu-memory-utilization and --max-model-len.
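
In case it helps anyone testing the server once it's up: here is a minimal sketch of querying the OpenAI-compatible endpoint that vllm serve exposes. It assumes the default host and port (localhost:8000) and that no API key was configured; adjust those if your setup differs.

```python
# Minimal sketch of querying the vLLM OpenAI-compatible server started above.
# Assumes the default host/port (localhost:8000) and no API key configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    # Model name must match what was passed to `vllm serve`
    model="unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```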

Regards
