Does vLLM support batch inference of models?
Yes, just pass a list of prompts/messages instead of a single one.
AFAIK batching works with the vLLM Python `LLM` object ("offline mode"), but online mode (the OpenAI-compatible server) will return an error if you try to submit more than one message list in a single request.
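For reference, here's a minimal sketch of offline batch inference with the `LLM` object (the model name and prompts are just examples, swap in your own):

```python
from vllm import LLM, SamplingParams

# A batch of prompts submitted together in one call.
prompts = [
    "Write a haiku about the ocean.",
    "Explain what a KV cache is in one sentence.",
    "List three uses of Python.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# The offline LLM object accepts a list of prompts and batches them internally.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```

In online mode you instead send one request per prompt/conversation; the server still batches concurrent requests under the hood.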