Can the model do batch inference with vLLM?

#30
by BITDDD - opened

Does vLLM support batch inference of models?

Yes, just pass a list of conversations (message lists) instead of a single one.
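
A minimal sketch of what that looks like with vLLM's offline Python API; the model name and sampling settings below are placeholders, adjust them to your setup:

```python
# Offline batch inference sketch: a list of conversations is processed as one batch.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Each conversation is itself a list of chat messages.
conversations = [
    [{"role": "user", "content": "Summarize the theory of relativity in one sentence."}],
    [{"role": "user", "content": "Write a haiku about GPUs."}],
]

outputs = llm.chat(conversations, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```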

AFAIK batching works with the vLLM Python object in "offline mode", but the online (OpenAI-compatible) server will return an error if you try to submit more than one message list in a single request.
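
For the online server, a common workaround is to send one request per conversation and run them concurrently; the server's continuous batching then batches them internally. A hedged sketch, assuming a server started with `vllm serve` and listening on localhost:8000 (model name and prompts are placeholders):

```python
# Submit one request per conversation concurrently to the OpenAI-compatible server.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def ask(messages):
    resp = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=messages,
    )
    return resp.choices[0].message.content

async def main():
    conversations = [
        [{"role": "user", "content": "What is vLLM?"}],
        [{"role": "user", "content": "Explain continuous batching briefly."}],
    ]
    # Requests run concurrently; the server batches them under the hood.
    results = await asyncio.gather(*(ask(c) for c in conversations))
    for r in results:
        print(r)

asyncio.run(main())
```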
