Multiple requests at once to quantized models

#3
by bilal-munirr - opened

How can I make multiple requests to the model? I'm using Flask to build an API, but whenever a new user sends a prompt, the API shuts down. I have searched the web, and it says the issue is with llama-cpp-python. Is there an alternative to this?

Use a semaphore and/or a queue to serialize access to the model: llama-cpp-python is not thread-safe, so concurrent calls into the same model instance can crash the process. Funnel all requests through a single guarded code path instead.
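A minimal sketch of the queue approach, using only the standard library: a single worker thread drains a request queue and is the only thread that ever touches the model, while each caller (e.g. a Flask view function) enqueues its prompt and blocks on a private reply queue. Here `fake_generate` is a hypothetical stand-in for your actual llama-cpp-python call.

```python
import queue
import threading

def fake_generate(prompt: str) -> str:
    # Hypothetical stub; replace with your llama-cpp-python inference call.
    return f"echo: {prompt}"

# Shared queue of (prompt, reply_queue) pairs.
request_q: queue.Queue = queue.Queue()

def worker() -> None:
    # The only thread that calls the model, so calls never overlap.
    while True:
        prompt, reply_q = request_q.get()
        reply_q.put(fake_generate(prompt))
        request_q.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(prompt: str) -> str:
    # Called from each Flask request handler: enqueue the prompt,
    # then block until the worker posts the answer.
    reply_q: queue.Queue = queue.Queue()
    request_q.put((prompt, reply_q))
    return reply_q.get()
```

The semaphore variant is even shorter: wrap the model call in `with threading.Semaphore(1):` (or a plain `threading.Lock`) inside the view function. The queue version above has the advantage that requests are answered strictly in arrival order.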
