Multiple requests at once to quantized models

#3
by bilal-munirr - opened

How can I make multiple requests to the model? I'm using Flask to build an API, but whenever a new user sends a prompt, the API shuts down. I have searched the web, and it says the issue is with llama-cpp-python. Is there an alternative to this?

Use a semaphore and/or a queue to serialize access to the model: llama-cpp-python is not thread-safe, so concurrent calls into the same model instance can crash the process. Funnel all requests through a single guarded code path instead.
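A minimal sketch of the queue approach, using only the standard library: a single worker thread drains a request queue and is the only thread that ever touches the model, while each caller (e.g. a Flask view function) enqueues its prompt and blocks on a private reply queue. Here `fake_generate` is a hypothetical stand-in for your actual llama-cpp-python call.

```python
import queue
import threading

def fake_generate(prompt: str) -> str:
    # Hypothetical stub; replace with your llama-cpp-python inference call.
    return f"echo: {prompt}"

# Shared queue of (prompt, reply_queue) pairs.
request_q: queue.Queue = queue.Queue()

def worker() -> None:
    # The only thread that calls the model, so calls never overlap.
    while True:
        prompt, reply_q = request_q.get()
        reply_q.put(fake_generate(prompt))
        request_q.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(prompt: str) -> str:
    # Called from each Flask request handler: enqueue the prompt,
    # then block until the worker posts the answer.
    reply_q: queue.Queue = queue.Queue()
    request_q.put((prompt, reply_q))
    return reply_q.get()
```

The semaphore variant is even shorter: wrap the model call in `with threading.Semaphore(1):` (or a plain `threading.Lock`) inside the view function. The queue version above has the advantage that requests are answered strictly in arrival order.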
