Load Test

This folder contains the content to load test gradio apps, and check the ability to handle multiple connections simultaneously.

Setup

For a proper "production" environment load test, you can run gradio behind an nginx config. Install nginx and add nginx.conf to /etc/nginx/conf.d/*.conf on the machine running gradio. (You may want to have a machine running gradio that is separate from the machine running the load test, to include the effect of network latency).

gradio.py contains a simple gradio chat app that streams 500 tokens at a rate of 100 tokens/sec. This app is compatible with gradio 3.x as well.

simple.py contains a simple fastapi that streams 500 tokens at a rate of 100 tokens/sec, with both a WS and SSE endpoint. This does not use gradio. The purpose of this file is to compare the performance of streaming raw websockets and SSE, vs with gradio overhead.

workers.py contains a fastapi that uses queues and worker threads to stream 500 tokens at a rate of 100 tokens/sec, with both a WS and SSE endpoint. The purpose of this file is to compare the performance of streaming websockets and SSE with the implementation of gradio but without all the overhead.

load.ipynb supports running load tests on chat.py with gradio 3.x and 4.0, as well as on app.py. Simply configure the URL to point to where the app is running.