|
# Real-Time Object Detection from a Webcam Stream with WebRTC
|
|
|
Tags: VISION, STREAMING, WEBCAM |
|
|
|
In this guide, we'll use YOLOv10 to perform real-time object detection in Gradio from a user's webcam feed. We'll utilize the latest streaming features introduced in Gradio 5.0. You can see the finished product in action below: |
|
|
|
<video src="https://github.com/user-attachments/assets/4584cec6-8c1a-401b-9b61-a4fe0718b558" controls |
|
height="600" width="600" style="display: block; margin: auto;" autoplay="true" loop="true"> |
|
</video> |
|
|
|
## Setting up |
|
|
|
Start by installing all the dependencies. Add the following lines to a `requirements.txt` file and run `pip install -r requirements.txt`: |
|
|
|
```bash |
|
opencv-python |
|
twilio |
|
gradio>=5.0 |
|
gradio-webrtc |
|
onnxruntime-gpu |
|
``` |
|
|
|
We'll use ONNX Runtime to speed up YOLOv10 inference. This guide assumes you have access to a GPU. If you don't, change `onnxruntime-gpu` to `onnxruntime`. Without a GPU, the model will run more slowly, resulting in a laggy demo.
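If you're not sure whether ONNX Runtime can actually see your GPU, here's a quick sanity check (a minimal sketch, assuming `onnxruntime-gpu` installed correctly):

```python
import onnxruntime as ort

# "CUDAExecutionProvider" should appear in this list when the GPU build is
# installed and a compatible CUDA setup is present; otherwise ONNX Runtime
# falls back to "CPUExecutionProvider".
print(ort.get_available_providers())
```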
|
|
|
We'll use OpenCV for image manipulation and the [Gradio WebRTC](https://github.com/freddyaboulton/gradio-webrtc) custom component to use [WebRTC](https://webrtc.org/) under the hood, achieving near-zero latency. |
|
|
|
**Note**: If you want to deploy this app on a cloud provider, you'll need a [TURN server](https://www.twilio.com/docs/stun-turn); Twilio provides one through its free API, so create a free account on Twilio. If you're not familiar with TURN servers, consult this [guide](https://www.twilio.com/docs/stun-turn/faq#faq-what-is-nat).
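Here's one way to build the `rtc_configuration` dictionary that the demo code below expects (a sketch assuming your credentials are stored in the `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` environment variables; when running locally, `None` works fine):

```python
import os

from twilio.rest import Client

account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
auth_token = os.environ.get("TWILIO_AUTH_TOKEN")

if account_sid and auth_token:
    client = Client(account_sid, auth_token)
    # Ask Twilio's Network Traversal Service for temporary TURN credentials
    token = client.tokens.create()
    rtc_configuration = {
        "iceServers": token.ice_servers,
        "iceTransportPolicy": "relay",
    }
else:
    # No TURN server is needed when running on localhost
    rtc_configuration = None
```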
|
|
|
## The Inference Function |
|
|
|
We'll download the YOLOv10 model from the Hugging Face hub and instantiate a custom inference class to use this model. |
|
|
|
The implementation of the inference class isn't covered in this guide, but you can find the source code [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n/blob/main/inference.py#L9) if you're interested. This implementation borrows heavily from this [GitHub repository](https://github.com/ibaiGorordo/ONNX-YOLOv8-Object-Detection).
|
|
|
We're using the `yolov10-n` variant because it has the lowest latency. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the YOLOv10 GitHub repository. |
|
|
|
```python |
|
import cv2

from huggingface_hub import hf_hub_download
|
from inference import YOLOv10 |
|
|
|
model_file = hf_hub_download( |
|
repo_id="onnx-community/yolov10n", filename="onnx/model.onnx" |
|
) |
|
|
|
model = YOLOv10(model_file) |
|
|
|
def detection(image, conf_threshold=0.3):
    # Resize the incoming frame to the model's expected input dimensions
    image = cv2.resize(image, (model.input_width, model.input_height))
    # Run inference and draw bounding boxes for detections above the threshold
    new_image = model.detect_objects(image, conf_threshold)
    return new_image
|
``` |
|
|
|
Our inference function, `detection`, accepts a numpy array from the webcam and a desired confidence threshold. Object detection models like YOLO identify many objects and assign a confidence score to each. The lower the confidence, the higher the chance of a false positive. We'll let users adjust the confidence threshold. |
|
|
|
The function returns a numpy array representing the input image with bounding boxes drawn around all detected objects.
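Before wiring this into the UI, you can sanity-check `detection` on a single frame (a minimal sketch, assuming a webcam is available at device index 0):

```python
import cv2

cap = cv2.VideoCapture(0)   # open the default webcam
ok, frame = cap.read()      # grab one frame as a numpy array
cap.release()

if ok:
    annotated = detection(frame, conf_threshold=0.3)
    cv2.imwrite("annotated_frame.jpg", annotated)  # inspect the boxes offline
```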
|
|
|
## The Gradio Demo |
|
|
|
The Gradio demo is straightforward, but we'll implement a few specific features: |
|
|
|
1. Use the `WebRTC` custom component so that the input and output streams are sent to and from the server over WebRTC.
|
2. The [WebRTC](https://github.com/freddyaboulton/gradio-webrtc) component will serve as both an input and output component. |
|
3. Utilize the `time_limit` parameter of the `stream` event. This parameter sets the maximum amount of time each user's stream will be processed. In a multi-user setting, such as on Spaces, we'll stop processing the current user's stream after this period and move on to the next user in the queue.
|
|
|
We'll also apply custom CSS to center the webcam and slider on the page. |
|
|
|
```python |
|
import gradio as gr |
|
from gradio_webrtc import WebRTC |
|
|
|
css = """.my-group {max-width: 600px !important; max-height: 600px !important;} |
|
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}""" |
|
|
|
with gr.Blocks(css=css) as demo: |
|
gr.HTML( |
|
""" |
|
<h1 style='text-align: center'> |
|
YOLOv10 Webcam Stream (Powered by WebRTC ⚡️) |
|
</h1> |
|
""" |
|
) |
|
with gr.Column(elem_classes=["my-column"]): |
|
with gr.Group(elem_classes=["my-group"]): |
|
            # rtc_configuration is the Twilio TURN config from the "Setting up"
            # section (None when running locally)
            image = WebRTC(label="Stream", rtc_configuration=rtc_configuration)
|
conf_threshold = gr.Slider( |
|
label="Confidence Threshold", |
|
minimum=0.0, |
|
maximum=1.0, |
|
step=0.05, |
|
value=0.30, |
|
) |
|
|
|
image.stream( |
|
fn=detection, inputs=[image, conf_threshold], outputs=[image], time_limit=10 |
|
) |
|
|
|
if __name__ == "__main__": |
|
demo.launch() |
|
``` |
|
|
|
## Conclusion |
|
|
|
Our app is hosted on Hugging Face Spaces [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n). |
|
|
|
You can use this app as a starting point to build real-time image applications with Gradio. Don't hesitate to open issues in the space or in the [WebRTC component GitHub repo](https://github.com/freddyaboulton/gradio-webrtc) if you have any questions or encounter problems. |