Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
---
|
2 |
-
title:
|
3 |
-
emoji:
|
4 |
colorFrom: green
|
5 |
colorTo: gray
|
6 |
sdk: gradio
|
@@ -11,3 +11,30 @@ license: apache-2.0
|
|
11 |
tag: agent-demo-track
|
12 |
---
|
13 |
1 |
---
|
2 |
+
title: ScouterAI
|
3 |
+
emoji: π
|
4 |
colorFrom: green
|
5 |
colorTo: gray
|
6 |
sdk: gradio
|
|
|
11 |
tag: agent-demo-track
|
12 |
---
|
13 |
|
14 |
+
# ScouterAI - The Vision-Enhanced Agent
|
15 |
+
|
16 |
+
Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission.
|
17 |
+
This app falls under Track 3: Agentic Demo.
|
18 |
+
The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision models.
|
19 |
+
LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them.
|
20 |
+
Computer vision models, such as object detection or image segmentation models, are tailored to these perception tasks but require some engineering to wrap them and make them user-ready.
|
21 |
+
The idea of the agentic demo is to give a powerful LLM access to expert vision models such as object detection or image segmentation models.
|
22 |
+
The agent can fulfill precise perception tasks on any object present in the image: detection, localization, classification, masking, counting, etc.
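
As an illustration of the counting and localization tasks listed above, here is a minimal sketch in plain Python. It assumes a detector returns one dictionary per object with a `label`, a `score`, and a pixel-space `box`; that shape, the sample values, and the helper names `count_objects` and `box_center` are assumptions made for this example, not part of the app.

```python
from collections import Counter

# Hypothetical detector output: one dict per detected object, with a label,
# a confidence score, and a pixel-space bounding box.
detections = [
    {"label": "cat", "score": 0.97,
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"label": "cat", "score": 0.42,
     "box": {"xmin": 300, "ymin": 40, "xmax": 360, "ymax": 90}},
    {"label": "dog", "score": 0.88,
     "box": {"xmin": 150, "ymin": 60, "xmax": 250, "ymax": 200}},
]


def count_objects(detections, min_score=0.5):
    """Count detections per label, ignoring low-confidence ones."""
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


def box_center(box):
    """Return the (x, y) center of a bounding box, in pixels."""
    return ((box["xmin"] + box["xmax"]) / 2, (box["ymin"] + box["ymax"]) / 2)


# Counting: how many confident detections are there for each label?
counts = count_objects(detections)

# Localization: where is each confident detection centered?
centers = [
    (d["label"], box_center(d["box"]))
    for d in detections
    if d["score"] >= 0.5
]
```

The score threshold matters here: the second "cat" detection falls below it, so it is excluded from both the count and the list of centers.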
|
23 |
+
|
24 |
+
##
|
25 |
+
|
26 |
+
In this preliminary app, the agent is a CodeAgent (from the smolagents framework) with access to a set of tools:
|
27 |
+
- Any object detection and image segmentation models available on Hugging Face
|
28 |
+
- Image processing functions
|
29 |
+
- Image annotation functions
|
30 |
+
|
31 |
+
To complete a user request
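
The three tool families above can be pictured as a small registry of callables that the agent chains together. The sketch below is a plain-Python stand-in and does not use the smolagents API; the registry `TOOLS`, the `register` decorator, and the three toy tools are all hypothetical, and the "image" is just a nested list.

```python
from typing import Callable, Dict

# Hypothetical tool registry: maps a tool name to a plain Python callable.
TOOLS: Dict[str, Callable] = {}


def register(name):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@register("detect")
def detect(image):
    """Stand-in for an object detection model; returns one fixed box."""
    return [{"label": "person", "box": (2, 1, 5, 4)}]


@register("crop")
def crop(image, box):
    """Image processing: crop a list-of-rows image to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


@register("annotate")
def annotate(image, detections):
    """Image annotation: describe each detection as text instead of drawing it."""
    return [f"{d['label']} at {d['box']}" for d in detections]


# A toy 6x6 "image" of zeros.
image = [[0] * 6 for _ in range(6)]

# One possible chain for a request like "find and crop the person":
found = TOOLS["detect"](image)
patch = TOOLS["crop"](image, found[0]["box"])
labels = TOOLS["annotate"](image, found)
```

In the real app the agent writes code that calls its tools, so the order of calls would be decided by the LLM rather than hard-coded as it is here.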
|
32 |
+
|
33 |
+
## Use-cases
|
34 |
+
|
35 |
+
## Stack
|
36 |
+
|
37 |
+
Agent framework: smolagents
|
38 |
+
LLM: Anthropic
|
39 |
+
Compute: Modal
|
40 |
+
|