Commit 5e05897
1 Parent(s): bf34015
Update README.md to remove outdated introductory content and add a link to the submission video, streamlining the overview of ScouterAI.
README.md CHANGED
@@ -14,32 +14,4 @@ short_description: The agent using over 9000 vision models from the HF Hub.
 
 # ScouterAI - The Vision enhanced Agent
 
-
-This app falls under the track 3 : Agentic Demo.
-The goal of the app is to demonstrate the capabilities of agentic llm's combined with more "traditional" deep learning computer vision.
-LLM's (and VLM's) are great models when it comes to interacting with the user and understanding its queries but are not (yet) capable of a precise perception of the images presented to them.
-Computer Vision models like object detection or image segmentation models are tailored models to accomplish these tasks but require some engineering to wrap them and be user ready.
-The idea of the agentic demo is to provide powerful LLM with access to expert vision models like object detection or image segmentation models.
-The agent can fulfill precise perception task on any object present in the image : detection, location, classification, masking, counting, etc...
-
-## Overview
-
-In this preliminary app, the agent is a CodeAgent provided by the smolagents framework.
-Its interface consists of a chat interface with example and a gallery which is used to display the agent's work.
-The agent is provided with a set of tools :
-- Task model retriever : a RAG tool which, given a task (object-detection or image-segmentation) and a query (car e.g.), returns a list of models with their model id and the list of classes it is capable of detecting/segmenting. The list if based on a curated dataset of all the models available on the HuggingFace Hub, returns the mo
-- Computer vision models : Any object detection and image segmentation models available of HuggingFace
-- Image processing functions : Resizing, cropping, ...
-- Image annotation functions : Label, bounding box and mask annotators
-
-
-
-To complete a user request
-
-## Use-cases
-
-## Stack
-
-Agent framework : smolagents
-LLM : Anthropic
-Compute : Modal
+[Submission video](https://youtu.be/FD8sZTjF5_4)