from smolagents import CodeAgent, LogLevel

from tools.bbox_annotator import BBoxAnnotatorTool
from tools.cropping_tool import CroppingTool
from tools.image_classification_tool import ImageClassificationTool
from tools.image_resize_tool import ImageResizeTool
from tools.image_segmentation_tool import ImageSegmentationTool
from tools.inference_converter import TaskInferenceOutputConverterTool
from tools.label_annotator import LabelAnnotatorTool
from tools.mask_annotator import MaskAnnotatorTool
from tools.object_detection_tool import ObjectDetectionTool
from tools.segment_anything_tool import SegmentAnythingTool
from tools.task_model_retriever import TaskModelRetrieverTool

def get_master_agent(llm):
    description = """
    You are an agent that can perform tasks on an image.
    Whatever action you perform on an image, make sure to resize the image to a smaller size before using the tools.
    You can always crop the image and keep the cropped part as is, if the crop is not too large.

    You can use the following tools to perform tasks on an image:
    - task_model_retriever: retrieves models that can perform a task. You must provide the type of task; the supported tasks are:
        - object-detection
        - image-segmentation
        - image-classification
    - object_detection: performs the object-detection task; provide an image and a class of objects to detect.
    - image_segmentation: performs the image-segmentation task; provide an image and a class of objects to segment.
    - image_classification: performs the image-classification task; provide an image and a class of objects to classify.
    - segment_anything: performs the segment-anything task; provide an image and a list of bounding boxes to segment.

    Once you have the detections, you will most likely need to use the task_inference_output_converter tool to convert them to the proper format.
    Then you can use the following tools to annotate the image:
    - bbox_annotator: annotates the image with bounding boxes; provide an image and a list of detections.
    - mask_annotator: annotates the image with masks; provide an image and a list of detections.
    - label_annotator: annotates the image with labels; provide an image and a list of detections.
    - cropping: crops the image; provide an image and a bounding box.
    - image_resize: resizes the image; provide an image, a width, and a height.
    # - upscaler: upscales the image; provide an image.

    Make sure to differentiate between the images you annotate, which you will show the user, and the images you use to
    perform tasks on, which you will not show the user. For example, do not annotate an image and then use it to perform
    image classification, as this will alter the inference. Also, do not crop an annotated image.

    The user may ask questions about concepts present in the image. Some may be general concepts that an object detection
    model is sufficient to detect and determine; some may be specific concepts that require you to first detect, then use
    a specific model (e.g. image classification) to add more precision.
    Example: the user provides an image with multiple dogs and asks you not only to detect the dogs but also to classify
    them, depending on the type of dog: the breed, the color, the age, etc. In this case, you would first use the
    object_detection tool to detect the bounding boxes around the dogs, then crop the bounding boxes and use the
    image_classification tool to classify the cropped images of the dogs.

    If you don't know which model to use, use the task_model_retriever tool to retrieve one.
    Never invent a model name; always use a model name provided by the task_model_retriever tool.
    Use batching to perform tasks on multiple images at once when a tool supports it.

    You have access to the variable "image", which is the image to perform tasks on; it is already loaded, so there is
    no need to load it. Always use the variable "image" to draw the bounding boxes on the image.

    Whenever you need to use a tool, first write the tool call in the form of a code block.
    Then, wait for the tool to return the result.
    Then, use the result to perform the task. Proceed step by step.
    Make sure to be as exhaustive as possible in your reasoning and in the way you complete the task.
    If, for example, you need to detect a specific type of object, make sure to detect all the objects of that type;
    use multiple tools, and even multiple models, to do so.
    """
    master_agent = CodeAgent(
        name="master_agent",
        description=description,
        model=llm,
        tools=[
            TaskModelRetrieverTool(),
            ObjectDetectionTool(),
            ImageClassificationTool(),
            ImageSegmentationTool(),
            SegmentAnythingTool(),
            TaskInferenceOutputConverterTool(),
            BBoxAnnotatorTool(),
            MaskAnnotatorTool(),
            LabelAnnotatorTool(),
            CroppingTool(),
            ImageResizeTool(),
            # UpscalerTool(),
        ],
        use_structured_outputs_internally=True,
        verbosity_level=LogLevel.DEBUG,
    )
    print("Loaded master agent")
    return master_agent
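

# Usage sketch (an assumption, not part of the original module): one way the
# surrounding app might wire this agent up. `InferenceClientModel` is one of
# smolagents' model classes; the actual app may construct the LLM differently,
# and the image path below is a placeholder. Kept commented out so that
# importing this module has no side effects.
#
# from smolagents import InferenceClientModel
# from PIL import Image
#
# llm = InferenceClientModel()
# agent = get_master_agent(llm)
# image = Image.open("example.jpg")
# agent.run(
#     "Detect all dogs in the image and classify their breeds.",
#     additional_args={"image": image},
# )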