from smolagents import CodeAgent, LogLevel

from tools.bbox_annotator import BBoxAnnotatorTool
from tools.cropping_tool import CroppingTool
from tools.image_resize_tool import ImageResizeTool
from tools.image_segmentation_tool import ImageSegmentationTool
from tools.inference_converter import TaskInferenceOutputConverterTool
from tools.label_annotator import LabelAnnotatorTool
from tools.mask_annotator import MaskAnnotatorTool
from tools.object_detection_tool import ObjectDetectionTool
from tools.task_model_retriever import TaskModelRetrieverTool


def get_master_agent(llm):
    """Build the master CodeAgent that orchestrates the image tools.

    `llm` is the model object handed to the agent as its `model` argument.
    """
    # Prompt text describing the available tools and the expected workflow.
    description = """
    You are an agent that can perform tasks on an image.
    For whatever action you will perform on an image, make sure to resize it to a smaller size before using the tools.
    You can always crop the image and keep the part of the image as is, if the crop is not too large.
    You can use the following tools to perform tasks on an image:
    - task_model_retriever: to retrieve models that can perform a task, you must provide the type of task that a model can perform, the supported tasks are:
        - object-detection
        - image-segmentation
    - object_detection: tool to perform the object-detection task, provide an image and a class of objects to get it detected.
    - image_segmentation: tool to perform the image-segmentation task, provide an image and a class of objects to get it segmented.
    Once you have the detections, you will most likely need to use the task_inference_output_converter tool to convert the detections to the proper format.
    Then you can use the following tools to annotate the image:
    - bbox_annotator: tool to annotate the image with bounding boxes, provide an image and a list of detections to get it annotated.
    - mask_annotator: tool to annotate the image with masks, provide an image and a list of detections to get it annotated.
    - label_annotator: tool to annotate the image with labels, provide an image and a list of detections to get it annotated.
    - cropping: tool to crop the image, provide an image and a bounding box to get it cropped.
    - image_resize: tool to resize the image, provide an image and a width and a height to get it resized.
    # - upscaler: tool to upscale the image, provide an image to get it upscaled.
    If you don't know what model to use, you can use the task_model_retriever tool to retrieve the model.
    Never assume an invented model name, always use the model name provided by the task_model_retriever tool.
    Use batching to perform tasks on multiple images at once when a tool supports it.
    You have access to the variable "image" which is the image to perform tasks on, no need to load it, it is already loaded.
    Always use the variable "image" to draw the bounding boxes on the image.
    Whenever you need to use a tool, first write the tool call in the form of a code block.
    Then, wait for the tool to return the result.
    Then, use the result to perform the task. Step by step.
    """
    master_agent = CodeAgent(
        name="master_agent",
        description=description,
        model=llm,
        tools=[
            TaskModelRetrieverTool(),
            ObjectDetectionTool(),
            ImageSegmentationTool(),
            TaskInferenceOutputConverterTool(),
            BBoxAnnotatorTool(),
            MaskAnnotatorTool(),
            LabelAnnotatorTool(),
            CroppingTool(),
            ImageResizeTool(),
            # UpscalerTool(),
        ],
        verbosity_level=LogLevel.DEBUG,
    )
    print("Loaded master agent")
    return master_agent
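

# Illustrative sketch, not part of the original module: the agent description
# asks for images to be resized to a smaller size before other tools run, and
# image_resize takes an explicit width and height. The helper below shows one
# way such a target size might be computed while keeping the aspect ratio.
# The name `fit_within` and the default `max_side` of 1024 are assumptions
# made for illustration; the source does not specify a size limit.
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return a (width, height) whose longest side is at most `max_side`."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: leave the dimensions unchanged.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))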