# ScouterAI/agents/all_agents.py
from smolagents import CodeAgent, LogLevel
from tools.bbox_annotator import BBoxAnnotatorTool
from tools.cropping_tool import CroppingTool
from tools.image_resize_tool import ImageResizeTool
from tools.image_segmentation_tool import ImageSegmentationTool
from tools.inference_converter import TaskInferenceOutputConverterTool
from tools.label_annotator import LabelAnnotatorTool
from tools.mask_annotator import MaskAnnotatorTool
from tools.object_detection_tool import ObjectDetectionTool
from tools.task_model_retriever import TaskModelRetrieverTool


def get_master_agent(llm):
    """Build the master CodeAgent with all image tools registered.

    `llm` is the model object handed to CodeAgent as `model`.
    """
    # Prompt describing the available tools and the expected workflow.
    description = """
    You are an agent that can perform tasks on an image.
    For whatever action you will perform on an image, make sure to resize it to a smaller size before using the tools.
    You can always crop the image and keep the part of the image as is, if the crop is not too large.
    You can use the following tools to perform tasks on an image:
    - task_model_retriever: to retrieve models that can perform a task, you must provide the type of task that a model can perform, the supported tasks are:
        - object-detection
        - image-segmentation
    - object_detection: tool to perform the object-detection task, provide an image and a class of objects to get it detected.
    - image_segmentation: tool to perform the image-segmentation task, provide an image and a class of objects to get it segmented.

    Once you have the detections, you will most likely need to use the task_inference_output_converter tool to convert the detections to the proper format.
    Then you can use the following tools to annotate the image:
    - bbox_annotator: tool to annotate the image with bounding boxes, provide an image and a list of detections to get it annotated.
    - mask_annotator: tool to annotate the image with masks, provide an image and a list of detections to get it annotated.
    - label_annotator: tool to annotate the image with labels, provide an image and a list of detections to get it annotated.
    - cropping: tool to crop the image, provide an image and a bounding box to get it cropped.
    - image_resize: tool to resize the image, provide an image and a width and a height to get it resized.
    # - upscaler: tool to upscale the image, provide an image to get it upscaled.

    If you don't know what model to use, you can use the task_model_retriever tool to retrieve the model.
    Never assume an invented model name, always use the model name provided by the task_model_retriever tool.
    Use batching to perform tasks on multiple images at once when a tool supports it.
    You have access to the variable "image" which is the image to perform tasks on, no need to load it, it is already loaded.
    Always use the variable "image" to draw the bounding boxes on the image.
    Whenever you need to use a tool, first write the tool call in the form of a code block.
    Then, wait for the tool to return the result.
    Then, use the result to perform the task. Step by step.
    """
    master_agent = CodeAgent(
        name="master_agent",
        description=description,
        model=llm,
        tools=[
            TaskModelRetrieverTool(),
            ObjectDetectionTool(),
            ImageSegmentationTool(),
            TaskInferenceOutputConverterTool(),
            BBoxAnnotatorTool(),
            MaskAnnotatorTool(),
            LabelAnnotatorTool(),
            CroppingTool(),
            ImageResizeTool(),
            # UpscalerTool(),
        ],
        verbosity_level=LogLevel.DEBUG,
    )
    print("Loaded master agent")
    return master_agent
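

# A minimal sketch, not part of the original module: the prompt above asks the
# agent to resize images to a smaller size before calling the tools, and
# image_resize takes an explicit width and height. The helper below shows one
# way such a target size could be computed while preserving the aspect ratio.
# The name `fit_within` and the default `max_side=1024` are hypothetical and
# chosen only for illustration; the original code defines neither.
def fit_within(width, height, max_side=1024) -> tuple:
    """Return (new_width, new_height) whose longest side is at most max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: leave the size unchanged (never upscale).
        return width, height
    # Scale both sides by the same factor; clamp to at least 1 pixel.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


# Example: a 4000x3000 image would be resized to 1024x768 before being passed
# to the detection or segmentation tools.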