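import os
from dataclasses import dataclass

import logfire
from dotenv import load_dotenv
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Load OPENAI_API_KEY and LOGFIRE_TOKEN from a local .env file, if present.
load_dotenv()

# Send traces to Logfire and instrument OpenAI client calls.
logfire.configure(token=os.environ.get("LOGFIRE_TOKEN"))
logfire.instrument_openai()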

system_prompt = """
I will give you an editing instruction for an image. Perform the following tasks:

<task_1>
Please output which editing category the instruction belongs to.
You can choose from the following categories:
1. Addition: Adding new objects within the images, e.g., add a bird
2. Remove: Removing objects, e.g., remove the mask
3. Local: Replace local parts of an object and alter the object's attributes (e.g., make it smile), or alter an object's visual appearance without affecting its structure (e.g., change the cat to a dog)
4. Global: Edit the entire image, e.g., let's see it in winter
5. Background: Change the scene's background, e.g., have her walk on water, change the background to a beach, make the hedgehog in France, etc.
Only output a single word, e.g., 'Addition'.
</task_1>

<task_2>
Please output the subject that needs to be edited. You only need to output a basic description of the object in no more than 5 words. The output should contain only one noun.

For example, if the editing instruction is 'Change the white cat to a black dog', you need to output: 'white cat'. Only output the subject description. Do not output anything else.
</task_2>

<task_3>
Please describe the new content that should be present in the image after applying the instruction.

For example, if the original image content shows a grandmother wearing a mask and the instruction is 'remove the mask', your output should be: 'a grandmother'. 
The output should only include elements that remain in the image after the edit and should not mention elements that have been changed or removed, such as 'mask' in this example. 
Do not output 'sorry, xxx'. Even if it is a guess, directly output the answer you think is correct.
</task_3>
"""
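
# A minimal sketch, not part of the original pipeline: task_1 asks the model to
# answer with a single category word, so a caller may want to check the answer
# against the five categories named in the prompt. EDIT_CATEGORIES and
# normalize_category are hypothetical names introduced for illustration only.
from typing import Optional

EDIT_CATEGORIES = ("Addition", "Remove", "Local", "Global", "Background")


def normalize_category(raw: str) -> Optional[str]:
    """Return the canonical category name, or None if the answer is not one of them."""
    # Drop surrounding whitespace, quotes, and a trailing period before comparing.
    cleaned = raw.strip(" \t\n'\".").casefold()
    for category in EDIT_CATEGORIES:
        if cleaned == category.casefold():
            return category
    return None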

model = OpenAIModel(
    "gpt-4o",
    api_key=os.environ.get("OPENAI_API_KEY"),
)


@dataclass
class MaskGenerationResult:
    mask_image_base64: str


mask_generation_agent = Agent(model, system_prompt=system_prompt)


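# Registered with tool_plain because the function takes no RunContext argument.
@mask_generation_agent.tool_plain
async def generate_mask(edit_instruction: str, image_url: str) -> MaskGenerationResult:
    """
    Use this tool to generate a mask for the image.

    Args:
        edit_instruction: The editing instruction for the image.
        image_url: URL of the image to generate a mask for.
    """
    # Placeholder: fail loudly instead of silently returning None.
    raise NotImplementedError("generate_mask is not implemented yet")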