Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
Transfer task details to system prompt
Browse files- app.py +1 -17
- e2bqwen.py +8 -3
app.py
CHANGED
@@ -497,27 +497,11 @@ class EnrichedGradioUI(GradioUI):
|
|
497 |
else:
|
498 |
session_state["agent"] = create_agent(data_dir=data_dir, desktop=desktop)
|
499 |
|
500 |
-
|
501 |
-
# Construct the full task with instructions
|
502 |
-
full_task = task_input + dedent(f"""
|
503 |
-
The desktop has a resolution of {WIDTH}x{HEIGHT}, take it into account to decide clicking coordinates.
|
504 |
-
When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
|
505 |
-
|
506 |
-
Always analyze the latest screenshot carefully before performing actions. Make sure to:
|
507 |
-
1. Look at elements on the screen to determine what to click or interact with
|
508 |
-
2. Use precise coordinates for mouse movements and clicks
|
509 |
-
3. Wait for page loads or animations to complete using the wait() tool
|
510 |
-
4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
|
511 |
-
|
512 |
-
When you receive a task, break it down into step-by-step actions. On each step, look at the current screenshot to validate if previous steps worked and decide the next action.
|
513 |
-
We can only execute one action at a time. On each step, answer only a python blob with the action to perform
|
514 |
-
""")
|
515 |
-
|
516 |
try:
|
517 |
stored_messages.append(gr.ChatMessage(role="user", content=task_input))
|
518 |
yield stored_messages
|
519 |
|
520 |
-
for msg in stream_to_gradio(session_state["agent"], task=
|
521 |
if hasattr(session_state["agent"], "last_screenshot") and msg.content == "-----": # Append the last screenshot before the end of step
|
522 |
stored_messages.append(gr.ChatMessage(
|
523 |
role="assistant",
|
|
|
497 |
else:
|
498 |
session_state["agent"] = create_agent(data_dir=data_dir, desktop=desktop)
|
499 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
500 |
try:
|
501 |
stored_messages.append(gr.ChatMessage(role="user", content=task_input))
|
502 |
yield stored_messages
|
503 |
|
504 |
+
for msg in stream_to_gradio(session_state["agent"], task=task_input, reset_agent_memory=False):
|
505 |
if hasattr(session_state["agent"], "last_screenshot") and msg.content == "-----": # Append the last screenshot before the end of step
|
506 |
stored_messages.append(gr.ChatMessage(
|
507 |
role="assistant",
|
e2bqwen.py
CHANGED
@@ -29,7 +29,7 @@ On top of performing computations in the Python code snippets that you create, y
|
|
29 |
Returns an output of type: {{tool.output_type}}
|
30 |
{%- endfor %}
|
31 |
|
32 |
-
The desktop has a resolution of <<resolution_x>>x<<resolution_y
|
33 |
|
34 |
IMPORTANT:
|
35 |
- Remember the tools that you have as those can save you time, for example open_url to enter a website rather than searching for the browser in the OS.
|
@@ -84,9 +84,14 @@ Remember to:
|
|
84 |
Always wait for appropriate loading times
|
85 |
Use precise coordinates based on the current screenshot
|
86 |
Execute one action at a time
|
87 |
-
|
88 |
Use click to move through menus on the desktop and scroll for web and specific applications.
|
89 |
-
|
|
|
|
|
|
|
|
|
|
|
90 |
"""
|
91 |
|
92 |
def draw_marker_on_image(image, click_coordinates):
|
|
|
29 |
Returns an output of type: {{tool.output_type}}
|
30 |
{%- endfor %}
|
31 |
|
32 |
+
The desktop has a resolution of <<resolution_x>>x<<resolution_y>>, take it into account to decide clicking coordinates.
|
33 |
|
34 |
IMPORTANT:
|
35 |
- Remember the tools that you have as those can save you time, for example open_url to enter a website rather than searching for the browser in the OS.
|
|
|
84 |
Always wait for appropriate loading times
|
85 |
Use precise coordinates based on the current screenshot
|
86 |
Execute one action at a time
|
87 |
+
On each step, look at the last screenshot and action to validate if previous steps worked and decide the next action. If you repeated an action already without effect, it means that this action is useless: don't repeat it and try something else.
|
88 |
Use click to move through menus on the desktop and scroll for web and specific applications.
|
89 |
+
When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
|
90 |
+
Always analyze the latest screenshot carefully before performing actions. Make sure to:
|
91 |
+
1. Look at elements on the screen to determine what to click or interact with
|
92 |
+
2. Use precise coordinates for mouse movements and clicks
|
93 |
+
3. You can wait for page loads or animations to complete using the wait() tool
|
94 |
+
4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
|
95 |
"""
|
96 |
|
97 |
def draw_marker_on_image(image, click_coordinates):
|