GenericAgent-AgentTrek-1.0-32b (#5)
- GenericAgent-AgentTrek-1.0-32b (5d1fb89f6c214429631953c849930825bddcf0ad)
- Fixed formatting (5e482c6860a1a676b3f1083d79d6bd32930ba125)
Co-authored-by: Lu Dunjie <[email protected]>
- results/GenericAgent-AgentTrek-1.0-32b/miniwob.json +16 -0
- results/GenericAgent-AgentTrek-1.0-32b/readme.md +85 -0
- results/GenericAgent-AgentTrek-1.0-32b/webarena.json +16 -0
- results/GenericAgent-AgentTrek-1.0-32b/workarena-l1.json +16 -0
- results/GenericAgent-AgentTrek-1.0-32b/workarena-l2.json +16 -0
- results/GenericAgent-AgentTrek-1.0-32b/workarena-l3.json +16 -0
results/GenericAgent-AgentTrek-1.0-32b/miniwob.json
ADDED
@@ -0,0 +1,16 @@
[
  {
    "agent_name": "GenericAgent-AgentTrek-1.0-32b",
    "study_id": "4c636aa0-ea52-429d-9d7e-301b7bf0ac74",
    "date_time": "2025-01-22 04:27:37",
    "benchmark": "MiniWoB",
    "score": 60.0,
    "std_err": 2.0,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Additional details",
    "original_or_reproduced": "Original"
  }
]
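The `score` and `std_err` fields in the MiniWoB entry can be read together as an approximate interval. The following is a minimal sketch, assuming `score` is a success rate in percent, `std_err` is its standard error in the same units, and a normal approximation is acceptable; the file itself states none of this. `confidence_interval` is a hypothetical helper, not part of any leaderboard tooling.

```python
import json

# Fields copied from the MiniWoB entry (only the ones used here).
ENTRY_JSON = '{"benchmark": "MiniWoB", "score": 60.0, "std_err": 2.0}'


def confidence_interval(score, std_err, z=1.96):
    """Return (low, high) for a normal-approximation interval, clipped to [0, 100].

    Assumption: score and std_err are both expressed in percent.
    """
    half_width = z * std_err
    return max(0.0, score - half_width), min(100.0, score + half_width)


entry = json.loads(ENTRY_JSON)
low, high = confidence_interval(entry["score"], entry["std_err"])
print(f'{entry["benchmark"]}: {entry["score"]:.1f} (95% CI roughly {low:.1f} to {high:.1f})')
```

With few tasks or a score near 0 or 100, the normal approximation is rough, so the interval is better treated as a quick sanity check than as a reported statistic.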
results/GenericAgent-AgentTrek-1.0-32b/readme.md
ADDED
@@ -0,0 +1,85 @@
### GenericAgent-AgentTrek-1.0-32b

This agent is GenericAgent from AgentLab.

- **Base Model:**
  - Qwen/Qwen2.5-32B-Instruct
- **Architecture:**
  - Type: Causal Language Models
  - Training Stage: Pretraining & Post-training
  - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  - Number of Parameters: 32.5B
  - Number of Parameters (Non-Embedding): 31.0B
  - Number of Layers: 64
  - Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Input/Output Format:
  - with the following flags:
  ```txt
  flags=GenericPromptFlags(
      obs=ObsFlags(
          use_html=True,
          use_ax_tree=True,
          use_tabs=False,
          use_focused_element=False,
          use_error_logs=True,
          use_history=True,
          use_past_error_logs=False,
          use_action_history=True,
          use_think_history=False,
          use_diff=False,
          html_type='pruned_html',
          use_screenshot=False,
          use_som=False,
          extract_visible_tag=False,
          extract_clickable_tag=False,
          extract_coords='False',
          filter_visible_elements_only=False,
          openai_vision_detail='auto',
          filter_with_bid_only=False,
          filter_som_only=False
      ),
      action=ActionFlags(
          action_set=HighLevelActionSetArgs(
              subsets=('miniwob_all',),
              multiaction=False,
              strict=False,
              retry_with_force=True,
              demo_mode='off'
          ),
          long_description=False,
          individual_examples=False,
          multi_actions=None,
          is_strict=None
      ),
      use_plan=False,
      use_criticise=False,
      use_thinking=True,
      use_memory=True,
      use_concrete_example=True,
      use_abstract_example=True,
      use_hints=False,
      enable_chat=False,
      max_prompt_tokens=40000,
      be_cautious=True,
      extra_instructions=None,
      add_missparsed_messages=True,
      max_trunc_itr=20,
      flag_group=None
  )
  ```
- Training Details
  - Dataset used: [AgentTrek-6K](https://agenttrek.github.io)
  - Number of training steps: 3 Epochs
- Paper Link:
  - https://arxiv.org/abs/2412.09605
- Code Repository:
  - https://agenttrek.github.io
- License:
  - Apache-2.0
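The readme's attention-head line, "40 for Q and 8 for KV", describes grouped-query attention, in which several query heads share one key/value head. The sketch below illustrates the arithmetic only. It assumes the common convention that query heads are assigned to KV heads in contiguous groups; the readme does not say which convention is used, and `kv_head_for` is a hypothetical helper.

```python
# Head counts taken from the readme; the grouping rule is an assumption.
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8


def kv_head_for(q_head, num_q_heads=NUM_Q_HEADS, num_kv_heads=NUM_KV_HEADS):
    """Map a query-head index to the KV head it shares (contiguous groups)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly among KV heads")
    if not 0 <= q_head < num_q_heads:
        raise IndexError("query head index out of range")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


group_size = NUM_Q_HEADS // NUM_KV_HEADS
print(f"{group_size} query heads share each KV head")
print([kv_head_for(q) for q in range(12)])
```

Under these assumptions each KV head serves five query heads, so the key/value cache holds 8 heads per layer rather than 40.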
results/GenericAgent-AgentTrek-1.0-32b/webarena.json
ADDED
@@ -0,0 +1,16 @@
[
  {
    "agent_name": "GenericAgent-AgentTrek-1.0-32b",
    "study_id": "ac309635-f3fd-417e-ac16-1e0fc943a54f",
    "date_time": "2025-01-25 10:16:41",
    "benchmark": "WebArena",
    "score": 22.4,
    "std_err": 1.5,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Additional details",
    "original_or_reproduced": "Original"
  }
]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l1.json
ADDED
@@ -0,0 +1,16 @@
[
  {
    "agent_name": "GenericAgent-AgentTrek-1.0-32b",
    "study_id": "ed14232c-cd7e-4708-b334-ebaf1f220000",
    "date_time": "2025-01-12 00:37:04",
    "benchmark": "WorkArena-L1",
    "score": 38.29,
    "std_err": 2.70,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Additional details",
    "original_or_reproduced": "Original"
  }
]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l2.json
ADDED
@@ -0,0 +1,16 @@
[
  {
    "agent_name": "GenericAgent-AgentTrek-1.0-32b",
    "study_id": "957fb895-8548-46f4-92f0-5de6be7ceb61",
    "date_time": "2025-01-12 09:39:21",
    "benchmark": "WorkArena-L2",
    "score": 2.98,
    "std_err": 1.10,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Additional details",
    "original_or_reproduced": "Original"
  }
]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l3.json
ADDED
@@ -0,0 +1,16 @@
[
  {
    "agent_name": "GenericAgent-AgentTrek-1.0-32b",
    "study_id": "a951b33f-d118-4cf4-a2ef-cc2ef204eeb0",
    "date_time": "2025-01-13 12:11:45",
    "benchmark": "WorkArena-L3",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Additional details",
    "original_or_reproduced": "Original"
  }
]
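The five result files in this PR share one set of keys, so they can be checked and summarised with a few lines of code. This is a rough sketch rather than the leaderboard's own validation: the key set is inferred from the files in this PR, not from an official schema, and `missing_keys` and `summary_rows` are hypothetical helpers. The scores and standard errors are copied from the files.

```python
# Keys observed in the five result files in this PR (not an official schema).
REQUIRED_KEYS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}

# (benchmark, score, std_err) copied from the JSON files in this PR.
RESULTS = [
    ("MiniWoB", 60.0, 2.0),
    ("WebArena", 22.4, 1.5),
    ("WorkArena-L1", 38.29, 2.70),
    ("WorkArena-L2", 2.98, 1.10),
    ("WorkArena-L3", 0.0, 0.0),
]


def missing_keys(entry):
    """Return the sorted list of expected keys that an entry lacks."""
    return sorted(REQUIRED_KEYS - set(entry))


def summary_rows(results):
    """Format one 'benchmark  score ± std_err' line per result."""
    return [f"{name:<13} {score:6.2f} ± {err:.2f}" for name, score, err in results]


for row in summary_rows(RESULTS):
    print(row)
```

For example, `missing_keys({"agent_name": "GenericAgent-AgentTrek-1.0-32b"})` lists the eleven remaining keys, which makes an incomplete entry easy to spot before submission.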