Commit 8d4c52c (verified), committed by meghsn and ludunjie
1 Parent(s): 52facf3

GenericAgent-AgentTrek-1.0-32b (#5)

- GenericAgent-AgentTrek-1.0-32b (5d1fb89f6c214429631953c849930825bddcf0ad)
- Fixed formatting (5e482c6860a1a676b3f1083d79d6bd32930ba125)


Co-authored-by: Lu Dunjie <[email protected]>

results/GenericAgent-AgentTrek-1.0-32b/miniwob.json ADDED
@@ -0,0 +1,16 @@
+ [
+     {
+         "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+         "study_id": "4c636aa0-ea52-429d-9d7e-301b7bf0ac74",
+         "date_time": "2025-01-22 04:27:37",
+         "benchmark": "MiniWoB",
+         "score": 60.0,
+         "std_err": 2.0,
+         "benchmark_specific": "No",
+         "benchmark_tuned": "No",
+         "followed_evaluation_protocol": "Yes",
+         "reproducible": "Yes",
+         "comments": "Additional details",
+         "original_or_reproduced": "Original"
+     }
+ ]
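Review note: the result files in this commit all share the same twelve fields. A minimal sketch of a pre-merge check for one entry is shown below. The field names come from the file above; the expected types and the `check_entry` helper are assumptions made for illustration, not a published leaderboard schema.

```python
import json

# Field names are taken from the result file in this commit. The type mapping
# is an assumption for illustration, not an official schema.
EXPECTED_FIELDS = {
    "agent_name": str,
    "study_id": str,
    "date_time": str,
    "benchmark": str,
    "score": (int, float),
    "std_err": (int, float),
    "benchmark_specific": str,
    "benchmark_tuned": str,
    "followed_evaluation_protocol": str,
    "reproducible": str,
    "comments": str,
    "original_or_reproduced": str,
}


def check_entry(entry):
    """Return a list of problems found in one result entry (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing field: {name}")
        elif not isinstance(entry[name], expected_type):
            problems.append(f"unexpected type for field: {name}")
    return problems


# The MiniWoB entry, copied from the diff above.
miniwob_entries = json.loads("""
[
    {
        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
        "study_id": "4c636aa0-ea52-429d-9d7e-301b7bf0ac74",
        "date_time": "2025-01-22 04:27:37",
        "benchmark": "MiniWoB",
        "score": 60.0,
        "std_err": 2.0,
        "benchmark_specific": "No",
        "benchmark_tuned": "No",
        "followed_evaluation_protocol": "Yes",
        "reproducible": "Yes",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
""")

miniwob_problems = check_entry(miniwob_entries[0])
print(miniwob_problems)
```

The same helper could be run over the other result files in the commit, since they use identical keys.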
results/GenericAgent-AgentTrek-1.0-32b/readme.md ADDED
@@ -0,0 +1,85 @@
+ ### GenericAgent-AgentTrek-1.0-32b
+
+ This agent is GenericAgent from AgentLab.
+
+ - **Base Model:**
+
+   - Qwen/Qwen2.5-32B-Instruct
+ - **Architecture:**
+
+   - Type: Causal Language Models
+   - Training Stage: Pretraining & Post-training
+   - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+   - Number of Parameters: 32.5B
+   - Number of Parameters (Non-Embedding): 31.0B
+   - Number of Layers: 64
+   - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+ - Input/Output Format:
+
+   - with the following flags:
+ ```txt
+ flags=GenericPromptFlags(
+     obs=ObsFlags(
+         use_html=True,
+         use_ax_tree=True,
+         use_tabs=False,
+         use_focused_element=False,
+         use_error_logs=True,
+         use_history=True,
+         use_past_error_logs=False,
+         use_action_history=True,
+         use_think_history=False,
+         use_diff=False,
+         html_type='pruned_html',
+         use_screenshot=False,
+         use_som=False,
+         extract_visible_tag=False,
+         extract_clickable_tag=False,
+         extract_coords='False',
+         filter_visible_elements_only=False,
+         openai_vision_detail='auto',
+         filter_with_bid_only=False,
+         filter_som_only=False
+     ),
+     action=ActionFlags(
+         action_set=HighLevelActionSetArgs(
+             subsets=('miniwob_all',),
+             multiaction=False,
+             strict=False,
+             retry_with_force=True,
+             demo_mode='off'
+         ),
+         long_description=False,
+         individual_examples=False,
+         multi_actions=None,
+         is_strict=None
+     ),
+     use_plan=False,
+     use_criticise=False,
+     use_thinking=True,
+     use_memory=True,
+     use_concrete_example=True,
+     use_abstract_example=True,
+     use_hints=False,
+     enable_chat=False,
+     max_prompt_tokens=40000,
+     be_cautious=True,
+     extra_instructions=None,
+     add_missparsed_messages=True,
+     max_trunc_itr=20,
+     flag_group=None
+ )
+ ```
+ - Training Details
+
+   - Dataset used: [AgentTrek-6K](https://agenttrek.github.io)
+   - Number of training steps: 3 Epochs
+ - Paper Link:
+
+   - https://arxiv.org/abs/2412.09605
+ - Code Repository:
+
+   - https://agenttrek.github.io
+ - License:
+
+   - Apache-2.0
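Review note: two quantities follow directly from the architecture figures in the README above, namely how many query heads share each key/value head under GQA and how many parameters are implied for the embeddings. The sketch below derives both. The variable names are illustrative and are not taken from the model's configuration files, and attributing the full parameter difference to embeddings is an assumption.

```python
# Figures copied from the README in this commit; names are illustrative only.
total_params_b = 32.5          # total parameters, in billions
non_embedding_params_b = 31.0  # non-embedding parameters, in billions
num_q_heads = 40               # query heads
num_kv_heads = 8               # key/value heads (GQA)

# Under grouped-query attention, query heads are split evenly across KV heads.
assert num_q_heads % num_kv_heads == 0
q_heads_per_kv_head = num_q_heads // num_kv_heads

# Assumption: everything not counted as non-embedding is embedding weights.
embedding_params_b = round(total_params_b - non_embedding_params_b, 1)

print(f"query heads per KV head: {q_heads_per_kv_head}")
print(f"implied embedding parameters: {embedding_params_b}B")
```

With 8 KV heads instead of 40, the key/value cache holds one fifth as many heads as it would if every query head had its own key/value head.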
results/GenericAgent-AgentTrek-1.0-32b/webarena.json ADDED
@@ -0,0 +1,16 @@
+ [
+     {
+         "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+         "study_id": "ac309635-f3fd-417e-ac16-1e0fc943a54f",
+         "date_time": "2025-01-25 10:16:41",
+         "benchmark": "WebArena",
+         "score": 22.4,
+         "std_err": 1.5,
+         "benchmark_specific": "No",
+         "benchmark_tuned": "No",
+         "followed_evaluation_protocol": "Yes",
+         "reproducible": "Yes",
+         "comments": "Additional details",
+         "original_or_reproduced": "Original"
+     }
+ ]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l1.json ADDED
@@ -0,0 +1,16 @@
+ [
+     {
+         "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+         "study_id": "ed14232c-cd7e-4708-b334-ebaf1f220000",
+         "date_time": "2025-01-12 00:37:04",
+         "benchmark": "WorkArena-L1",
+         "score": 38.29,
+         "std_err": 2.70,
+         "benchmark_specific": "No",
+         "benchmark_tuned": "No",
+         "followed_evaluation_protocol": "Yes",
+         "reproducible": "Yes",
+         "comments": "Additional details",
+         "original_or_reproduced": "Original"
+     }
+ ]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l2.json ADDED
@@ -0,0 +1,16 @@
+ [
+     {
+         "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+         "study_id": "957fb895-8548-46f4-92f0-5de6be7ceb61",
+         "date_time": "2025-01-12 09:39:21",
+         "benchmark": "WorkArena-L2",
+         "score": 2.98,
+         "std_err": 1.10,
+         "benchmark_specific": "No",
+         "benchmark_tuned": "No",
+         "followed_evaluation_protocol": "Yes",
+         "reproducible": "Yes",
+         "comments": "Additional details",
+         "original_or_reproduced": "Original"
+     }
+ ]
results/GenericAgent-AgentTrek-1.0-32b/workarena-l3.json ADDED
@@ -0,0 +1,16 @@
+ [
+     {
+         "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+         "study_id": "a951b33f-d118-4cf4-a2ef-cc2ef204eeb0",
+         "date_time": "2025-01-13 12:11:45",
+         "benchmark": "WorkArena-L3",
+         "score": 0.0,
+         "std_err": 0.0,
+         "benchmark_specific": "No",
+         "benchmark_tuned": "No",
+         "followed_evaluation_protocol": "Yes",
+         "reproducible": "Yes",
+         "comments": "Additional details",
+         "original_or_reproduced": "Original"
+     }
+ ]
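Review note: each result file reports a `score` together with a `std_err`. One rough way to read the two together is a normal-approximation interval of score ± 1.96 × std_err, sketched below for the five benchmarks in this commit. The `approx_interval` helper is hypothetical, and treating the scores as percentages on a 0 to 100 scale is an assumption; the leaderboard may compute or present uncertainty differently.

```python
import json

# Scores and standard errors copied from the five JSON files in this commit.
RESULTS_JSON = """
[
    {"benchmark": "MiniWoB", "score": 60.0, "std_err": 2.0},
    {"benchmark": "WebArena", "score": 22.4, "std_err": 1.5},
    {"benchmark": "WorkArena-L1", "score": 38.29, "std_err": 2.70},
    {"benchmark": "WorkArena-L2", "score": 2.98, "std_err": 1.10},
    {"benchmark": "WorkArena-L3", "score": 0.0, "std_err": 0.0}
]
"""


def approx_interval(score, std_err, z=1.96):
    """Return a normal-approximation interval, clipped to [0, 100].

    Assumes the score is a percentage; z=1.96 gives a roughly 95% interval.
    """
    low = max(0.0, score - z * std_err)
    high = min(100.0, score + z * std_err)
    return round(low, 2), round(high, 2)


summary = {
    entry["benchmark"]: approx_interval(entry["score"], entry["std_err"])
    for entry in json.loads(RESULTS_JSON)
}

for benchmark, (low, high) in summary.items():
    print(f"{benchmark}: {low} to {high}")
```

A standard error of 0.0 on WorkArena-L3 collapses the interval to a single point, which says only that no task succeeded in this run, not that the true success rate is exactly zero.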