ServiceNow/browsergym-leaderboard

meghsn

ludunjie committed on Feb 3

Commit 8d4c52c (verified) · 1 Parent(s): 52facf3

GenericAgent-AgentTrek-1.0-32b (#5)

- GenericAgent-AgentTrek-1.0-32b (5d1fb89f6c214429631953c849930825bddcf0ad)
- Fixed formatting (5e482c6860a1a676b3f1083d79d6bd32930ba125)

Co-authored-by: Lu Dunjie <[email protected]>

Files changed (6)

results/GenericAgent-AgentTrek-1.0-32b/miniwob.json +16 -0
results/GenericAgent-AgentTrek-1.0-32b/readme.md +85 -0
results/GenericAgent-AgentTrek-1.0-32b/webarena.json +16 -0
results/GenericAgent-AgentTrek-1.0-32b/workarena-l1.json +16 -0
results/GenericAgent-AgentTrek-1.0-32b/workarena-l2.json +16 -0
results/GenericAgent-AgentTrek-1.0-32b/workarena-l3.json +16 -0

results/GenericAgent-AgentTrek-1.0-32b/miniwob.json ADDED

	@@ -0,0 +1,16 @@

+[
+    {
+        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+        "study_id": "4c636aa0-ea52-429d-9d7e-301b7bf0ac74",
+        "date_time": "2025-01-22 04:27:37",
+        "benchmark": "MiniWoB",
+        "score": 60.0,
+        "std_err": 2.0,
+        "benchmark_specific": "No",
+        "benchmark_tuned": "No",
+        "followed_evaluation_protocol": "Yes",
+        "reproducible": "Yes",
+        "comments": "Additional details",
+        "original_or_reproduced": "Original"
+    }
+]
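
Each result file in this commit is a JSON array holding one record with the same twelve keys. As a rough aid for reviewing such submissions, the sketch below checks a record against that shape. The field list is copied from the files in this commit; the specific checks (UUID-formatted `study_id`, `YYYY-MM-DD HH:MM:SS` timestamps, `Yes`/`No` flags, scores within 0 to 100) are assumptions inferred from these five files, not the leaderboard's official schema, and `validate_entry` is a hypothetical helper name.

```python
import json
import uuid
from datetime import datetime

# Keys observed in every result file of this commit.
REQUIRED_FIELDS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}
# Assumption: these four fields only ever hold "Yes" or "No".
YES_NO_FIELDS = (
    "benchmark_specific", "benchmark_tuned",
    "followed_evaluation_protocol", "reproducible",
)


def validate_entry(entry):
    """Return a list of problems found in one result record (empty if none)."""
    problems = []
    missing = sorted(REQUIRED_FIELDS - entry.keys())
    if missing:
        problems.append(f"missing fields: {missing}")
        return problems  # the remaining checks need every key to be present

    try:
        uuid.UUID(entry["study_id"])
    except (ValueError, TypeError, AttributeError):
        problems.append("study_id is not a UUID")

    try:
        datetime.strptime(entry["date_time"], "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        problems.append("date_time is not 'YYYY-MM-DD HH:MM:SS'")

    for name in ("score", "std_err"):
        value = entry[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} is not a number")
        elif name == "score" and not 0 <= value <= 100:
            problems.append("score is outside 0..100")  # assumed percentage scale

    for name in YES_NO_FIELDS:
        if entry[name] not in ("Yes", "No"):
            problems.append(f"{name} must be 'Yes' or 'No'")
    return problems


# The MiniWoB record from this commit, inlined so the sketch is self-contained.
MINIWOB_JSON = """
[
    {
        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
        "study_id": "4c636aa0-ea52-429d-9d7e-301b7bf0ac74",
        "date_time": "2025-01-22 04:27:37",
        "benchmark": "MiniWoB",
        "score": 60.0,
        "std_err": 2.0,
        "benchmark_specific": "No",
        "benchmark_tuned": "No",
        "followed_evaluation_protocol": "Yes",
        "reproducible": "Yes",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
"""

entries = json.loads(MINIWOB_JSON)
for entry in entries:
    print(entry["benchmark"], validate_entry(entry) or "ok")
```

In practice the same function could be run over every `results/<agent>/*.json` file before merging; reading from disk is left out here to keep the example independent of the repository layout.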

results/GenericAgent-AgentTrek-1.0-32b/readme.md ADDED

	@@ -0,0 +1,85 @@

+### GenericAgent-AgentTrek-1.0-32b
+This agent is the GenericAgent from AgentLab.
+- **Base Model:**
+  - Qwen/Qwen2.5-32B-Instruct
+- **Architecture:**
+  - Type: Causal Language Models
+  - Training Stage: Pretraining & Post-training
+  - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+  - Number of Parameters: 32.5B
+  - Number of Parameters (Non-Embedding): 31.0B
+  - Number of Layers: 64
+  - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+- **Input/Output Format:**
+  - Prompts are built with the following flags:
+    ```txt
+    flags=GenericPromptFlags(
+        obs=ObsFlags(
+            use_html=True,
+            use_ax_tree=True,
+            use_tabs=False,
+            use_focused_element=False,
+            use_error_logs=True,
+            use_history=True,
+            use_past_error_logs=False,
+            use_action_history=True,
+            use_think_history=False,
+            use_diff=False,
+            html_type='pruned_html',
+            use_screenshot=False,
+            use_som=False,
+            extract_visible_tag=False,
+            extract_clickable_tag=False,
+            extract_coords='False',
+            filter_visible_elements_only=False,
+            openai_vision_detail='auto',
+            filter_with_bid_only=False,
+            filter_som_only=False
+        ),
+        action=ActionFlags(
+            action_set=HighLevelActionSetArgs(
+                subsets=('miniwob_all',),
+                multiaction=False,
+                strict=False,
+                retry_with_force=True,
+                demo_mode='off'
+            ),
+            long_description=False,
+            individual_examples=False,
+            multi_actions=None,
+            is_strict=None
+        ),
+        use_plan=False,
+        use_criticise=False,
+        use_thinking=True,
+        use_memory=True,
+        use_concrete_example=True,
+        use_abstract_example=True,
+        use_hints=False,
+        enable_chat=False,
+        max_prompt_tokens=40000,
+        be_cautious=True,
+        extra_instructions=None,
+        add_missparsed_messages=True,
+        max_trunc_itr=20,
+        flag_group=None
+    )
+    ```
+- **Training Details:**
+  - Dataset used: [AgentTrek-6K](https://agenttrek.github.io)
+  - Training duration: 3 epochs
+- **Paper Link:**
+  - https://arxiv.org/abs/2412.09605
+- **Code Repository:**
+  - https://agenttrek.github.io
+- **License:**
+  - Apache-2.0

results/GenericAgent-AgentTrek-1.0-32b/webarena.json ADDED

	@@ -0,0 +1,16 @@

+[
+    {
+        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+        "study_id": "ac309635-f3fd-417e-ac16-1e0fc943a54f",
+        "date_time": "2025-01-25 10:16:41",
+        "benchmark": "WebArena",
+        "score": 22.4,
+        "std_err": 1.5,
+        "benchmark_specific": "No",
+        "benchmark_tuned": "No",
+        "followed_evaluation_protocol": "Yes",
+        "reproducible": "Yes",
+        "comments": "Additional details",
+        "original_or_reproduced": "Original"
+    }
+]

results/GenericAgent-AgentTrek-1.0-32b/workarena-l1.json ADDED

	@@ -0,0 +1,16 @@

+[
+    {
+        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+        "study_id": "ed14232c-cd7e-4708-b334-ebaf1f220000",
+        "date_time": "2025-01-12 00:37:04",
+        "benchmark": "WorkArena-L1",
+        "score": 38.29,
+        "std_err": 2.70,
+        "benchmark_specific": "No",
+        "benchmark_tuned": "No",
+        "followed_evaluation_protocol": "Yes",
+        "reproducible": "Yes",
+        "comments": "Additional details",
+        "original_or_reproduced": "Original"
+    }
+]

results/GenericAgent-AgentTrek-1.0-32b/workarena-l2.json ADDED

	@@ -0,0 +1,16 @@

+[
+    {
+        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+        "study_id": "957fb895-8548-46f4-92f0-5de6be7ceb61",
+        "date_time": "2025-01-12 09:39:21",
+        "benchmark": "WorkArena-L2",
+        "score": 2.98,
+        "std_err": 1.10,
+        "benchmark_specific": "No",
+        "benchmark_tuned": "No",
+        "followed_evaluation_protocol": "Yes",
+        "reproducible": "Yes",
+        "comments": "Additional details",
+        "original_or_reproduced": "Original"
+    }
+]

results/GenericAgent-AgentTrek-1.0-32b/workarena-l3.json ADDED

	@@ -0,0 +1,16 @@

+[
+    {
+        "agent_name": "GenericAgent-AgentTrek-1.0-32b",
+        "study_id": "a951b33f-d118-4cf4-a2ef-cc2ef204eeb0",
+        "date_time": "2025-01-13 12:11:45",
+        "benchmark": "WorkArena-L3",
+        "score": 0.0,
+        "std_err": 0.0,
+        "benchmark_specific": "No",
+        "benchmark_tuned": "No",
+        "followed_evaluation_protocol": "Yes",
+        "reproducible": "Yes",
+        "comments": "Additional details",
+        "original_or_reproduced": "Original"
+    }
+]
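
To read the five `score` and `std_err` pairs added above side by side, the following sketch prints each benchmark with an approximate 95% interval. The scores and standard errors are copied from the files in this commit; treating the interval as `score ± 1.96 × std_err` (a normal approximation) and clamping it to 0..100 are assumptions made for illustration, not something the leaderboard is known to compute. `interval` is a hypothetical helper name.

```python
# (benchmark, score, std_err) as reported in the five JSON files of this commit.
RESULTS = [
    ("MiniWoB", 60.0, 2.0),
    ("WebArena", 22.4, 1.5),
    ("WorkArena-L1", 38.29, 2.70),
    ("WorkArena-L2", 2.98, 1.10),
    ("WorkArena-L3", 0.0, 0.0),
]


def interval(score, std_err, z=1.96):
    """Approximate interval score ± z * std_err, clamped to the 0..100 range.

    Assumption: a normal approximation with z = 1.96 (about 95% coverage).
    """
    half_width = z * std_err
    return max(0.0, score - half_width), min(100.0, score + half_width)


for benchmark, score, std_err in RESULTS:
    low, high = interval(score, std_err)
    print(f"{benchmark:<13} {score:6.2f} ± {std_err:4.2f}   [{low:6.2f}, {high:6.2f}]")
```

The normal approximation is weakest near the boundaries, so the WorkArena-L2 and WorkArena-L3 intervals in particular should be read as rough indications only.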