huangzhiyuan committed on
Commit 82632e3 · 1 Parent(s): f1a226b

update readme

Files changed (1)
  1. README.md +49 -1
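The `+49 -1` stat for README.md can be cross-checked against the two hunk headers in the diff. In unified-diff notation, `@@ -6,12 +6,37 @@` means the hunk covers 12 lines starting at line 6 of the old file and 37 lines starting at line 6 of the new file. Context lines are counted on both sides, so `new_len - old_len` for each hunk equals lines added minus lines removed. The following Python sketch parses the headers and sums that difference; the function names are illustrative and not part of any existing tool.

```python
import re

# Unified-diff hunk header: "@@ -<old_start>[,<old_len>] +<new_start>[,<new_len>] @@ [section]"
# An omitted length means the hunk spans exactly one line.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def net_line_change(headers: list[str]) -> int:
    """Sum of (new_len - old_len) over all hunks, i.e. added minus removed lines."""
    total = 0
    for h in headers:
        _, old_len, _, new_len = parse_hunk_header(h)
        total += new_len - old_len
    return total


headers = [
    "@@ -6,12 +6,37 @@ base_model:",
    "@@ -23,8 +48,31 @@ pip install flash-attn==2.3.6 --no-build-isolation",
]
print(net_line_change(headers))  # 48, matching +49 -1
```

Here (37 - 12) + (31 - 8) = 48 = 49 - 1. The second hunk also starts at old line 23 but new line 48, the same +25 offset introduced by the first hunk.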
README.md CHANGED
@@ -6,12 +6,37 @@ base_model:
 
 ## SpiritSight Agent: Advanced GUI Agent with One Look
 
+<p align="center">
+<a href="https://arxiv.org/abs/2503.03196">📄 Paper</a> •
+<a href="https://huggingface.co/SenseLLM/SpiritSight-Agent-8B">🤖 Models</a> •
+<a href="" style="pointer-events: none">📚 Datasets (Coming soon…)</a>
+</p>
+
+
 ## Introduction
-SpiritSight id a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms.
+
+SpiritSight-Agent is a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms.
 
 ![](results.png)
 ![](results2.png)
 
+
+## Models
+
+We recommend fine-tuning the base model on custom data.
+
+| Model | Checkpoint | Size | License|
+|:-------|:------------|:------|:--------|
+| SpiritSight-Agent-2B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-2B) | 2B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |
+| SpiritSight-Agent-8B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-8B) | 8B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |
+| SpiritSight-Agent-26B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-26B) | 26B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |
+
+
+## Datasets
+
+Coming soon.
+
+
 ## Inference
 
 ```shell
@@ -23,8 +48,31 @@ pip install flash-attn==2.3.6 --no-build-isolation
 python infer_SSAgent-8B.py
 ```
 
+
+## Citation
+
+If you find this repo useful for your research, please kindly cite our paper:
+```
+@misc{huang2025spiritsightagentadvancedgui,
+title={SpiritSight Agent: Advanced GUI Agent with One Look},
+author={Zhiyuan Huang and Ziming Cheng and Junting Pan and Zhaohui Hou and Mingjie Zhan},
+year={2025},
+eprint={2503.03196},
+archivePrefix={arXiv},
+primaryClass={cs.CV},
+url={https://arxiv.org/abs/2503.03196},
+}
+```
+
+
 ## Acknowledgments
 
 We thank the following amazing projects that truly inspired us:
 
 - [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-8B)
+- [SeeClick]( https://github.com/njucckevin/SeeClick)
+- [Mind2Web](https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web)
+- [GUI-Odyssey](https://github.com/OpenGVLab/GUI-Odyssey)
+- [AMEX](https://huggingface.co/datasets/Yuxiang007/AMEX)
+- [AndroidControl](https://github.com/google-research/google-research/tree/master/android_control)
+- [GUICourse](https://github.com/yiye3/GUICourse)
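The Models table added by this commit lists three base checkpoints on the Hugging Face Hub. A minimal loading sketch follows, assuming (not confirmed by this commit) that the checkpoints keep the InternVL2 layout and ship their own modeling code, as InternVL2-8B does, so that `transformers` can load them with `trust_remote_code=True`. The `CHECKPOINTS` mapping, `checkpoint_repo`, and `load_checkpoint` are hypothetical helpers; `infer_SSAgent-8B.py` remains the reference inference path.

```python
# Hypothetical helpers; the repo IDs are copied from the Models table.
CHECKPOINTS = {
    "2B": "SenseLLM/SpiritSight-Agent-2B",
    "8B": "SenseLLM/SpiritSight-Agent-8B",
    "26B": "SenseLLM/SpiritSight-Agent-26B",
}


def checkpoint_repo(size: str) -> str:
    """Map a model size from the Models table to its Hugging Face repo ID."""
    try:
        return CHECKPOINTS[size.upper()]
    except KeyError:
        raise ValueError(
            f"unknown size {size!r}; expected one of {sorted(CHECKPOINTS)}"
        ) from None


def load_checkpoint(size: str = "8B"):
    """Load model and tokenizer, ASSUMING the repo ships InternVL2-style remote code."""
    # Imported lazily so checkpoint_repo works without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    repo = checkpoint_repo(size)
    model = AutoModel.from_pretrained(
        repo,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # assumption: custom modeling code lives in the repo
    ).eval()  # switch to inference mode; eval() returns the model itself
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    return model, tokenizer
```

Calling `load_checkpoint("8B")` downloads the checkpoint from the Hub on first use. Per the README, these are base models intended to be fine-tuned on custom data.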