---
title: ui-coordinates-finder
app_file: gradio_demo.py
sdk: gradio
sdk_version: 5.4.0
---

# OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent

<p align="center">
  <img src="imgs/logo.png" alt="Logo">
</p>

[![arXiv](https://img.shields.io/badge/Paper-green)](https://arxiv.org/abs/2408.00203)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models](https://huggingface.co/microsoft/OmniParser)]

**OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
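To illustrate what "grounding an action in a region" can mean in practice, here is a minimal sketch that turns one parsed element into a click coordinate. The `ParsedElement` record, its normalized `bbox` layout, and the `click_point` helper are all hypothetical and are not OmniParser's actual output format or API.

```python
from dataclasses import dataclass


@dataclass
class ParsedElement:
    """Hypothetical record for one detected UI element (not OmniParser's real output type)."""
    label: str
    # Assumed layout: (x_min, y_min, x_max, y_max), normalized to [0, 1].
    bbox: tuple


def click_point(element, image_width, image_height):
    """Map the center of a normalized bounding box to pixel coordinates."""
    x_min, y_min, x_max, y_max = element.bbox
    cx = (x_min + x_max) / 2 * image_width
    cy = (y_min + y_max) / 2 * image_height
    return round(cx), round(cy)


# Example element on a 1920x1080 screenshot.
button = ParsedElement(label="Submit button", bbox=(0.25, 0.5, 0.75, 0.75))
print(click_point(button, 1920, 1080))  # (960, 675)
```

An agent that receives a labeled list of such elements can pick one by description and act on its center, rather than guessing raw pixel positions from the screenshot.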

## News

- [2024/10] Both the interactive region detection model and the icon functional description model are released! [Hugging Face models](https://huggingface.co/microsoft/OmniParser)
- [2024/09] OmniParser achieves the best performance on [Windows Agent Arena](https://microsoft.github.io/WindowsAgentArena/)!

## Install

Create the conda environment and install the dependencies:

```bash
conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt
```

Then download the model checkpoint files from https://huggingface.co/microsoft/OmniParser and put them under weights/. The default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.
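To confirm the download landed in the right place, a small check like the following may help. It is a minimal sketch that assumes only the three default subfolders named above; the helper name `missing_weight_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Default subfolders named in the instructions above.
EXPECTED_SUBDIRS = ("icon_detect", "icon_caption_florence", "icon_caption_blip2")


def missing_weight_dirs(weights_root="weights"):
    """Return the expected subfolders that are absent under weights_root."""
    root = Path(weights_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_weight_dirs()
    if missing:
        print("Missing under weights/:", ", ".join(missing))
    else:
        print("All expected weight folders are present.")
```

The check only looks at folder names; it does not verify that the checkpoint files inside them are complete.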

Finally, convert the safetensors weights to a .pt file:

```bash
python weights/convert_safetensor_to_pt.py
```

## Examples

A few simple examples are collected in demo.ipynb.

## Gradio Demo

To launch the Gradio demo, run:

```bash
python gradio_demo.py
```

## 📚 Citation

Our technical report can be found [here](https://arxiv.org/abs/2408.00203).
If you find our work useful, please consider citing it:

```
@misc{lu2024omniparserpurevisionbased,
      title={OmniParser for Pure Vision Based GUI Agent},
      author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
      year={2024},
      eprint={2408.00203},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00203},
}
```
