---
title: ui-coordinates-finder
app_file: gradio_demo.py
sdk: gradio
sdk_version: 5.4.0
---
# OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent
<p align="center">
<img src="imgs/logo.png" alt="Logo">
</p>
[arXiv](https://arxiv.org/abs/2408.00203)
[License: MIT](https://opensource.org/licenses/MIT)
[[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Models](https://huggingface.co/microsoft/OmniParser)]
**OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
## News
- [2024/10] Both the interactive region detection model and the icon functional description model are released! [Hugging Face models](https://huggingface.co/microsoft/OmniParser)
- [2024/09] OmniParser achieves the best performance on [Windows Agent Arena](https://microsoft.github.io/WindowsAgentArena/)!
## Install
Set up the environment:
```bash
conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt
```
Then download the model checkpoint files from https://huggingface.co/microsoft/OmniParser and put them under `weights/`. The default folder structure is: `weights/icon_detect`, `weights/icon_caption_florence`, `weights/icon_caption_blip2`.
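Before moving on, it can help to confirm that the checkpoints landed where expected. The following is a minimal sketch, assuming the three default folders listed above; `missing_weight_dirs` is a hypothetical helper written for illustration and is not part of this repository.

```python
from pathlib import Path

# Default sub-folders named in this README; adjust if your layout differs.
EXPECTED_DIRS = ("icon_detect", "icon_caption_florence", "icon_caption_blip2")


def missing_weight_dirs(root="weights"):
    """Return the expected sub-folders that are absent or empty under root.

    Hypothetical helper for illustration only; not shipped with OmniParser.
    """
    root = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        folder = root / name
        # A folder with no entries is treated as missing, since that
        # usually means the download did not complete.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_weight_dirs()
    if gaps:
        print("Missing or empty under weights/:", ", ".join(gaps))
    else:
        print("All expected weight folders are present.")
```

The check only looks at folder presence, not at file names or sizes, because the exact checkpoint file names are not stated here.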
Finally, convert the safetensors weights to a `.pt` file:
```bash
python weights/convert_safetensor_to_pt.py
```
## Examples
A few simple examples are collected in `demo.ipynb`.
## Gradio Demo
To run the Gradio demo:
```bash
python gradio_demo.py
```
## Citation
Our technical report can be found [here](https://arxiv.org/abs/2408.00203).
If you find our work useful, please consider citing it:
```
@misc{lu2024omniparserpurevisionbased,
title={OmniParser for Pure Vision Based GUI Agent},
author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
year={2024},
eprint={2408.00203},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.00203},
}
```