Sanket17 committed
Commit cde88c0 · verified · 1 parent: 86125c5

Update README.md

Files changed (1): README.md (+57 −5)
README.md CHANGED
@@ -1,10 +1,62 @@
  ---
  title: NewOmniParser
- emoji:
- colorFrom: green
- colorTo: green
  sdk: docker
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: NewOmniParser
+ emoji: 💻
+ colorFrom: yellow
+ colorTo: yellow
  sdk: docker
+ pinned: true
+ license: mit
  ---
+ # OmniParser API
+
+ Self-hosted version of Microsoft's [OmniParser](https://huggingface.co/microsoft/OmniParser) image-to-text model.
+
+ > OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent. Training Datasets include: 1) an interactable icon detection dataset, which was curated from popular web pages and automatically annotated to highlight clickable and actionable regions, and 2) an icon description dataset, designed to associate each UI element with its corresponding function.
+
+ ## Why?
+
+ There's already a great Hugging Face Gradio [app](https://huggingface.co/spaces/microsoft/OmniParser) for this model, and it even offers an API. But:
+
+ - Gradio adds overhead, so it's much slower than serving the model directly (as we do here)
+ - The HF Space is rate-limited
+
+ ## How it works
+
+ As the Dockerfile shows, we start from the HF demo image to retrieve all the weights and util functions, then add a simple FastAPI server (main.py) to serve the model.
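+
+ For orientation, here's a minimal sketch of the shape that server takes. It is not the actual implementation: the route name and response fields mirror the Docs section below, while `load_models` and `parse_screenshot` are hypothetical stand-ins for the weights and util functions pulled in from the demo image.
+
+ ```python
+ # main.py -- a minimal sketch of the server's shape, not the actual code.
+ import base64
+ import io
+
+ from fastapi import FastAPI, File, UploadFile
+ from PIL import Image
+
+ app = FastAPI()
+
+
+ def load_models():
+     # Hypothetical stand-in: load the detection/captioning weights that
+     # the demo image ships with.
+     return None
+
+
+ def parse_screenshot(models, image):
+     # Hypothetical stand-in for the demo's util functions: run icon
+     # detection and captioning, draw boxes, and return
+     # (annotated PNG bytes, element descriptions, box coordinates).
+     return b"", [], []
+
+
+ models = load_models()
+
+
+ @app.post("/process_image")
+ async def process_image(image_file: UploadFile = File(...)):
+     image = Image.open(io.BytesIO(await image_file.read()))
+     annotated_png, elements, boxes = parse_screenshot(models, image)
+     return {
+         "image": base64.b64encode(annotated_png).decode("utf-8"),
+         "parsed_content_list": elements,
+         "label_coordinates": boxes,
+     }
+ ```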
+
+ ## Getting Started
+
+ ### Requirements
+
+ - GPU
+ - 16 GB RAM (swap recommended)
+
+ ### Locally
+
+ 1. Clone the repository
+ 2. Build the Docker image: `docker build -t omni-parser-app .`
+ 3. Run the Docker container: `docker run -p 7860:7860 omni-parser-app`, then sanity-check it as shown below
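+
+ Once the container is up, a quick way to confirm the server is responding is to hit the interactive docs page (see the Docs section below); this tiny illustrative script just checks for a 200:
+
+ ```python
+ # smoke_test.py -- quick check that the server is up and serving.
+ import requests
+
+ resp = requests.get("http://localhost:7860/docs")
+ print(resp.status_code)  # expect 200
+ ```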
+
+ ### Self-hosted API
+
+ I suggest hosting on [fly.io](https://fly.io) because it's quick and simple to deploy with the CLI.
+
+ This repo is ready-made for deployment on fly.io (see fly.toml for the configuration). Just run `fly launch` and follow the prompts.
+
+ ## Docs
+
+ Visit `http://localhost:7860/docs` for the API documentation. There's only one route, `/process_image`, which returns the following (a sample client follows the list):
+
+ - The annotated image, with bounding boxes drawn on it (base64-encoded)
+ - A list of parsed elements with text descriptions
+ - The bounding box coordinates of the parsed elements
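+
+ A minimal Python client might look like the sketch below. The JSON keys (`image`, `parsed_content_list`, `label_coordinates`) and the upload field name are assumptions for illustration; check `/docs` for the authoritative schema.
+
+ ```python
+ # client.py -- illustrative sketch; verify field names against /docs.
+ import base64
+
+ import requests
+
+ with open("examples/screenshot.png", "rb") as f:
+     resp = requests.post(
+         "http://localhost:7860/process_image",
+         files={"image_file": f},  # assumed upload field name
+     )
+ resp.raise_for_status()
+ data = resp.json()
+
+ # Save the annotated screenshot (returned base64-encoded).
+ with open("annotated.png", "wb") as out:
+     out.write(base64.b64decode(data["image"]))
+
+ # Print each parsed element next to its bounding box
+ # (assuming the two fields are parallel lists).
+ for text, box in zip(data["parsed_content_list"], data["label_coordinates"]):
+     print(text, box)
+ ```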
+
+ ## Examples
+
+ | Before                             | After                         |
+ | ---------------------------------- | ----------------------------- |
+ | ![Before](examples/screenshot.png) | ![After](examples/after.webp) |
+
+ ## Related Projects
+
+ Check out [OneQuery](https://query-rho.vercel.app), an agent that browses the web and returns structured responses for any query, simple or complex. OneQuery is built using OmniParser to enhance its capabilities.