BoyuNLP committed · verified · Commit 4946fb3 · Parent(s): 14c14d5

Update README.md

Files changed (1):
  1. README.md +41 −0

README.md CHANGED
@@ -38,6 +38,47 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
38
  - [x] Online Demo (HF Spaces)
 
 
+ ## Inference
+
+ ### vLLM server
+
+ ```bash
+ vllm serve osunlp/UGround-V1-7B --api-key token-abc123 --dtype float16
+ ```
+
+ ### Visual Grounding Prompt
+ ```python
+ def format_openai_template(description: str, base64_image):
+     return [
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "image_url",
+                     "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
+                 },
+                 {
+                     "type": "text",
+                     "text": f"""
+ Your task is to help the user identify the precise coordinates (x, y) of a specific area/element/object on the screen based on a description.
+
+ - Your response should aim to point to the center or a representative point within the described area/element/object as accurately as possible.
+ - If the description is unclear or ambiguous, infer the most relevant area or element based on its likely context or purpose.
+ - Your answer should be a single string (x, y) corresponding to the point of interest.
+
+ Description: {description}
+
+ Answer:""",
+                 },
+             ],
+         },
+     ]
+ ```
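The prompt asks the model to answer with a single coordinate string of the form `(x, y)`. A small parser (a sketch; the helper name and the example coordinates are ours) converts that answer into integers:

```python
import re


def parse_point(answer: str) -> tuple[int, int]:
    # Extract the first "(x, y)" pair from the model's answer string
    m = re.search(r"\((\d+),\s*(\d+)\)", answer)
    if m is None:
        raise ValueError(f"No coordinate found in: {answer!r}")
    return int(m.group(1)), int(m.group(2))


# parse_point("(832, 454)") returns (832, 454)
```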
+
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6500870f1e14749e84f8f887/u5bXFxxAWCXthyXWyZkM4.png)
 
  ## Citation Information