---
license: apache-2.0
---
# UI-TARS 1.5-7B Model Setup Commands

This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

## Prerequisites

### 1. Verify Ollama Installation
```bash
ollama --version
```

### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
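Before starting a multi-gigabyte download, it can help to confirm every required tool is actually on `PATH`. A minimal sketch; the `check_tools` helper is hypothetical, not part of the original setup:

```bash
# Hypothetical pre-flight check: report any required tool missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return $missing
}

# Example: check_tools ollama brew pip3 huggingface-cli
```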

## Step 1: Download the UI-TARS Model

### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```
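Beyond `ls -la`, a quick scripted check that the snapshot landed intact: a Hugging Face transformers snapshot should contain `config.json` and at least one `.safetensors` weight shard. The `verify_download` helper is a hypothetical sketch, not part of the original steps:

```bash
# Hypothetical download sanity check for a HF model snapshot.
verify_download() {
  dir="$1"
  [ -f "$dir/config.json" ] || { echo "config.json missing"; return 1; }
  shards=$(ls "$dir"/*.safetensors 2>/dev/null | wc -l | tr -d ' ')
  [ "$shards" -gt 0 ] || { echo "no .safetensors shards found"; return 1; }
  echo "ok: $shards shard(s)"
}

# Example: verify_download /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
```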

## Step 2: Setup llama.cpp for Conversion

### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```
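Note that `sysctl -n hw.ncpu` is macOS/BSD-specific. If the build might also run on Linux, the job count can be made portable; the `ncpu` helper below is an assumption, not from the original:

```bash
# Portable CPU count for make -j: prefer nproc (Linux), fall back to
# sysctl (macOS/BSD), then to a conservative default of 4.
ncpu() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    sysctl -n hw.ncpu 2>/dev/null || echo 4
  fi
}

# make -j"$(ncpu)"
```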
65
+
66
+ ## Step 3: Convert Safetensors to GGUF Format
67
+
68
+ ### Create output directory and convert to F16 GGUF
69
+ ```bash
70
+ # Create directory for GGUF files
71
+ mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
72
+
73
+ # Navigate to llama.cpp directory
74
+ cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp
75
+
76
+ # Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
77
+ python convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
78
+ --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
79
+ --outtype f16
80
+
81
+ # Check the F16 file size
82
+ ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
83
+ ```
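One cheap way to confirm the conversion produced a real GGUF file, beyond checking its size: the GGUF format opens with the 4-byte ASCII magic `GGUF`. The `is_gguf` helper is a hypothetical sketch:

```bash
# A valid GGUF file begins with the ASCII magic "GGUF" in its first 4 bytes.
is_gguf() { [ "$(head -c 4 "$1")" = "GGUF" ]; }

# Example:
# is_gguf ui-tars-1.5-7b-f16.gguf && echo "looks like GGUF"
```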
84
+
85
+ ## Step 4: Quantize to Q4_K_M Format
86
+
87
+ ### Quantize the F16 model to reduce size
88
+ ```bash
89
+ # Navigate to the build directory
90
+ cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build
91
+
92
+ # Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
93
+ ./bin/llama-quantize \
94
+ /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
95
+ /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
96
+ q4_k_m
97
+
98
+ # Check the quantized file size
99
+ ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
100
+ ```
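The headline savings can be checked with a little awk arithmetic, here using the ~15 GB original vs ~4.7 GB quantized figures quoted later in this document:

```bash
# Percent saved by quantization: (1 - quantized/original) * 100.
reduction() {
  awk -v orig="$1" -v quant="$2" 'BEGIN { printf "%.0f%%\n", (1 - quant / orig) * 100 }'
}

reduction 15 4.7   # -> 69%
```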

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

Note: the stop sequences must match the ChatML markers (`<|im_start|>` / `<|im_end|>`) used in the TEMPLATE, otherwise generation will run past the end of each turn.
157
+
158
+ ## Step 6: Create Models in Ollama
159
+
160
+ ### Create the F16 model (high quality, larger size)
161
+ ```bash
162
+ cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
163
+ ollama create ui-tars:latest -f Modelfile
164
+ ```
165
+
166
+ ### Create the quantized model (recommended for daily use)
167
+ ```bash
168
+ ollama create ui-tars:q4 -f Modelfile-q4
169
+ ```
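A small scripted check that a tag actually registered; the `has_model` filter is hypothetical and simply greps the tabular output of `ollama list`:

```bash
# Reads `ollama list` output on stdin; succeeds if the named tag is present.
has_model() { grep -q "^$1"; }

# Example (requires ollama):
# ollama list | has_model "ui-tars:q4" && echo "ui-tars:q4 is installed"
```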
170
+
171
+ ## Step 7: Verify Installation
172
+
173
+ ### List all available models
174
+ ```bash
175
+ ollama list
176
+ ```
177
+
178
+ ### Test the quantized model
179
+ ```bash
180
+ ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
181
+ ```
182
+
183
+ ### Test with an image (if you have one)
184
+ ```bash
185
+ ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see" --image /path/to/your/screenshot.png
186
+ ```
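For scripted verification, Ollama also exposes an HTTP API on `localhost:11434`; its `/api/generate` endpoint takes a JSON body. The `build_payload` helper below only assembles that JSON, so it can be inspected offline; the helper itself is an assumption, not part of the original setup:

```bash
# Assemble a minimal JSON body for Ollama's /api/generate endpoint.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send it against a running Ollama daemon:
# curl -s http://localhost:11434/api/generate -d "$(build_payload ui-tars:q4 'Hello!')"
```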

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's 69% smaller with minimal quality loss
2. **The model supports vision capabilities** - you can send screenshots for UI analysis
3. **Supported image formats**: PNG, JPEG, WebP
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate
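Tip 2 can also be exercised over the HTTP API: Ollama's `/api/generate` endpoint accepts base64-encoded images in an `images` array. A hedged sketch; the `encode_image` helper and the `screenshot.png` path are assumptions:

```bash
# Base64-encode an image for Ollama's "images" field (strip newlines so
# the result embeds cleanly in JSON).
encode_image() { base64 < "$1" | tr -d '\n'; }

# Example (requires a running daemon and a real screenshot.png):
# curl -s http://localhost:11434/api/generate -d "{
#   \"model\": \"ui-tars:q4\",
#   \"prompt\": \"List the clickable UI elements in this screenshot\",
#   \"images\": [\"$(encode_image screenshot.png)\"],
#   \"stream\": false
# }"
```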

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```

---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)
**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - a 69% size reduction!