DevShubham committed on
Commit
6b2dc4f
·
verified ·
1 Parent(s): 4565871

Upload folder using huggingface_hub

Files changed (1)
  1. ui-tars-setup-commands.md +221 -0
ui-tars-setup-commands.md ADDED
@@ -0,0 +1,221 @@
# UI-TARS 1.5-7B Model Setup Commands

This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

## Prerequisites

### 1. Verify Ollama Installation
```bash
ollama --version
```

### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
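
Before starting, it can help to confirm that every command-line tool used in this guide is on the `PATH`. The following is a minimal sketch; the `check_tools` helper and the tool list in the example call are assumptions for illustration, not part of the original setup.

```bash
# Report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (tool list is an assumption based on the steps in this document):
# check_tools ollama brew pip3 git cmake make huggingface-cli
```

A non-zero return status means at least one tool still needs to be installed.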

## Step 1: Download the UI-TARS Model

### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```
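
A plain `ls -la` shows what arrived but does not flag an incomplete download. A rough check could look like the sketch below. It assumes the repository ships a `config.json` and one or more `*.safetensors` shards; `verify_model_dir` is a hypothetical helper name.

```bash
# Check that a downloaded model directory looks complete enough to convert.
# Assumption: the repo ships a config.json plus one or more *.safetensors shards.
verify_model_dir() {
  local dir="$1" shards
  [ -f "$dir/config.json" ] || { echo "missing config.json in $dir"; return 1; }
  # Count weight shards directly inside the directory.
  shards=$(find "$dir" -maxdepth 1 -name '*.safetensors' | wc -l | tr -d ' ')
  [ "$shards" -gt 0 ] || { echo "no .safetensors files in $dir"; return 1; }
  echo "found config.json and $shards safetensors file(s)"
}

# Example: verify_model_dir /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
```

This only checks that the files are present; it does not verify checksums or shard counts against the repository index.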

## Step 2: Setup llama.cpp for Conversion

### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```

## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF
```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```

## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size
```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```
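
The conversion and quantization commands repeat the same absolute paths several times. To adapt them to another machine, one option is to derive every path from a single base directory. The sketch below only prints the two commands for review; `print_pipeline` is a hypothetical helper, and it assumes the same directory layout as this guide and paths without spaces.

```bash
# Print the conversion and quantization commands for a given base directory.
# Assumption: the model, the GGUF output directory, and the llama.cpp checkout
# all live under one base directory, as in this guide.
print_pipeline() {
  local base="$1"
  local name="ui-tars-1.5-7b"
  local f16="${base}/${name}-gguf/${name}-f16.gguf"
  local q4="${base}/${name}-gguf/${name}-q4_k_m.gguf"
  echo "python ${base}/llama.cpp/convert_hf_to_gguf.py ${base}/${name} --outfile ${f16} --outtype f16"
  echo "${base}/llama.cpp/build/bin/llama-quantize ${f16} ${q4} q4_k_m"
}

# Example: print_pipeline "$HOME/ai"
```

Printing rather than executing keeps the helper safe to run anywhere; the output can be copied into a terminal once the paths look right.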

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Stop sequences match the <|im_start|>/<|im_end|> markers used in the template
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Stop sequences match the <|im_start|>/<|im_end|> markers used in the template
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
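
The two Modelfiles differ only in their `FROM` line, so they could be generated from one template. The following is a sketch under stated assumptions: `write_modelfile` is a hypothetical helper, the system prompt is shortened for illustration, and the stop sequence is assumed to be the template's own `<|im_end|>` end-of-turn marker.

```bash
# Write an Ollama Modelfile for the given GGUF file.
# Usage: write_modelfile <gguf-path> <output-modelfile>
# The system prompt is shortened for illustration; substitute the full one.
write_modelfile() {
  local gguf="$1" out="$2"
  cat > "$out" <<EOF
FROM $gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an AI assistant specialized in user interface automation.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
}

# Example:
# write_modelfile ./ui-tars-1.5-7b-f16.gguf Modelfile
# write_modelfile ./ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```

Keeping a single template means a change to the prompt or sampling parameters applies to both variants.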

## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)
```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models
```bash
ollama list
```
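
To check for a specific tag without reading the table by eye, the listing can be filtered. This sketch assumes the model name is the first whitespace-separated column of `ollama list` output and that the first line is a header; `has_model` is a hypothetical helper.

```bash
# Succeed if the given model tag appears in `ollama list`-style output on stdin.
# Assumption: first line is a header, model name is the first column.
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example:
# ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is registered"
```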

### Test the quantized model
```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)
```bash
# Include the image path in the prompt so Ollama attaches the file
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```
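
The image test can also go through Ollama's HTTP API, which accepts base64-encoded images in the `images` field of a `/api/generate` request. The sketch below builds such a request body. `build_image_request` is a hypothetical helper, it does no JSON escaping (so the prompt is assumed to contain no double quotes or backslashes), and the example call assumes the server is listening on the default port 11434.

```bash
# Build a JSON request body for Ollama's /api/generate endpoint with one image.
# Assumption: the prompt contains no double quotes or backslashes.
build_image_request() {
  local model="$1" prompt="$2" image="$3" b64
  # Encode the image and strip line wrapping added by some base64 builds.
  b64=$(base64 < "$image" | tr -d '\n')
  printf '{"model": "%s", "prompt": "%s", "images": ["%s"], "stream": false}\n' \
    "$model" "$prompt" "$b64"
}

# Example (requires a running Ollama server):
# build_image_request ui-tars:q4 "List the UI elements in this screenshot" screenshot.png \
#   | curl -s http://localhost:11434/api/generate -d @-
```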

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's 69% smaller with minimal quality loss
2. **The model supports vision capabilities** - you can send screenshots for UI analysis, provided the vision encoder was included when the model was imported into Ollama
3. **Proper image formats**: PNG, JPEG, WebP are supported
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate
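
The 69% figure in the first tip follows from the Ollama sizes reported in this document (4.7GB versus 15GB). A small sketch of the arithmetic, with `reduction_pct` as a hypothetical helper name:

```bash
# Percentage by which `small` is smaller than `large` (same units for both).
reduction_pct() {
  awk -v large="$1" -v small="$2" 'BEGIN { printf "%.1f\n", (large - small) / large * 100 }'
}

reduction_pct 15 4.7   # sizes in GB as reported in this document; prints 68.7
```

The result, 68.7, rounds to the 69% quoted above.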

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```
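
Because the F16 file takes several minutes to regenerate, a guard before deleting it may be worthwhile. The sketch below removes the F16 GGUF only when the quantized file exists and is non-empty; `remove_f16_if_quantized` is a hypothetical helper, not part of the original command list.

```bash
# Delete the F16 GGUF only if the quantized file exists and is non-empty.
# Usage: remove_f16_if_quantized <f16.gguf> <q4.gguf>
remove_f16_if_quantized() {
  local f16="$1" q4="$2"
  if [ -s "$q4" ]; then
    rm -f -- "$f16"
    echo "removed $f16"
  else
    echo "kept $f16: $q4 is missing or empty"
    return 1
  fi
}

# Example:
# cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# remove_f16_if_quantized ui-tars-1.5-7b-f16.gguf ui-tars-1.5-7b-q4_k_m.gguf
```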

---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)

**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!