# UI-TARS 1.5-7B Model Setup Commands

This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

## Prerequisites

### 1. Verify Ollama Installation
```bash
ollama --version
```

### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
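
Before starting, it can help to confirm that the command-line tools used in later steps are on `PATH`. The following is a minimal sketch; the helper name `missing_tools` is hypothetical, and the tool list was assembled by hand from the commands in this guide, so adjust it to your setup.

```bash
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools invoked by the steps in this guide (hand-assembled list).
missing=$(missing_tools ollama git cmake make python3 pip3 huggingface-cli)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```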

## Step 1: Download the UI-TARS Model

### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```
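
`ls -la` only shows that something was downloaded. A slightly stricter check, sketched below, looks for `config.json` and at least one `.safetensors` file, which is what the conversion step reads. The function name `check_model_dir` is hypothetical, and the check assumes the standard Hugging Face repository layout.

```bash
# Succeeds only if DIR holds config.json and at least one .safetensors file.
check_model_dir() {
  dir=$1
  [ -f "$dir/config.json" ] || { echo "missing config.json in $dir" >&2; return 1; }
  # ls exits non-zero when the glob matches nothing.
  ls "$dir"/*.safetensors >/dev/null 2>&1 || { echo "no .safetensors files in $dir" >&2; return 1; }
  echo "model directory looks complete: $dir"
}

# Usage (run from the download directory):
#   check_model_dir .
```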

## Step 2: Setup llama.cpp for Conversion

### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```

## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF
```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```
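
Besides the file size, a quick sanity check is that the output really is a GGUF file: GGUF files start with the four ASCII bytes `GGUF`. The sketch below tests only that header, not the rest of the file; `is_gguf` is a hypothetical helper name.

```bash
# Succeeds if the file begins with the 4-byte GGUF magic ("GGUF").
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Usage:
#   is_gguf /path/to/ui-tars-1.5-7b-f16.gguf && echo "valid GGUF header"
```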

## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size
```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
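
Both Modelfiles use a relative `FROM` path, so they only work when the GGUF file sits where the path says. The sketch below checks that before `ollama create` is run. The helper names are hypothetical, and resolving relative paths against the Modelfile's own directory is an assumption that matches how this guide runs the commands (from inside the GGUF directory).

```bash
# Print the path given on the first FROM line of a Modelfile.
modelfile_source() {
  awk '$1 == "FROM" { print $2; exit }' "$1"
}

# Succeeds if that path exists; relative paths are resolved against
# the directory containing the Modelfile (an assumption, see above).
check_modelfile() {
  src=$(modelfile_source "$1")
  [ -n "$src" ] || return 1
  case "$src" in
    /*) path=$src ;;
    *)  path=$(dirname "$1")/$src ;;
  esac
  [ -f "$path" ]
}

# Usage:
#   check_modelfile Modelfile-q4 && echo "FROM target found"
```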

## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)
```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models
```bash
ollama list
```

### Test the quantized model
```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)
```bash
# The Ollama CLI has no --image flag; include the image path in the prompt instead
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```
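
Images can also be sent through Ollama's HTTP API, where `/api/generate` accepts base64-encoded images in an `images` array. The sketch below only builds the request body; `build_image_payload` is a hypothetical helper, and it assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```bash
# Build a JSON body for Ollama's /api/generate endpoint with one base64-encoded image.
# Assumes the prompt needs no JSON escaping (no double quotes or backslashes).
build_image_payload() {
  model=$1; prompt=$2; image=$3
  # Strip newlines so the encoded image is a single JSON string.
  encoded=$(base64 < "$image" | tr -d '\n')
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
    "$model" "$prompt" "$encoded"
}

# Usage (requires a running Ollama server on the default port 11434):
#   build_image_payload ui-tars:q4 "Describe the UI elements" screenshot.png |
#     curl -s http://localhost:11434/api/generate -d @-
```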

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**
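
Adding up the sizes above, the three on-disk artifacts need roughly 34GB, and the two Ollama models about another 20GB, if everything is kept. The sketch below checks free space before starting; `has_free_gb` is a hypothetical helper and the threshold in the usage line is only a rounded-up estimate from these figures.

```bash
# Succeeds if the filesystem holding DIR has at least NEEDED_GB gigabytes free.
has_free_gb() {
  dir=$1; needed_gb=$2
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_gb "$HOME" 55 || echo "Less than 55GB free; consider cleaning up between steps"
```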

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's about 69% smaller (4.7GB vs 15GB in Ollama), usually at a small cost in output quality
2. **The model supports vision capabilities** - you can send screenshots for UI analysis
3. **Proper image formats**: PNG, JPEG, WebP are supported
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate
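
The 69% figure follows from the Ollama sizes listed above: 1 - 4.7/15 ≈ 0.69. As a small sketch, the same arithmetic in shell (the function name `size_reduction_pct` is hypothetical):

```bash
# Percentage by which QUANTIZED is smaller than ORIGINAL (any consistent unit).
size_reduction_pct() {
  awk -v orig="$1" -v quant="$2" 'BEGIN { printf "%.0f\n", (1 - quant / orig) * 100 }'
}

size_reduction_pct 15 4.7   # Ollama sizes from this guide; prints 69
```

The same function can be pointed at the two GGUF file sizes to compare them directly.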

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```

---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)

**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!