---
license: apache-2.0
---
# UI-TARS 1.5-7B Model Setup Commands
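This document lists the commands used to download the ByteDance-Seed/UI-TARS-1.5-7B model, convert it to GGUF, quantize it, and import it into Ollama. The commands were run on macOS and use absolute paths under `/Users/qoneqt/Desktop/shubham/ai`; substitute your own working directory.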
## Prerequisites
### 1. Verify Ollama Installation
```bash
ollama --version
```
### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece
# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
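Before starting, it can help to confirm that the command-line tools used in the later steps are on your `PATH`. The helper below is a minimal sketch; `require_cmds` is a hypothetical name, not part of any of these tools, and the list of commands in the usage comment is taken from the steps in this document.

```bash
# Report every required command that is not on PATH.
# Returns 0 only if all named commands are found.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_cmds ollama git cmake make pip3 huggingface-cli
```

The function only checks that each command exists; it does not check versions.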
## Step 1: Download the UI-TARS Model
### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False
# Verify download
ls -la
```
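Instead of inspecting the `ls -la` output by eye, the download can be checked with a small function. This sketch assumes the repository follows the usual Hugging Face layout, with a `config.json` and one or more `*.safetensors` weight files; `check_model_dir` is a hypothetical helper name.

```bash
# Succeed only if the directory contains config.json and at least
# one .safetensors file (assumed layout of the downloaded model).
check_model_dir() {
  local dir=$1
  if [ ! -f "$dir/config.json" ]; then
    echo "no config.json in $dir" >&2
    return 1
  fi
  # ls exits non-zero when the glob matches nothing
  if ! ls "$dir"/*.safetensors >/dev/null 2>&1; then
    echo "no .safetensors files in $dir" >&2
    return 1
  fi
}

# Usage:
#   check_model_dir /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
```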
## Step 2: Setup llama.cpp for Conversion
### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git
# Navigate to llama.cpp directory
cd llama.cpp
# Create build directory and configure with CMake
mkdir build
cd build
cmake ..
# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)
# Verify the quantize tool was built
ls -la bin/llama-quantize
```
## Step 3: Convert Safetensors to GGUF Format
### Create output directory and convert to F16 GGUF
```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp
# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
# If the script reports missing Python modules, run: pip3 install -r requirements.txt
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
--outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
--outtype f16
# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```
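A file size alone does not show that the conversion produced a valid file. GGUF files begin with the four ASCII bytes `GGUF`, so a quick sanity check is to read that magic value. The function name `is_gguf` is hypothetical, and this is only a header check, not a full validation of the file.

```bash
# Succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Usage:
#   is_gguf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
#     && echo "looks like a GGUF file"
```

The same check applies to the quantized file produced in the next step.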
## Step 4: Quantize to Q4_K_M Format
### Quantize the F16 model to reduce size
```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build
# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
q4_k_m
# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```
## Step 5: Create Modelfiles for Ollama
### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
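The two Modelfiles differ only in their `FROM` line, so the second can also be derived from the first rather than written out again. The sketch below uses `sed` to swap that line; `derive_modelfile` is a hypothetical helper, and it assumes the source Modelfile has a single `FROM` line at the start of a line.

```bash
# Copy a Modelfile, replacing its FROM line so it points at another GGUF file.
# Arguments: source Modelfile, GGUF file name, output Modelfile.
derive_modelfile() {
  local src=$1 gguf=$2 out=$3
  sed "s|^FROM .*|FROM ./$gguf|" "$src" > "$out"
}

# Usage (run inside the ui-tars-1.5-7b-gguf directory):
#   derive_modelfile Modelfile ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```

Keeping one source Modelfile means a later change to the template or parameters only has to be made once.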
## Step 6: Create Models in Ollama
### Create the F16 model (high quality, larger size)
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```
### Create the quantized model (recommended for daily use)
```bash
ollama create ui-tars:q4 -f Modelfile-q4
```
## Step 7: Verify Installation
### List all available models
```bash
ollama list
```
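To check for a specific model in a script, the output of `ollama list` can be filtered by its first column. This sketch assumes the output has one header row followed by one row per model with the model name first; `has_model` is a hypothetical helper that reads that output on standard input.

```bash
# Read `ollama list` output on stdin and succeed if the named model is listed.
# The header row (line 1) is skipped; the model name is expected in column 1.
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage:
#   ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```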
### Test the quantized model
```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```
### Test with an image (if you have one)
```bash
# For multimodal models, Ollama reads an image whose file path appears in the prompt
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```
## File Sizes and Results
After completion, you should have:
- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**
## Usage Tips
1. **Use the quantized model (`ui-tars:q4`)** for regular use. At ~4.7GB it is about 69% smaller than the ~15GB F16 model, usually at a small cost in output quality.
2. **The model supports vision capabilities**, so you can send screenshots for UI analysis. If image input does not work in Ollama, check that the vision components were included in the GGUF conversion.
3. **Supported image formats**: PNG, JPEG, and WebP.
4. **For UI automation**: provide clear screenshots and ask specific questions about what you want to automate.
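The 69% figure follows from the sizes reported above: (1 - 4.7 / 15) × 100 ≈ 68.7, which rounds to 69. A small sketch of the same arithmetic, using a hypothetical helper name and `awk` for the floating-point math:

```bash
# Print the size reduction from an original size to a new size,
# as a percentage rounded to the nearest whole number.
size_reduction_pct() {
  awk -v orig="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (1 - new / orig) * 100 }'
}

# Usage:
#   size_reduction_pct 15 4.7    # prints 69
```

Both arguments must use the same unit (here GB).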
## Cleanup (Optional)
If you want to save disk space after setup:
```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```
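Because the F16 file takes several minutes to regenerate, it may be worth guarding its deletion. The function below is a sketch with a hypothetical name; it removes the F16 GGUF only when the quantized file exists and is non-empty, using the file names from the steps above.

```bash
# Delete the F16 GGUF in the given directory only if the Q4_K_M file
# is present and non-empty; otherwise leave everything in place.
remove_f16_if_quantized() {
  local dir=$1
  local f16="$dir/ui-tars-1.5-7b-f16.gguf"
  local q4="$dir/ui-tars-1.5-7b-q4_k_m.gguf"
  if [ -s "$q4" ]; then
    rm -f "$f16"
  else
    echo "refusing to delete $f16: $q4 is missing or empty" >&2
    return 1
  fi
}

# Usage:
#   remove_f16_if_quantized /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
```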
---
**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)
**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!