# UI-TARS 1.5-7B Model Setup Commands

This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

## Prerequisites

### 1. Verify Ollama Installation
```bash
ollama --version
```

### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
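
Before starting, it can help to confirm that every command-line tool used later in this guide is on your `PATH`. The sketch below is one way to do that; the helper name `missing_tools` and the exact tool list are assumptions for illustration, not part of the original setup.

```bash
# Print each command that is NOT found on PATH, one per line.
# Returns non-zero if at least one command is missing.
# (missing_tools is a hypothetical helper name, not an Ollama or llama.cpp tool.)
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Tools invoked in the steps below; adjust the list to your environment.
if missing="$(missing_tools ollama git cmake make python3 pip3 huggingface-cli)"; then
  echo "All required tools found."
else
  # Unquoted on purpose so the names are joined onto a single line.
  echo "Missing tools:" $missing
fi
```

The check only looks for the executables; it does not verify versions or Python packages.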

## Step 1: Download the UI-TARS Model

### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```
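
`ls -la` shows what arrived but does not tell you whether anything is missing. A minimal sanity check, assuming that a `config.json` plus at least one `*.safetensors` shard is the minimum the converter needs (the helper name `verify_model_dir` is hypothetical, and the full file list of the repository is not checked):

```bash
# Check that a downloaded model directory looks usable for conversion.
# Assumption: config.json and at least one *.safetensors file must exist.
verify_model_dir() {
  local dir="$1"
  if [ ! -f "$dir/config.json" ]; then
    echo "missing config.json in $dir"
    return 1
  fi
  # ls exits non-zero when the glob matches nothing.
  if ! ls "$dir"/*.safetensors >/dev/null 2>&1; then
    echo "no .safetensors files in $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Example (run from inside the download directory):
#   verify_model_dir .
```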

## Step 2: Setup llama.cpp for Conversion

### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```
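
The `sysctl -n hw.ncpu` call above is macOS-specific. If you follow these steps on Linux as well, a small helper can pick the job count portably. This is a sketch; `cpu_count` is a hypothetical name, and the fallback order is a choice made here rather than anything llama.cpp prescribes.

```bash
# Print the number of logical CPUs, falling back to 1 if it cannot be detected.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                              # GNU coreutils (Linux)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu                  # macOS / BSD
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN          # generic fallback
  else
    echo 1
  fi
}

echo "Parallel build command: make -j$(cpu_count)"
```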

## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF
```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
# If the script fails on missing Python modules, run: pip3 install -r requirements.txt
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```
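
File size alone does not confirm that the conversion produced a valid file. GGUF files begin with the four ASCII bytes `GGUF`, so a quick header check catches truncated or mislabeled output. The helper name `is_gguf` is hypothetical, and this only inspects the magic bytes, not the rest of the file.

```bash
# Return 0 if the file starts with the 4-byte ASCII magic "GGUF".
is_gguf() {
  local magic
  [ -f "$1" ] || return 1
  # Strip NUL bytes so the shell does not warn about them in the substitution.
  magic="$(head -c 4 "$1" | tr -d '\000')"
  [ "$magic" = "GGUF" ]
}

# Example:
#   is_gguf ui-tars-1.5-7b-f16.gguf && echo "header looks like GGUF"
```

The same check can be applied to the quantized file produced in the next step.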

## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size
```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
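
The two Modelfiles differ only in their `FROM` line, so they can be generated from one function instead of two copies of the same heredoc. The sketch below assumes a hypothetical helper `write_modelfile` and uses a shortened placeholder system prompt when none is passed. It sets `<|im_end|>` as the stop token to match the ChatML markers in the template, which is an assumption about what the model emits at the end of a turn.

```bash
# Write an Ollama Modelfile for a given GGUF file.
# Usage: write_modelfile <gguf-path> <output-file> [system-prompt]
# (hypothetical helper; the default prompt is a shortened placeholder)
write_modelfile() {
  local gguf="$1" out="$2"
  local system_prompt="${3:-You are UI-TARS, an AI assistant specialized in user interface automation.}"
  # Unquoted heredoc: $gguf and $system_prompt expand, {{ .Prompt }} is left as-is.
  cat > "$out" <<EOF
FROM $gguf

TEMPLATE """<|im_start|>system
$system_prompt<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
}

# Example:
#   write_modelfile ./ui-tars-1.5-7b-f16.gguf    Modelfile
#   write_modelfile ./ui-tars-1.5-7b-q4_k_m.gguf Modelfile-q4
```

Keeping the template in one place means a later change to the prompt or parameters applies to both variants.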

## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)
```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models
```bash
ollama list
```

### Test the quantized model
```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)
```bash
# Pass the image by including its file path in the prompt
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```
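
To check for the new models from a script rather than by eye, you can filter the `ollama list` output. The sketch below assumes the model name is in the first column and that the first line is a header; `has_model` is a hypothetical helper that reads the listing from stdin so it can be tested without Ollama running.

```bash
# Return 0 if the given model name appears in the first column of
# `ollama list` output supplied on stdin (the header line is skipped).
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example:
#   ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```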

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**
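
The size reduction can be worked out directly from the figures above. A small sketch, where `reduction_pct` is a hypothetical helper and the inputs are the approximate sizes listed in this section:

```bash
# Print the percentage by which new_size is smaller than orig_size,
# rounded to one decimal place.
reduction_pct() {
  awk -v orig="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / orig) * 100 }'
}

reduction_pct 15 4.7     # Ollama sizes listed above  -> 68.7
reduction_pct 14.5 4.4   # GGUF file sizes listed above -> 69.7
```

Both pairs land close to 69%, depending on which sizes you compare.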

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's 69% smaller with minimal quality loss
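2. **The model supports vision capabilities** - you can send screenshots for UI analysis, provided the GGUF you built includes the vision encoder (a text-only conversion cannot read images)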
3. **Supported image formats**: PNG, JPEG, and WebP
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```
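
Because `rm -rf` is irreversible, you may prefer to preview what would be deleted first. The sketch below wraps the removal in a hypothetical `cleanup_paths` helper that defaults to a dry run; the `DRY_RUN` variable is a convention chosen here, not something Ollama or llama.cpp defines.

```bash
# Remove each given path. With DRY_RUN=1 (the default) only print
# what would be removed; set DRY_RUN=0 to actually delete.
cleanup_paths() {
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "skip (not found): $p"
    elif [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would remove: $p"
    else
      rm -rf -- "$p" && echo "removed: $p"
    fi
  done
}

# Example:
#   cleanup_paths /path/to/ui-tars-1.5-7b /path/to/llama.cpp            # preview
#   DRY_RUN=0 cleanup_paths /path/to/ui-tars-1.5-7b /path/to/llama.cpp  # delete
```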

---

- **Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)
- **Final Model Size**: 4.7GB (quantized) vs 15GB (original), a 69% size reduction