---
license: apache-2.0
---
# UI-TARS 1.5-7B Model Setup Commands
This document contains all the commands executed to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.
## Prerequisites
### 1. Verify Ollama Installation
```bash
ollama --version
```
### 2. Install System Dependencies
```bash
# Install sentencepiece via Homebrew
brew install sentencepiece
# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```
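A quick import check confirms the Python packages installed correctly before you start; a failed import here is much cheaper to debug than a failed conversion later (a minimal sketch, using the same package names installed above):
```bash
# Sanity-check that all conversion dependencies can be imported
python3 -c "import sentencepiece, gguf, google.protobuf, huggingface_hub; print('conversion dependencies OK')"
```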
## Step 1: Download the UI-TARS Model
### Create directory and download model
```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Download the complete model from HuggingFace
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False
# Verify download
ls -la
```
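The full checkout is roughly 15GB, so before converting it is worth confirming that the weight shards and config files all arrived and that the volume has enough headroom for the GGUF outputs (a minimal sketch; the exact shard count can vary between model revisions):
```bash
# Free space check: the GGUF conversion needs roughly another 19GB on top of the download
df -h /Users/qoneqt/Desktop/shubham/ai

# Count the downloaded weight shards and confirm the key config files are present
ls /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/*.safetensors | wc -l
ls /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/config.json \
   /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/tokenizer_config.json
```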
## Step 2: Setup llama.cpp for Conversion
### Clone and build llama.cpp
```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git
# Navigate to llama.cpp directory
cd llama.cpp
# Create build directory and configure with CMake
mkdir build
cd build
cmake ..
# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)
# Verify the quantize tool was built
ls -la bin/llama-quantize
```
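Beyond checking that the binary exists, running it with no arguments is a quick functional test: it prints the usage text along with the list of supported quantization types, which should include Q4_K_M (a minimal sketch; the tool exits non-zero when called this way, hence the `|| true`):
```bash
# Print llama-quantize usage and supported quantization types to confirm the build works
./bin/llama-quantize || true
```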
## Step 3: Convert Safetensors to GGUF Format
### Create output directory and convert to F16 GGUF
```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp
# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
--outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
--outtype f16
# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```
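Before spending time on quantization, a cheap sanity check is to read the first four bytes of the output file, which are the ASCII magic `GGUF` in any valid GGUF file (a minimal sketch):
```bash
# A valid GGUF file begins with the 4-byte ASCII magic "GGUF"
head -c 4 /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf && echo
```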
## Step 4: Quantize to Q4_K_M Format
### Quantize the F16 model to reduce size
```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build
# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
q4_k_m
# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```
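Comparing the two files side by side makes the savings from quantization concrete (a minimal sketch using standard tools):
```bash
# Compare the F16 and Q4_K_M file sizes (expect roughly 14.5GB vs 4.4GB)
du -h /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
      /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```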
## Step 5: Create Modelfiles for Ollama
### Create Modelfile for F16 version
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
### Create Modelfile for quantized version
```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.
Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
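`ollama create` fails if the `FROM` path in a Modelfile does not resolve, so it is worth confirming that each Modelfile points at a GGUF file that actually exists in this directory (a minimal sketch):
```bash
# Verify that the GGUF referenced by each Modelfile is present
for f in Modelfile Modelfile-q4; do
  gguf=$(awk '/^FROM/ {print $2}' "$f")
  if [ -f "$gguf" ]; then echo "$f -> $gguf OK"; else echo "$f -> $gguf MISSING"; fi
done
```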
## Step 6: Create Models in Ollama
### Create the F16 model (high quality, larger size)
```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```
### Create the quantized model (recommended for daily use)
```bash
ollama create ui-tars:q4 -f Modelfile-q4
```
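After the models are created, `ollama show` reports the parameters and template that were baked in, which is a quick way to confirm the Modelfile was applied as intended (the `--modelfile` flag prints the stored Modelfile back out):
```bash
# Inspect the registered model: architecture, parameters, and template
ollama show ui-tars:q4

# Print the Modelfile Ollama stored for the model
ollama show --modelfile ui-tars:q4
```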
## Step 7: Verify Installation
### List all available models
```bash
ollama list
```
### Test the quantized model
```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```
### Test with an image (if you have one)
```bash
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see" --image /path/to/your/screenshot.png
```
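The same vision test can also be driven through Ollama's local HTTP API, which accepts base64-encoded images in the `images` field of `/api/generate`; a hedged sketch assuming the default port 11434 and the same placeholder screenshot path:
```bash
# Base64-encode the screenshot (tr strips any line wrapping) and send it to the local Ollama API
IMG=$(base64 < /path/to/your/screenshot.png | tr -d '\n')
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"ui-tars:q4\",
  \"prompt\": \"Analyze this screenshot and tell me what UI elements you can see\",
  \"images\": [\"$IMG\"],
  \"stream\": false
}"
```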
## File Sizes and Results
After completion, you should have:
- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
- `ui-tars:latest` (~15GB in Ollama)
- `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**
## Usage Tips
1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's 69% smaller with minimal quality loss
2. **The model supports vision capabilities** - you can send screenshots for UI analysis
3. **Proper image formats**: PNG, JPEG, WebP are supported
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate (see the batch sketch below)
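For repetitive work, a small shell loop can feed every screenshot in a folder to the model one at a time; a minimal sketch assuming a hypothetical `screenshots/` directory of PNG files (the image path is passed inside the prompt, as in Step 7):
```bash
# Ask the model to describe each screenshot in turn
for shot in screenshots/*.png; do
  echo "=== $shot ==="
  ollama run ui-tars:q4 "List the clickable UI elements you can see in this screenshot: $shot"
done
```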
## Cleanup (Optional)
If you want to save disk space after setup:
```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```
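After cleanup, the remaining footprint is essentially the quantized GGUF plus Ollama's own model store (which defaults to `~/.ollama/models`); a quick check (a minimal sketch):
```bash
# See what is still taking up space after cleanup
du -sh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf ~/.ollama/models
ollama list
```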
---
**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)
**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!