---
license: apache-2.0
---

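# UI-TARS 1.5-7B Model Setup Commands

This document lists the commands used to download, convert, and quantize the `ByteDance-Seed/UI-TARS-1.5-7B` model for use with Ollama. The commands were run on macOS (Homebrew, `sysctl`) and use absolute paths from that machine; adjust them to your own environment.
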
## Prerequisites

### 1. Verify Ollama Installation

```bash
ollama --version
```

### 2. Install System Dependencies

```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```

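Before starting, it can help to confirm that the command-line tools used in the later steps are actually on your `PATH`. The following is a minimal sketch, not part of Ollama or llama.cpp: `missing_tools` is a hypothetical helper name, and the tool list in the usage comment is simply the set of commands this guide calls.

```bash
# Hypothetical helper: print each command that is not found on PATH and
# return non-zero if any are missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example call for the tools used in this guide:
#   missing_tools ollama git cmake make python3 pip3 huggingface-cli
```

The function only reports what is missing; installing the tools is still done with the commands above.
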
## Step 1: Download the UI-TARS Model

### Create directory and download model

```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from Hugging Face
# (--local-dir-use-symlinks is deprecated and ignored by recent huggingface_hub releases)
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```

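`ls -la` shows what arrived, but a scripted check can catch an interrupted download before the conversion step. This sketch assumes the snapshot contains a `config.json` and one or more `*.safetensors` shards in the top-level directory; `check_download` is a hypothetical helper, not a Hugging Face command.

```bash
# Hypothetical helper: succeed only if DIR holds config.json and at least one
# .safetensors file (both file names are assumptions about the snapshot).
check_download() {
  local dir="$1" shards
  if [ ! -f "$dir/config.json" ]; then
    echo "config.json not found in $dir"
    return 1
  fi
  shards=$(find "$dir" -maxdepth 1 -name '*.safetensors' | wc -l | tr -d ' ')
  if [ "$shards" -eq 0 ]; then
    echo "no .safetensors files in $dir"
    return 1
  fi
  echo "ok: $shards safetensors file(s)"
}

# Example:
#   check_download /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b
```
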
## Step 2: Set Up llama.cpp for Conversion

### Clone and build llama.cpp

```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory (-p avoids an error if it already exists) and configure with CMake
mkdir -p build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```

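The `sysctl -n hw.ncpu` call in the build command is macOS-specific. If you follow these steps on Linux, a small wrapper can pick whichever core-count command is available. This is a sketch; `cpu_count` is a hypothetical helper name.

```bash
# Hypothetical helper: print the number of CPU cores using whichever tool
# exists (nproc on Linux, sysctl on macOS, getconf as a fallback).
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# Example:
#   make -j"$(cpu_count)"
```
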
## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF

```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# If the script reports missing Python modules, install llama.cpp's
# requirements first: pip3 install -r requirements.txt

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```

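Besides checking the size, you can confirm that the converter wrote a GGUF file at all: GGUF files begin with the four ASCII bytes `GGUF`. The sketch below checks only that magic value, not the rest of the file; `is_gguf` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if FILE exists and starts with the GGUF magic.
# This does not validate tensors or metadata.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example:
#   is_gguf ui-tars-1.5-7b-f16.gguf && echo "looks like a GGUF file"
```
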
## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size

```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Stop on the ChatML markers used by the template above
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version

```bash
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Stop on the ChatML markers used by the template above
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

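The two Modelfiles differ only in their `FROM` line, so one way to keep them in sync is to store the shared `TEMPLATE` and `PARAMETER` lines once and generate each Modelfile from that. The sketch below assumes you have saved the shared lines yourself as `Modelfile.common`; that file name and the `make_modelfile` helper are hypothetical, not Ollama conventions.

```bash
# Hypothetical helper: write OUT_FILE as a FROM line for GGUF_FILE followed by
# the contents of COMMON_FILE (the shared TEMPLATE and PARAMETER lines).
make_modelfile() {
  local gguf_file="$1" common_file="$2" out_file="$3"
  {
    printf 'FROM ./%s\n\n' "$gguf_file"
    cat "$common_file"
  } > "$out_file"
}

# Example, run inside the ui-tars-1.5-7b-gguf directory:
#   make_modelfile ui-tars-1.5-7b-f16.gguf    Modelfile.common Modelfile
#   make_modelfile ui-tars-1.5-7b-q4_k_m.gguf Modelfile.common Modelfile-q4
```
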
## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)

```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models

```bash
ollama list
```

### Test the quantized model

```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)

```bash
# ollama run has no --image flag; include the image path in the prompt instead
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```

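If you want to script the verification rather than read the `ollama list` output by eye, the check can be expressed as a small filter. This sketch assumes the output has a header row followed by one model per line with the `name:tag` in the first column; `has_model` is a hypothetical helper.

```bash
# Hypothetical helper: read `ollama list` output on stdin and succeed if the
# first column of any non-header row equals the given name:tag.
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example:
#   ollama list | has_model ui-tars:q4 && echo "ui-tars:q4 is installed"
```
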
## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**

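The size reduction quoted in this document follows directly from the sizes listed above. As a quick sketch of the arithmetic (`reduction_pct` is a hypothetical helper; the inputs are the approximate sizes in GB from the list):

```bash
# Hypothetical helper: percentage by which SMALL is smaller than ORIG,
# rounded to the nearest whole number.
reduction_pct() {
  awk -v orig="$1" -v small="$2" 'BEGIN { printf "%.0f\n", (1 - small / orig) * 100 }'
}

reduction_pct 15 4.7     # Ollama model sizes: prints 69
reduction_pct 14.5 4.4   # GGUF file sizes:    prints 70
```

The two figures differ slightly because the listed sizes are rounded.
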
## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use: it is about 69% smaller, usually with only a small loss in quality
2. **The model supports vision capabilities**: you can send screenshots for UI analysis
3. **Supported image formats**: PNG, JPEG, and WebP
4. **For UI automation**: provide clear screenshots and specific questions about what you want to automate

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```

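Deleting the F16 file is only safe once the quantized file exists, since re-creating it means repeating the conversion. A guarded version of that one cleanup step might look like the sketch below; `remove_f16_if_quantized` is a hypothetical helper, and the file names are the ones used in this guide.

```bash
# Hypothetical helper: delete the F16 GGUF in DIR only if the Q4_K_M GGUF is
# present and non-empty; otherwise keep it and return non-zero.
remove_f16_if_quantized() {
  local dir="$1"
  local f16="$dir/ui-tars-1.5-7b-f16.gguf"
  local q4="$dir/ui-tars-1.5-7b-q4_k_m.gguf"
  if [ -s "$q4" ]; then
    rm -f "$f16"
    echo "removed $f16"
  else
    echo "kept $f16: quantized file missing or empty"
    return 1
  fi
}

# Example:
#   remove_f16_if_quantized /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
```
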
---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)

**Final Model Size**: 4.7GB (quantized) vs 15GB (original), a 69% size reduction