# UI-TARS 1.5-7B Model Setup Commands

This document lists the commands used to download, convert, and quantize the ByteDance-Seed/UI-TARS-1.5-7B model for use with Ollama.

## Prerequisites

### 1. Verify Ollama Installation

```bash
ollama --version
```

### 2. Install System Dependencies

```bash
# Install sentencepiece via Homebrew
brew install sentencepiece

# Install Python packages
pip3 install sentencepiece gguf protobuf huggingface_hub
```

## Step 1: Download the UI-TARS Model

### Create directory and download model

```bash
# Create directory for the model
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Change to the directory
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Download the complete model from HuggingFace
# Note: --local-dir-use-symlinks is deprecated in recent huggingface_hub releases; omit it if the CLI warns about it
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B --local-dir . --local-dir-use-symlinks False

# Verify download
ls -la
```

## Step 2: Setup llama.cpp for Conversion

### Clone and build llama.cpp

```bash
# Navigate to AI directory
cd /Users/qoneqt/Desktop/shubham/ai

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp.git

# Navigate to llama.cpp directory
cd llama.cpp

# Create build directory and configure with CMake
mkdir build
cd build
cmake ..

# Build the project (this will take a few minutes)
make -j$(sysctl -n hw.ncpu)

# Verify the quantize tool was built
ls -la bin/llama-quantize
```

## Step 3: Convert Safetensors to GGUF Format

### Create output directory and convert to F16 GGUF

```bash
# Create directory for GGUF files
mkdir -p /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# Navigate to llama.cpp directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp

# If the script reports missing Python modules, install its dependencies first:
#   pip3 install -r requirements.txt

# Convert safetensors to F16 GGUF (this takes ~5-10 minutes)
python3 convert_hf_to_gguf.py /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b \
  --outfile /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  --outtype f16

# Check the F16 file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf
```

## Step 4: Quantize to Q4_K_M Format

### Quantize the F16 model to reduce size

```bash
# Navigate to the build directory
cd /Users/qoneqt/Desktop/shubham/ai/llama.cpp/build

# Quantize F16 to Q4_K_M (this takes ~1-2 minutes)
./bin/llama-quantize \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf \
  /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf \
  q4_k_m

# Check the quantized file size
ls -lh /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf
```

## Step 5: Create Modelfiles for Ollama

### Create Modelfile for F16 version

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf

# The stop tokens match the ChatML markers used in the template
cat > Modelfile << 'EOF'
FROM ./ui-tars-1.5-7b-f16.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

### Create Modelfile for quantized version

```bash
# Same template and parameters as above, pointing at the Q4_K_M file
cat > Modelfile-q4 << 'EOF'
FROM ./ui-tars-1.5-7b-q4_k_m.gguf

TEMPLATE """<|im_start|>system
You are UI-TARS, an advanced AI assistant specialized in user interface automation and interaction. You can analyze screenshots, understand UI elements, and provide precise instructions for automating user interface tasks. When provided with a screenshot, analyze the visual elements and provide detailed, actionable guidance.

Key capabilities:
- Screenshot analysis and UI element detection
- Step-by-step automation instructions
- Precise coordinate identification for clicks and interactions
- Understanding of various UI frameworks and applications<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```

## Step 6: Create Models in Ollama

### Create the F16 model (high quality, larger size)

```bash
cd /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf
ollama create ui-tars:latest -f Modelfile
```

### Create the quantized model (recommended for daily use)

```bash
ollama create ui-tars:q4 -f Modelfile-q4
```

## Step 7: Verify Installation

### List all available models

```bash
ollama list
```

### Test the quantized model

```bash
ollama run ui-tars:q4 "Hello! Can you help me with UI automation tasks?"
```

### Test with an image (if you have one)

```bash
# ollama run has no --image flag; include the image path in the prompt instead
ollama run ui-tars:q4 "Analyze this screenshot and tell me what UI elements you can see: /path/to/your/screenshot.png"
```

## File Sizes and Results

After completion, you should have:

- **Original model**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b/` (~15GB, 19 files)
- **F16 GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf` (~14.5GB)
- **Quantized GGUF**: `/Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-q4_k_m.gguf` (~4.4GB)
- **Ollama models**:
  - `ui-tars:latest` (~15GB in Ollama)
  - `ui-tars:q4` (~4.7GB in Ollama) ⭐ **Recommended for daily use**

## Usage Tips

1. **Use the quantized model (`ui-tars:q4`)** for regular use - it's about 69% smaller with minimal quality loss
2. **The base model supports vision** - you can send screenshots for UI analysis; confirm that image input works in your setup with the image test in Step 7
3. **Image formats**: PNG, JPEG, and WebP are supported
4. **For UI automation**: Provide clear screenshots and specific questions about what you want to automate

## Cleanup (Optional)

If you want to save disk space after setup:

```bash
# Remove the original downloaded files (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b

# Remove the F16 GGUF if you only need the quantized version (optional)
rm /Users/qoneqt/Desktop/shubham/ai/ui-tars-1.5-7b-gguf/ui-tars-1.5-7b-f16.gguf

# Remove llama.cpp if no longer needed (optional)
rm -rf /Users/qoneqt/Desktop/shubham/ai/llama.cpp
```

---

**Total Setup Time**: ~20-30 minutes (depending on download and conversion speeds)

**Final Model Size**: 4.7GB (quantized) vs 15GB (original) - 69% size reduction!
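
As a quick check on the 69% figure, the reduction can be recomputed from the two sizes quoted above. This is a minimal sketch: `size_reduction` is a hypothetical helper name introduced here for illustration, not part of Ollama or llama.cpp, and it assumes both sizes are given in the same unit.

```bash
# Hypothetical helper (not part of Ollama or llama.cpp): percentage size
# reduction between an original and a quantized size, in the same unit.
size_reduction() {
  awk -v orig="$1" -v quant="$2" 'BEGIN { printf "%.0f\n", (1 - quant / orig) * 100 }'
}

# Sizes in GB as quoted above: 15 (original) and 4.7 (quantized in Ollama)
size_reduction 15 4.7   # prints 69
```

The same function can be pointed at the sizes reported by `ollama list` on your machine if they differ from the ones recorded here.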