Investment Access Request - L-Operator

L-Operator is available exclusively to qualified investors under NDA. Access is restricted to investment evaluation purposes only.

By requesting access, you acknowledge that this model is proprietary technology subject to NDA restrictions. You agree to use this model solely for investment evaluation purposes and maintain strict confidentiality of all technical details, training methodologies, and performance characteristics. Unauthorized use, reproduction, or distribution is strictly prohibited.


L-Operator: Android Device Control with LFM2-VL


Lightweight Multimodal Android Device Control Agent


🌟 Overview

L-Operator is a fine-tuned multimodal AI agent based on LiquidAI's LFM2-VL-1.6B model, optimized for Android device control through visual understanding and action generation. This lightweight model provides efficient Android automation capabilities while maintaining high accuracy in action generation.

πŸ” Investment Access Control

This model is proprietary technology available exclusively to qualified investors under NDA restrictions. Access is granted solely for investment evaluation purposes.

📋 Model Details

| Property | Value |
|---|---|
| Base Model | LiquidAI/LFM2-VL-1.6B |
| Architecture | LFM2-VL (1.6B parameters) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, v_proj, fc1, fc2, linear, gate_proj, up_proj, down_proj |
| Training Data | Android control episodes with screenshots and actions |
| License | Proprietary |
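The LoRA hyperparameters above can be gathered into a single config for reference. This is an illustrative sketch only; the actual training script is proprietary and not published with this card.

```python
# LoRA hyperparameters from the Model Details table, collected as a plain dict.
# Illustrative only: the actual fine-tuning script is not published.
lora_config = {
    "r": 16,           # LoRA rank
    "lora_alpha": 32,  # scaling numerator; effective scale is alpha / r
    "target_modules": [
        "q_proj", "v_proj", "fc1", "fc2",
        "linear", "gate_proj", "up_proj", "down_proj",
    ],
}

# Effective scaling applied to each low-rank weight update
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```

A dict in this shape could be unpacked into a library config such as PEFT's `LoraConfig(**lora_config)` when reproducing a comparable fine-tune.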

πŸ› οΈ Installation

Prerequisites

Before installing the model, you must:

  1. Request Access: Click the "Request Access" button on this page and fill out the form
  2. Wait for Approval: Access requests are typically reviewed within 1-2 business days
  3. Authenticate: Once approved, you'll need to authenticate with Hugging Face

🚀 Quick Start

Authentication Required

Important: You must be authenticated with Hugging Face to access this gated model. Ensure you have:

  1. Received access approval
  2. Logged in using huggingface-cli login or login() from huggingface_hub
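The login step can also be done directly from Python. A minimal sketch, assuming `huggingface_hub` is installed and your access request has been approved (the token below is a placeholder):

```python
# Authenticate with the Hugging Face Hub so the gated weights can be downloaded.
# The token string is a placeholder; create a token with read access
# at hf.co/settings/tokens.
from huggingface_hub import login

login(token="hf_your_token_here")
```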

Basic Usage

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load model and processor
model_id = "Tonic/l-operator"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Prepare input
image = Image.open("android_screenshot.png").convert("RGB")
goal = "Open the Settings app"
instruction = "Navigate to the Settings app on the home screen"

# Build conversation
conversation = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful multimodal assistant by Liquid AI."}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": f"Goal: {goal}\nStep: {instruction}\nRespond with a JSON action containing relevant keys (e.g., action_type, x, y, text, app_name, direction)."}
        ]
    }
]

# Tokenize the conversation (text and image) and move tensors to the model's device
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate a response; unpack the dict so pixel_values reach the model
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

# Decode only the newly generated tokens
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)
print(response)
```

Expected Output Format

The model generates JSON actions in the following format:

```json
{
  "action_type": "tap",
  "x": 540,
  "y": 1200,
  "text": "Settings",
  "app_name": "com.android.settings",
  "confidence": 0.92
}
```
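Once parsed, a generated action can be dispatched to a connected device. The helper below is a hypothetical sketch (not part of this model card) that maps two action types onto `adb shell input` commands; a real harness would need to cover the full action vocabulary.

```python
import json
import shlex

# Hypothetical helper: translate a generated action dict into an
# `adb shell input` command string. Only two action types are handled here;
# the action-type names are illustrative assumptions.
def action_to_adb(action: dict) -> str:
    kind = action.get("action_type")
    if kind == "tap":
        return f"adb shell input tap {action['x']} {action['y']}"
    if kind == "input_text":
        # shlex.quote guards against shell-special characters in free text
        return f"adb shell input text {shlex.quote(action['text'])}"
    raise ValueError(f"unsupported action_type: {kind!r}")

raw = '{"action_type": "tap", "x": 540, "y": 1200, "confidence": 0.92}'
action = json.loads(raw)
print(action_to_adb(action))  # adb shell input tap 540 1200
```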

📊 Training Configuration

Training Parameters

| Parameter | Value |
|---|---|
| Learning Rate | 5e-4 |
| Batch Size | 1 (per device) |
| Gradient Accumulation | 8 steps |
| Epochs | 1.0 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| Optimizer | AdamW |
| Scheduler | Cosine |
| Mixed Precision | bfloat16 |
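A derived quantity worth noting: with a per-device batch size of 1 and 8 gradient-accumulation steps, each optimizer update sees an effective batch of 8 examples (assuming a single training device, which the card does not state explicitly).

```python
# Effective batch size implied by the training table above.
# num_devices = 1 is an assumption; the card does not state the device count.
per_device_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 8
```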

Vision Configuration

| Parameter | Value |
|---|---|
| Max Image Tokens | 256 |
| Min Image Tokens | 64 |
| Image Splitting | Enabled |
| Image Format | RGB |

🎯 Use Cases

1. Mobile App Testing

  • Automated UI testing for Android applications
  • Cross-device compatibility validation
  • Regression testing with visual verification

2. Accessibility Applications

  • Voice-controlled device navigation
  • Assistive technology integration
  • Screen reader enhancement tools

3. Remote Support

  • Remote device troubleshooting
  • Automated device configuration
  • Support ticket automation

4. Development Workflows

  • UI/UX testing automation
  • User flow validation
  • Performance testing integration

📄 License & Terms

This model is proprietary technology owned by Tonic and is subject to strict licensing terms:

Investment Evaluation License

  • Purpose: Access granted solely for investment evaluation and due diligence
  • Restrictions: No commercial use, reproduction, or distribution without written consent
  • NDA Required: All access is subject to Non-Disclosure Agreement
  • Confidentiality: All technical details, training methodologies, and performance characteristics are confidential

Base Model Attribution

  • LFM2-VL-1.6B: Licensed under MIT License from LiquidAI
  • Fine-tuning: Proprietary to Tonic, subject to separate licensing terms

Investment Terms

  • Access is granted exclusively to qualified investors
  • Technology evaluation for investment purposes only
  • Strict confidentiality of all proprietary information
  • No reverse engineering or unauthorized analysis permitted

For licensing inquiries, contact us.

πŸ™ Acknowledgments

  • LiquidAI: For the base LFM2-VL model
  • Hugging Face: For the transformers library and hosting

📞 Investment Support

  • Investment Inquiries: For investment-related questions and due diligence, contact us

🔄 Model Comparison

| Feature | L-Operator (LFM2-VL) | G-Operator (Gemma 3N) |
|---|---|---|
| Model Size | 1.6B parameters | 4B parameters |
| Inference Speed | ⚡ Fast | 🐌 Slower |
| Memory Usage | 💾 Low (~4GB) | 💾 High (~8GB) |
| Accuracy | ✅ Good | ✅ Excellent |
| Real-time Use | ✅ Optimized | ⚠️ Limited |
| Edge Deployment | ✅ Suitable | ❌ Challenging |
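The ~4GB figure is consistent with a back-of-envelope estimate: 1.6B parameters at 2 bytes each in bfloat16 is about 3.2GB for the weights alone, before runtime overhead. A quick check:

```python
# Weights-only memory estimate for a 1.6B-parameter model in bfloat16.
# Runtime overhead (KV cache, activations, vision tower buffers) pushes the
# practical footprint toward the ~4GB cited in the comparison table.
params = 1.6e9
bytes_per_param = 2  # bfloat16 is 2 bytes per parameter
weight_gb = params * bytes_per_param / 1e9
print(round(weight_gb, 1))  # 3.2
```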

Made with ❤️ by Tonic

