FinetunedQWEN Overlay Text Extractor

A specialized vision-language model that extracts text overlaid on images, such as captions, titles, and promotional text, while ignoring text that belongs to the background scene.

Features

  • Specialized Text Extraction: Focuses on deliberately overlaid text elements
  • Real-time Processing: Deployed on Hugging Face Inference Endpoints
  • Simple JSON Interface: Easy to integrate with existing workflows
  • Lightweight Model: Based on Qwen2.5-VL-3B-Instruct with a fine-tuned adapter

Use Cases

  • Video caption extraction
  • Content moderation
  • Graphic design analysis
  • Accessibility improvements
  • Marketing analytics

Technical Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Fine-tuned Adapter: MohammedSameerSyed/FinetunedQWEN
  • Input: Base64-encoded image
  • Output: JSON with the extracted text, or the "{none}" indicator when no overlay text is found
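
The output contract above can be handled on the client side with a small helper. The following is a minimal sketch, assuming the response is a JSON object whose overlay_text key (the key read in the Quick Start example) holds either the extracted text or the literal string "{none}". The function name parse_overlay_response is hypothetical, and the exact response schema should be checked against your own endpoint.

```python
from typing import Optional

NONE_MARKER = "{none}"  # indicator listed under Technical Details


def parse_overlay_response(response: dict) -> Optional[str]:
    """Return the extracted overlay text, or None if no overlay was found.

    Assumption: the text lives under the 'overlay_text' key, as in the
    Quick Start example. This is not a documented schema.
    """
    text = response.get("overlay_text")
    if not isinstance(text, str):
        return None  # key missing or unexpected type
    text = text.strip()
    if not text or text.lower() == NONE_MARKER:
        return None  # empty string or explicit "no overlay" marker
    return text
```

Returning None for both the missing-key case and the "{none}" marker lets calling code use a single check before using the text.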

Quick Start

Test the model with this simple Python code:

import base64

import requests

def test_model(image_path, endpoint_url):
    # Read the image and encode it as a Base64 string, as the endpoint expects
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")

    # json= serializes the payload and sets the Content-Type header
    response = requests.post(endpoint_url, json={"inputs": base64_image}, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()

image_path = "your_image.jpg"
endpoint_url = "YOUR_ENDPOINT_URL"
result = test_model(image_path, endpoint_url)
print(f"Extracted text: {result.get('overlay_text', 'None found')}")

API Usage

Basic request:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "BASE64_ENCODED_IMAGE"}' \
  YOUR_ENDPOINT_URL

Request body with a custom prefix:

{
  "inputs": "BASE64_ENCODED_IMAGE", 
  "parameters": {"prefix": "Extract overlay text: "}
}
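
The two request shapes above can be produced from one helper. This is a minimal sketch, assuming the endpoint accepts exactly the inputs and parameters.prefix fields shown above; the function name build_payload is hypothetical and not part of the model's API.

```python
import base64
import json
from typing import Optional


def build_payload(image_bytes: bytes, prefix: Optional[str] = None) -> str:
    """Build the JSON request body for the endpoint.

    Assumption: only the 'inputs' and 'parameters.prefix' fields shown in
    the API Usage section are used.
    """
    payload = {"inputs": base64.b64encode(image_bytes).decode("utf-8")}
    if prefix is not None:
        # Include 'parameters' only when a custom prefix is requested
        payload["parameters"] = {"prefix": prefix}
    return json.dumps(payload)
```

The returned string can be sent as the body of a POST request with the Content-Type: application/json header, in the same way as the curl example.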

Limitations

  • Works best with clear, deliberate text overlays
  • May struggle with noisy backgrounds or complex overlapping text
  • Limited support for non-Latin scripts
  • Performance varies with image quality

Performance Tips

  • Use high-contrast text for best results
  • Ensure overlay text is clearly distinguished from background
  • Avoid highly stylized fonts when possible
  • Test with your specific image types for optimal results

Ethical Considerations

  • Respect copyright when extracting text from images
  • Be mindful of privacy when processing images with personal information
  • Consider bias in text recognition performance across different languages

Acknowledgements

  • Qwen Team for the base Qwen2.5-VL-3B-Instruct model
  • Hugging Face for the infrastructure and tools