FinetunedQWEN Overlay Text Extractor
A specialized vision-language model that extracts overlaid text, such as captions, titles, and promotional text, from images while ignoring background text.
Features
- Specialized Text Extraction: Focuses on deliberately overlaid text elements
- Real-time Processing: Deployed on Hugging Face Inference Endpoints
- Simple JSON Interface: Easy to integrate with existing workflows
- Lightweight Model: Based on Qwen2.5-VL-3B-Instruct with a fine-tuned adapter
Use Cases
- Video caption extraction
- Content moderation
- Graphic design analysis
- Accessibility improvements
- Marketing analytics
Technical Details
- Base Model: Qwen/Qwen2.5-VL-3B-Instruct
- Fine-tuned Adapter: MohammedSameerSyed/FinetunedQWEN
- Input: Base64-encoded image
- Output: JSON containing the extracted text, or the "{none}" indicator when no overlay text is found
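The input and output formats listed above can be handled with two small helpers. This is a minimal sketch, not the endpoint's documented contract: the helper names are hypothetical, and it assumes the response is a JSON object with the `overlay_text` key used in the Quick Start code and that the indicator arrives as the literal string "{none}".

```python
import base64
from typing import Optional

# Indicator named in the Output line; assumed to arrive as this literal string.
NONE_INDICATOR = "{none}"


def encode_image_bytes(data: bytes) -> str:
    """Return the Base64 text form of raw image bytes."""
    return base64.b64encode(data).decode("utf-8")


def parse_overlay_text(result: dict) -> Optional[str]:
    """Return the extracted text, or None when no overlay text was reported.

    Assumes an "overlay_text" key (hypothetical beyond the Quick Start usage);
    adjust this if your endpoint returns a different shape.
    """
    text = result.get("overlay_text")
    if not isinstance(text, str) or text.strip() in ("", NONE_INDICATOR):
        return None
    return text.strip()
```

Mapping the indicator to `None` lets calling code use an ordinary `if text is None` check instead of comparing against a magic string.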
Quick Start
Test the model with this simple Python code:
import base64
import json

import requests


def test_model(image_path, endpoint_url):
    # Read the image and encode it as a Base64 string
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")

    payload = json.dumps({"inputs": base64_image})
    headers = {"Content-Type": "application/json"}

    # Send the request and fail loudly on HTTP errors
    response = requests.post(endpoint_url, data=payload, headers=headers, timeout=60)
    response.raise_for_status()
    return response.json()


image_path = "your_image.jpg"
endpoint_url = "YOUR_ENDPOINT_URL"
result = test_model(image_path, endpoint_url)
print(f"Extracted text: {result.get('overlay_text', 'None found')}")
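The Quick Start request sends no credentials, which only works against a public endpoint. Protected Hugging Face Inference Endpoints expect a bearer token in the `Authorization` header. The sketch below shows one way to build the headers; the helper name `build_headers` and reading the token from an `HF_TOKEN` environment variable are assumptions for illustration, not part of this model's documented interface.

```python
import os
from typing import Dict, Optional


def build_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding a bearer token only when one is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Assumption: the access token, if any, is stored in the HF_TOKEN variable.
headers = build_headers(os.environ.get("HF_TOKEN"))
```

The resulting dictionary can be passed as `headers=` to `requests.post` in place of the one defined inside `test_model`.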
API Usage
Basic request:
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "BASE64_ENCODED_IMAGE"}' \
  YOUR_ENDPOINT_URL
With custom prefix:
{
  "inputs": "BASE64_ENCODED_IMAGE",
  "parameters": {"prefix": "Extract overlay text: "}
}
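Both request bodies above can be produced from Python with one helper that adds `parameters` only when a prefix is supplied. This is a minimal sketch: the function name `build_payload` is hypothetical, and only the `inputs` and `parameters.prefix` fields shown above are assumed to exist.

```python
import json
from typing import Optional


def build_payload(base64_image: str, prefix: Optional[str] = None) -> str:
    """Serialize the request body, including a custom prefix only if given."""
    body = {"inputs": base64_image}
    if prefix is not None:
        body["parameters"] = {"prefix": prefix}
    return json.dumps(body)


# Basic request body, then the same request with a custom prefix.
basic = build_payload("BASE64_ENCODED_IMAGE")
custom = build_payload("BASE64_ENCODED_IMAGE", prefix="Extract overlay text: ")
```

Omitting `parameters` entirely when no prefix is set keeps the basic request identical to the curl example.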
Limitations
- Works best with clear, deliberate text overlays
- May struggle with noisy backgrounds or complex overlapping text
- Limited support for non-Latin scripts
- Performance varies with image quality
Performance Tips
- Use high-contrast text for best results
- Ensure overlay text is clearly distinguished from background
- Avoid highly stylized fonts when possible
- Test with your specific image types for optimal results
Ethical Considerations
- Respect copyright when extracting text from images
- Be mindful of privacy when processing images with personal information
- Consider bias in text recognition performance across different languages
Contact
- Maintainer: Mohammed Sameer Syed
- GitHub: https://github.com/SyedMohammedSameer
- Repository: MohammedSameerSyed/FinetunedQWEN
Acknowledgements
- Qwen Team for the base Qwen2.5-VL-3B-Instruct model
- Hugging Face for the infrastructure and tools