Qaari 0.1 Urdu: OCR Model for Urdu Language

Model Description

Qaari 0.1 Urdu is a fine-tuned version of Qwen/Qwen2-VL-2B optimized for Optical Character Recognition (OCR) of Urdu text. In the evaluation reported under Performance Metrics, it reaches a Word Error Rate (WER) of 0.048, compared with 0.352 for Tesseract and 1.823 for the base model.

Key Features

Specialized for Urdu OCR: Optimized for recognizing Urdu script with high accuracy
Superior Performance: Achieves a 97.35% reduction in Word Error Rate (WER) compared to the base model
High Accuracy: 0.048 WER and 0.029 CER, with a BLEU score of 0.916
Balanced Output Length: Near-perfect length ratio of 0.978 (ideal is 1.0)

Performance Metrics

Model	WER ↓	CER ↓	BLEU ↑	Length Ratio
Qaari 0.1 Urdu	0.048	0.029	0.916	0.978
Tesseract	0.352	0.227	0.518	0.770
Qwen Base	1.823	1.739	0.009	1.288
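
The base model's WER and CER exceed 1.0. That is possible because both metrics divide an edit distance by the length of the reference, so a hypothesis with many spurious insertions can accumulate more errors than the reference has tokens. The sketch below shows the standard definitions in plain Python. It is an illustration under that assumption, not the evaluation script used to produce the table, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "یہ ایک کتاب ہے"
    hypothesis = "یہ ایک کتاب"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

A hypothesis that repeats or hallucinates text adds one insertion per extra token, which is how a score above 1.0 arises.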

Improvement Percentages

Comparison	WER Improvement	CER Improvement	BLEU Improvement
vs. Qwen Base	97.35%	98.32%	91.55%
vs. Tesseract	86.25%	87.11%	82.60%
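
The WER and CER columns above are consistent with a relative error reduction, (baseline - model) / baseline. The sketch below applies that formula to the rounded scores from the Performance Metrics table. The formula is an assumption, since the card does not state how the percentages were derived, and the results can differ slightly from the table, presumably because the table was computed from unrounded scores. BLEU is left out because its derivation is not stated.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which an error metric drops relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - model) / baseline * 100.0


# Rounded scores copied from the Performance Metrics table.
qaari = {"WER": 0.048, "CER": 0.029}
baselines = {
    "Qwen Base": {"WER": 1.823, "CER": 1.739},
    "Tesseract": {"WER": 0.352, "CER": 0.227},
}

if __name__ == "__main__":
    for name, scores in baselines.items():
        for metric in ("WER", "CER"):
            reduction = relative_reduction(scores[metric], qaari[metric])
            print(f"vs. {name} {metric}: {reduction:.2f}% reduction")
```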

Supported Fonts

The model was fine-tuned on the following fonts:

AlQalam Taj Nastaleeq Regular
Alvi Nastaleeq Regular
Gandhara Suls Regular
Jameel Noori Nastaleeq Regular
NotoNastaliqUrdu-Regular

Supported Font Sizes

The model has been tested and optimized for the following font sizes:

14pt
16pt
18pt
20pt
24pt
32pt
40pt

Usage


You can load this model using the transformers and qwen_vl_utils libraries:

!pip install -U transformers qwen_vl_utils "accelerate>=0.26.0" peft
!pip install -U bitsandbytes

from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
import os
from qwen_vl_utils import process_vision_info



model_name = "oddadmix/Qaari-0.1-Urdu-OCR-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)
max_tokens = 2000

prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."
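# Placeholder input: replace "page.png" with the path to your own page image.
image = Image.open("page.png")
# Save a temporary copy that the message below references by path; it is deleted after inference.
src = "image.png"
image.save(src)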

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{src}"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device the model was placed on by device_map="auto".
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
os.remove(src)
print(output_text)
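
When transcribing several pages, the message payload is the only part of the snippet above that changes per image. A minimal sketch of a helper that builds it is shown below. The name build_messages is hypothetical, and the sketch assumes that qwen_vl_utils accepts a file:// prefix followed by an absolute path, in the same way the snippet above passes file:// followed by a path.

```python
from pathlib import Path


def build_messages(image_path, prompt):
    """Build the single-turn chat payload for one page image.

    The image is referenced by absolute path so the result does not depend on
    the current working directory (an assumption about how it will be used).
    """
    absolute_path = Path(image_path).resolve()
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": f"file://{absolute_path}"},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    messages = build_messages("image.png", "Return the plain text of this page.")
    print(messages[0]["content"][0]["image"])
```

The returned list can be passed to processor.apply_chat_template and process_vision_info exactly as messages is in the snippet above.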

Limitations

Performance may degrade when using fonts not included in the fine-tuning dataset
Font sizes outside the supported range may result in lower recognition accuracy
The model may not handle complex ligatures in non-Nastaleeq scripts effectively
Performance on digital-only displays has not been fully optimized
Low-resolution scans or prints may reduce recognition quality
Custom font modifications or non-standard Nastaleeq variants might not be recognized reliably

Training Details

This model was fine-tuned from Qwen2-VL-2B using a dataset of Urdu text images with paired transcriptions. The training process focused on optimizing for accurate Urdu character recognition and natural language understanding.

Training Dataset

Dataset Type: Paired Urdu text images with ground truth transcriptions
Size: 10,000 samples
Source: Synthetic dataset

Training Configuration

Base Model: Qwen/Qwen2-VL-2B
Hardware: A6000 GPU
Training Time: 24 Hours

Citation

If you use this model in your research, please cite:

@misc{qaari-0.1-urdu,
  author = {Ahmed Wasfy},
  title = {Qaari 0.1 Urdu: OCR Model for Urdu Language},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/oddadmix/Qaari-0.1-Urdu-OCR-VL-2B-Instruct}}
}

License

This model is subject to the license terms of the base Qwen2-VL-2B model.
