metadata

license: apache-2.0
pipeline_tag: image-text-to-text
language:
  - en
base_model:
  - prithivMLmods/Qwen2-VL-OCR-2B-Instruct
library_name: peft
tags:
  - ocr_test
  - qwen
  - qvq
  - kie
  - trl
  - text-generation-inference
  - qwen2_vl

QvQ KiE [Key Information Extractor] Adapter for Qwen2-VL-OCR-2B-Instruct

The QvQ KiE adapter is a fine-tuned version of the Qwen/Qwen2-VL-2B-Instruct model, specifically tailored for tasks involving Optical Character Recognition (OCR), image-to-text conversion, and math problem-solving with LaTeX formatting. This adapter enhances the model’s performance for multi-modal tasks by integrating vision and language capabilities in a conversational framework.

Key Features

1. Vision-Language Integration

Seamlessly combines image understanding with natural language processing, enabling accurate image-to-text conversion.

2. Optical Character Recognition (OCR)

Extracts and processes textual content from images with high precision, making it ideal for document analysis and information extraction.

3. Math and LaTeX Support

Efficiently handles complex math problem-solving, outputting results in LaTeX format for easy integration into scientific and academic workflows.

4. Conversational Capabilities

Equipped with multi-turn conversational capabilities, providing context-aware responses during interactions. This makes it suitable for tasks requiring ongoing dialogue and clarification.

5. Image-Text-to-Text Generation

Supports input in various forms:
- Images
- Text
- Image + Text (multi-modal)
Outputs include descriptive or problem-solving text, depending on the input type.

6. Secure Weight Format

Utilizes Safetensors for fast and secure model weight loading, ensuring both performance and safety during deployment.