--- license: apache-2.0 pipeline_tag: image-text-to-text language: - en base_model: - prithivMLmods/Qwen2-VL-OCR-2B-Instruct library_name: peft tags: - ocr_test - qwen - qvq - kie - trl - text-generation-inference - qwen2_vl --- # **QvQ KiE [Key Information Extractor] Adapter for Qwen2-VL-OCR-2B-Instruct** The **QvQ KiE adapter** is a fine-tuned version of the **Qwen/Qwen2-VL-2B-Instruct** model, specifically tailored for tasks involving **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem-solving** with **LaTeX formatting**. This adapter enhances the model’s performance for multi-modal tasks by integrating vision and language capabilities in a conversational framework. # **Key Features** ### 1. **Vision-Language Integration** - Seamlessly combines **image understanding** with **natural language processing**, enabling accurate image-to-text conversion. ### 2. **Optical Character Recognition (OCR)** - Extracts and processes textual content from images with high precision, making it ideal for document analysis and information extraction. ### 3. **Math and LaTeX Support** - Efficiently handles complex **math problem-solving**, outputting results in **LaTeX format** for easy integration into scientific and academic workflows. ### 4. **Conversational Capabilities** - Equipped with multi-turn conversational capabilities, providing context-aware responses during interactions. This makes it suitable for tasks requiring ongoing dialogue and clarification. ### 5. **Image-Text-to-Text Generation** - Supports input in various forms: - **Images** - **Text** - **Image + Text (multi-modal)** - Outputs include descriptive or problem-solving text, depending on the input type. ### 6. **Secure Weight Format** - Utilizes **Safetensors** for fast and secure model weight loading, ensuring both performance and safety during deployment. ---