---
title: Data Extraction OCR
emoji: 🐠
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
short_description: Data extraction from images using OCR
---


# OCR Text Extraction Tool  

This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, users can search for specific keywords within the extracted text.  

## How to Use  
1. Upload an image file (JPG, JPEG, PNG).  
2. The extracted text will be displayed.  
3. Enter a keyword to search within the extracted text.  
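The keyword search in step 3 could work roughly like the sketch below. It assumes a case-insensitive substring match over the lines of the OCR output. The README does not say how the app actually matches keywords, so `search_keyword` and its line-numbered results are a hypothetical illustration, not the app's code.

```python
def search_keyword(text: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs from `text` that contain `keyword`.

    Hypothetical helper: matching is a case-insensitive substring test,
    and line numbers start at 1. A blank keyword matches nothing.
    """
    needle = keyword.strip().casefold()
    if not needle:
        return []
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]


if __name__ == "__main__":
    ocr_text = "Invoice No: 1042\nTotal Amount: 250.00\nPaid by: cash"
    for number, line in search_keyword(ocr_text, "amount"):
        print(f"line {number}: {line}")  # line 2: Total Amount: 250.00
```

In a Streamlit app, the OCR output would be passed as `text` and the value of an `st.text_input` box as `keyword`. Returning line numbers lets the page show where each match occurs.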

## Prerequisites  
- Streamlit  
- OpenCV  
- Tesseract OCR  
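Before running the app locally, it can help to confirm that each prerequisite is available. The sketch below uses only the standard library. The import names `streamlit`, `cv2` and `pytesseract` are assumptions. In particular, the README does not name a Python wrapper for Tesseract; `pytesseract` is only a common choice. The Tesseract engine itself is checked as a `tesseract` executable on `PATH`.

```python
import importlib.util
import shutil
from collections.abc import Mapping

# Assumed import names for the README's prerequisites. "pytesseract" is a
# hypothetical choice: the README only says "Tesseract OCR".
DEFAULT_MODULES = {"Streamlit": "streamlit", "OpenCV": "cv2", "pytesseract": "pytesseract"}


def check_prerequisites(modules: Mapping[str, str] = DEFAULT_MODULES) -> dict[str, bool]:
    """Map each prerequisite name to True if it is available, else False."""
    # find_spec returns None when a top-level module cannot be found.
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }
    # Tesseract OCR is a separate engine binary, not a Python package.
    status["Tesseract binary"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, available in check_prerequisites().items():
        print(f"{name:<18}{'ok' if available else 'MISSING'}")
```

On Hugging Face Spaces, system packages such as the Tesseract engine are listed in `packages.txt`, and Python packages are listed in `requirements.txt`. A missing binary usually means the system package was not installed.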

## Deployment  
This application is deployed on Hugging Face Spaces.

## Notes
- This assignment uses OCR, so it cannot reliably extract handwritten text from images. Handwritten data is better handled with ICR (Intelligent Character Recognition), which can reach accuracy of around 98-100%.
- The application uses the TrOCR large model, which gives good results on mixed Hindi and English text.
- This is a prototype and may need further optimization before production use.
- The OCR model is loaded through Streamlit's caching, so it is loaded once and reused across reruns instead of being reloaded on every interaction.