metadata

title: Data Extraction OCR
emoji: 🐠
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
short_description: Data extraction from images using OCR

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

OCR Text Extraction Tool

This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, users can search for specific keywords within the extracted text.

How to Use

Upload an image file (JPG, JPEG, PNG).
The extracted text will be displayed.
Enter a keyword to search within the extracted text.
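The keyword search in the last step could be sketched roughly as follows. This is only an illustration under assumptions: the function name `find_keyword` is hypothetical, and case-insensitive, line-by-line matching is an assumed behaviour rather than what app.py necessarily does.

```python
from __future__ import annotations

import re


def find_keyword(text: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that contain the keyword.

    Hypothetical helper: matching is case-insensitive and line numbers
    are 1-based. The keyword is treated as literal text, not as a regex.
    """
    keyword = keyword.strip()
    if not keyword:
        # An empty search box should not match every line.
        return []
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    # Made-up sample standing in for text extracted from an image.
    sample = "Invoice No: 1042\nTotal amount: 350.00\nThank you for your order"
    for number, line in find_keyword(sample, "total"):
        print(f"line {number}: {line}")
```

Escaping the keyword with `re.escape` means characters such as `.` or `(` typed by the user are searched for literally instead of being interpreted as a pattern.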

Prerequisites

Streamlit
OpenCV
Tesseract OCR
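A quick way to confirm that the prerequisites above are available is a small check script such as the sketch below. The function name `missing_prerequisites` is hypothetical. The check relies on Streamlit being importable as `streamlit`, OpenCV as `cv2`, and the Tesseract executable being on the PATH as `tesseract`.

```python
from __future__ import annotations

import importlib.util
import shutil


def missing_prerequisites() -> list[str]:
    """Return the names of prerequisites that cannot be found (hypothetical helper)."""
    missing = []
    # Python packages: Streamlit is imported as `streamlit`, OpenCV as `cv2`.
    for label, module in (("Streamlit", "streamlit"), ("OpenCV", "cv2")):
        if importlib.util.find_spec(module) is None:
            missing.append(label)
    # Tesseract OCR is a separate program; its executable is named `tesseract`.
    if shutil.which("tesseract") is None:
        missing.append("Tesseract OCR")
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```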

Deployment

This application is deployed on Hugging Face Spaces.

Notes

Because this assignment uses OCR, it cannot reliably extract handwritten data from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can give accuracy as high as 98-100%.
The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
This is a prototype and may require further optimization for production use.
The OCR model is loaded using Streamlit's caching to improve performance.