title: Data Extraction OCR
emoji: 🐠
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
short_description: Data Extraction from image using ocr
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
OCR Text Extraction Tool
This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, users can search for specific keywords within the extracted text.
How to Use
- Upload an image file (JPG, JPEG, PNG).
- The extracted text will be displayed.
- Enter a keyword to search within the extracted text.
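The keyword search in the last step can be sketched as a case-insensitive line filter over the extracted text. This is only a minimal illustration: the function name `search_keyword` and the sample text are hypothetical, and the actual matching logic in `app.py` may differ.

```python
def search_keyword(text: str, keyword: str) -> list[str]:
    """Return the lines of `text` that contain `keyword`, ignoring case."""
    keyword = keyword.strip()
    if not keyword:
        # An empty search box should not match every line.
        return []
    needle = keyword.casefold()
    return [line for line in text.splitlines() if needle in line.casefold()]


if __name__ == "__main__":
    # Hypothetical OCR output, used only for demonstration.
    sample = "Invoice No: 1042\nTotal amount: 250.00\nThank you"
    for match in search_keyword(sample, "total"):
        print(match)
```

Returning whole lines rather than character offsets keeps the result easy to display next to the extracted text.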
Prerequisites
- Streamlit
- OpenCV
- Tesseract OCR
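A quick way to confirm these prerequisites are present is sketched below. The import names are assumptions: OpenCV is normally imported as `cv2`, and `pytesseract` is one common Python wrapper for Tesseract, which this README does not name explicitly. The check only looks for the modules and the `tesseract` executable; it does not import or run them.

```python
import importlib.util
import shutil

# Assumed import names; adjust to match the project's actual requirements.
PYTHON_MODULES = ("streamlit", "cv2", "pytesseract")


def check_prerequisites() -> dict[str, bool]:
    """Report which Python modules and the Tesseract binary can be found."""
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in PYTHON_MODULES
    }
    # The Tesseract engine itself is a separate executable on PATH.
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```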
Deployment
This application is deployed on Hugging Face Spaces.
Notes
- This assignment uses OCR, which cannot reliably extract handwritten text from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can reach accuracy of around 98-100%.
- The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
- This is a prototype and may require further optimization for production use.
- The OCR model is loaded using Streamlit's caching to improve performance.
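The idea behind caching the model load can be illustrated without Streamlit. The sketch below uses the standard library's `functools.lru_cache` as a stand-in so it runs anywhere; in a Streamlit app the `@st.cache_resource` decorator plays the equivalent role. The loader body is a hypothetical placeholder, not the project's real model code.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=1)  # stand-in; a Streamlit app would use @st.cache_resource
def load_ocr_model() -> dict:
    """Placeholder for an expensive model load (hypothetical)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": "placeholder-ocr-model"}


# Both calls return the same object; the loader body runs only once.
first = load_ocr_model()
second = load_ocr_model()
```

Because Streamlit reruns the script on every user interaction, loading the model once and reusing it avoids repeating that cost on each rerun.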