---
title: Data Extraction OCR
emoji: 🐠
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
short_description: Data Extraction from image using ocr
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# OCR Text Extraction Tool

This is a simple OCR (Optical Character Recognition) tool built with Streamlit. It lets users upload an image and extract text from it using Tesseract OCR. Users can also search for specific keywords within the extracted text.

## How to Use

1. Upload an image file (JPG, JPEG, or PNG).
2. The extracted text is displayed.
3. Enter a keyword to search within the extracted text.

## Prerequisites

- Streamlit
- OpenCV
- Tesseract OCR

## Deployment

This application is deployed on Hugging Face Spaces.

## Notes

- This assignment uses OCR, which cannot reliably extract handwritten text from images. For handwritten data, ICR (Intelligent Character Recognition) is a better fit and can reach accuracy of about 98-100%.
- The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
- This is a prototype and may require further optimization for production use.
- The OCR model is loaded using Streamlit's caching to improve performance.
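
## Example: Extract and Search

The upload, extract, and search flow described under How to Use can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names `extract_text`, `find_keyword`, and `highlight` are hypothetical, and the OCR step assumes the `pytesseract` and `opencv-python` packages plus a local Tesseract install. The OCR imports are deferred so the search helpers can be tried on plain text without those dependencies.

```python
import re


def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text.

    Assumption: pytesseract and OpenCV are installed and the Tesseract
    binary is on PATH. Imports are deferred so the helpers below work
    without the OCR stack.
    """
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Could not read image: {image_path}")
    # OpenCV loads images as BGR; Tesseract expects RGB.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, lang=lang)


def find_keyword(text, keyword):
    """Return the lines of text that contain keyword (case-insensitive)."""
    keyword = keyword.strip()
    if not keyword:
        return []
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


def highlight(text, keyword):
    """Wrap each case-insensitive match of keyword in Markdown bold markers."""
    keyword = keyword.strip()
    if not keyword:
        return text
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)
```

In a Streamlit app, the output of `highlight` could be passed to `st.markdown` so that matches appear in bold. Escaping the keyword with `re.escape` keeps characters such as `.` or `(` in the user's input from being treated as regular expression syntax.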