Spaces: PRIYANSHUDHAKED / Data_Extraction_OCR

PRIYANSHUDHAKED committed on Sep 28, 2024

Commit f247c1b (verified) · 1 Parent(s): ce990a6

Update README.md

Files changed (1)

README.md +59 -0

@@ -11,3 +11,62 @@ short_description: Data Extraction from image using ocr
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+OCR and Text Search Web Application
+This web application allows users to upload an image containing text in Hindi and English, perform Optical Character Recognition (OCR) on the image, and search for keywords within the extracted text.
+Features
+OCR processing for images containing Hindi and English text
+Keyword search with context highlighting
+Error handling and user feedback
+Deployed on Hugging Face Spaces
+Deployment on Hugging Face Spaces
+This application is designed to be deployed on Hugging Face Spaces. To deploy:
+Create a new Space on Hugging Face:
+Go to https://huggingface.co/spaces
+Click on "Create new Space"
+Choose "Streamlit" as the SDK
+Give your Space a name and set visibility settings
+Upload the following files to your Space:
+app.py (main application script)
+requirements.txt
+Hugging Face will automatically detect the Streamlit app and deploy it.
+Local Development
+If you want to run the application locally for development:
+Clone this repository:
+git clone https://github.com/1PD-IS-NO-1/ocr-web-app.git
+cd ocr-web-app
+Create a virtual environment and activate it:
+python -m venv venv
+source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
+Install the required dependencies:
+pip install -r requirements.txt
+Run the Streamlit app:
+streamlit run app.py
+Usage
+Upload an image containing text in Hindi and English.
+Click the "Perform OCR" button to extract text from the image.
+Enter keywords in the search box to find them within the extracted text.
+View the search results with highlighted context.
+Notes
+:-This assignment uses OCR, which cannot reliably extract handwritten text from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can reportedly reach accuracy in the 98-100% range.
+:-The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
+:-This is a prototype and may require further optimization for production use.
+:-The OCR model is loaded using Streamlit's caching to improve performance.
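
The keyword search with highlighted context described under Features and Usage could be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function name `search_with_context`, the `window` size, and the `**` highlight marker are all assumptions made for the example.

```python
import re


def search_with_context(text, keyword, window=30, marker="**"):
    """Return a snippet around each match of `keyword` in `text`.

    Hypothetical helper for illustration only; the real app may search
    and highlight differently.
    """
    if not keyword:
        return []

    snippets = []
    # re.escape makes the keyword match literally. IGNORECASE affects
    # cased scripts such as Latin; Devanagari has no letter case.
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(text))
        snippets.append(
            text[start:match.start()]
            + marker + match.group(0) + marker
            + text[match.end():end]
        )
    return snippets
```

Calling `search_with_context(extracted_text, "amount")` returns one snippet per occurrence, each with up to `window` characters of context on either side of the marked keyword. In a Streamlit app the snippets could be passed to `st.markdown`, where the `**` markers render as bold.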