PRIYANSHUDHAKED committed on
Commit
f247c1b
·
verified ·
1 Parent(s): ce990a6

Update README.md

Files changed (1)
  1. README.md +59 -0
README.md CHANGED
@@ -11,3 +11,62 @@ short_description: Data Extraction from image using ocr
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ OCR and Text Search Web Application
+ This web application allows users to upload an image containing text in Hindi and English, perform Optical Character Recognition (OCR) on the image, and search for keywords within the extracted text.
+ Features
+
+ OCR processing for images containing Hindi and English text
+ Keyword search with context highlighting
+ Error handling and user feedback
+ Deployed on Hugging Face Spaces
+
+ Deployment on Hugging Face Spaces
+ This application is designed to be deployed on Hugging Face Spaces. To deploy:
+
+ Create a new Space on Hugging Face:
+
+ Go to https://huggingface.co/spaces
+ Click on "Create new Space"
+ Choose "Streamlit" as the SDK
+ Give your Space a name and set visibility settings
+
+
+ Upload the following files to your Space:
+
+ app.py (main application script)
+ requirements.txt
+
+
+ Hugging Face will automatically detect the Streamlit app and deploy it.
+
+ Local Development
+ If you want to run the application locally for development:
+
+ Clone this repository:
+ git clone https://github.com/1PD-IS-NO-1/ocr-web-app.git
+ cd ocr-web-app
+
+ Create a virtual environment and activate it:
+ python -m venv venv
+ source venv/bin/activate # On Windows, use `venv\Scripts\activate`
+
+ Install the required dependencies:
+ pip install -r requirements.txt
+
+ Run the Streamlit app:
+ streamlit run app.py
+
+
+ Usage
+
+ Upload an image containing text in Hindi and English.
+ Click the "Perform OCR" button to extract text from the image.
+ Enter keywords in the search box to find them within the extracted text.
+ View the search results with highlighted context.
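
The keyword search step above could be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function name `search_with_context`, the `window` parameter, and the `**...**` highlight markers are all assumptions made for the example.

```python
import re


def search_with_context(text, keyword, window=30):
    """Return one snippet per case-insensitive match of `keyword` in `text`.

    Each snippet holds up to `window` characters on either side of the match,
    with the matched text wrapped in ** ** (a hypothetical highlight marker).
    """
    if not keyword:
        return []
    snippets = []
    # re.escape makes the keyword match literally, even if it contains regex symbols.
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(
            text[start:match.start()]
            + "**" + match.group(0) + "**"
            + text[match.end():end]
        )
    return snippets


# Example with mixed Hindi and English text, as the OCR step might produce.
sample = "नमस्ते world. This OCR demo reads Hindi and English text."
results = search_with_context(sample, "ocr", window=10)
hindi_results = search_with_context(sample, "नमस्ते", window=10)
```

In a Streamlit app, each snippet could then be rendered as Markdown so the marked keyword appears in bold.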
+
+ Notes
+ :- This assignment uses OCR, which cannot reliably extract handwritten text from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can give accuracy of around 98-100%.
+ :- The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
+ :- This is a prototype and may require further optimization for production use.
+ :- The OCR model is loaded using Streamlit's caching to improve performance.
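
The caching note above can be illustrated with a small sketch. It assumes the app uses Streamlit's `st.cache_resource` decorator (the README does not say which caching call is used), and it replaces the real TrOCR loading with a placeholder so the snippet runs without downloading a model. The names `load_ocr_model` and `LOAD_CALLS` are hypothetical.

```python
from functools import lru_cache

try:
    import streamlit as st

    cache_resource = st.cache_resource
except ImportError:
    # Fallback so the sketch also runs where Streamlit is not installed.
    def cache_resource(func):
        return lru_cache(maxsize=None)(func)


# Counts how many times the loader body actually runs.
LOAD_CALLS = {"count": 0}


@cache_resource
def load_ocr_model():
    """Placeholder loader; the real app would build the TrOCR processor and model here."""
    LOAD_CALLS["count"] += 1
    # Assumption: app.py would load the model checkpoint at this point
    # instead of returning a dictionary.
    return {"name": "placeholder-ocr-model"}


first = load_ocr_model()
second = load_ocr_model()  # served from the cache, loader body is not re-run
```

Because Streamlit reruns the whole script on each interaction, caching the loader in this way keeps an expensive model from being rebuilt every time the user clicks a button.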