Update README.md
README.md
CHANGED
@@ -12,13 +12,9 @@ short_description: Data Extraction from image using ocr
|
|
12 |
|
13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
14 |
|
15 |
-
OCR and Text Search Web Application
|
16 |
-
This web application allows users to upload an image containing text in Hindi and English, perform Optical Character Recognition (OCR) on the image, and search for keywords within the extracted text.
|
17 |
-
Features
|
18 |
-
|
19 |
# OCR Text Extraction Tool
|
20 |
|
21 |
-
This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally,
|
22 |
|
23 |
## How to Use
|
24 |
1. Upload an image file (JPG, JPEG, PNG).
|
@@ -29,54 +25,11 @@ This is a simple OCR (Optical Character Recognition) tool implemented using Stre
|
|
29 |
- Streamlit
|
30 |
- OpenCV
|
31 |
- Tesseract OCR
|
32 |
-
- Numpy
|
33 |
-
- Pillow
|
34 |
-
|
35 |
-
## Deployment on Hugging Face Spaces
|
36 |
-
This application is designed to be deployed on Hugging Face Spaces. To deploy:
|
37 |
-
|
38 |
-
Create a new Space on Hugging Face:
|
39 |
-
|
40 |
-
Go to https://huggingface.co/spaces
|
41 |
-
Click on "Create new Space"
|
42 |
-
Choose "Streamlit" as the SDK
|
43 |
-
Give your Space a name and set visibility settings
|
44 |
-
|
45 |
-
|
46 |
-
Upload the following files to your Space:
|
47 |
-
|
48 |
-
app.py (main application script)
|
49 |
-
requirements.txt
|
50 |
-
|
51 |
-
|
52 |
-
Hugging Face will automatically detect the Streamlit app and deploy it.
|
53 |
-
|
54 |
-
Local Development
|
55 |
-
If you want to run the application locally for development:
|
56 |
-
|
57 |
-
Clone this repository:
|
58 |
-
git clone https://github.com/1PD-IS-NO-1/ocr-web-app.git
|
59 |
-
cd ocr-web-app
|
60 |
-
|
61 |
-
Create a virtual environment and activate it:
|
62 |
-
python -m venv venv
|
63 |
-
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
|
64 |
-
|
65 |
-
Install the required dependencies:
|
66 |
-
pip install -r requirements.txt
|
67 |
-
|
68 |
-
Run the Streamlit app:
|
69 |
-
streamlit run app.py
|
70 |
-
|
71 |
-
|
72 |
-
Usage
|
73 |
|
74 |
-
|
75 |
-
|
76 |
-
Enter keywords in the search box to find them within the extracted text.
|
77 |
-
View the search results with highlighted context.
|
78 |
|
79 |
-
Notes
|
80 |
:-This assignment uses OCR, which cannot reliably extract handwritten text from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can give accuracy of roughly 98-100%.
|
81 |
:-The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
|
82 |
:-This is a prototype and may require further optimization for production use.
|
|
|
12 |
|
13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
14 |
|
15 |
# OCR Text Extraction Tool
|
16 |
|
17 |
+
This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, users can search for specific keywords within the extracted text.
|
18 |
|
19 |
## How to Use
|
20 |
1. Upload an image file (JPG, JPEG, PNG).
|
|
|
25 |
- Streamlit
|
26 |
- OpenCV
|
27 |
- Tesseract OCR
|
28 |
|
29 |
+
## Deployment
|
30 |
+
This application is deployed on Hugging Face Spaces.
|
31 |
|
32 |
+
## Notes
|
33 |
:-This assignment uses OCR, which cannot reliably extract handwritten text from an image. For handwritten data, ICR (Intelligent Character Recognition) can be used instead, which can give accuracy of roughly 98-100%.
|
34 |
:-The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
|
35 |
:-This is a prototype and may require further optimization for production use.
|
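The README mentions searching for keywords within the extracted text and showing results with highlighted context. The snippet below is a minimal sketch of how that step could work on an already-extracted string. It is not taken from the Space's `app.py`: the function name `search_keyword`, the `context` width, and the `**` highlight markers are all assumptions made for illustration.

```python
import re


def search_keyword(text, keyword, context=30):
    """Return snippets of `text` around each case-insensitive match of `keyword`.

    Hypothetical helper: the name, the `context` width, and the ** markers
    are illustrative choices, not taken from the Space's app.py.
    """
    if not keyword:
        return []
    snippets = []
    # re.escape treats the keyword as literal text, not as a regex pattern.
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        snippet = (
            text[start:match.start()]
            + "**" + match.group(0) + "**"
            + text[match.end():end]
        )
        snippets.append(snippet)
    return snippets


# Example text standing in for OCR output (made up for this sketch).
extracted = "Invoice number 42. Total amount due: 1500 INR. invoice date: 2024-01-05."
results = search_keyword(extracted, "invoice", context=10)
```

In a Streamlit app, each snippet could be passed to `st.markdown` so the `**` markers render as bold, though the actual app may highlight matches differently.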