PRIYANSHUDHAKED committed 49e52dc (verified)
1 parent: dbf63c3

Update README.md

Files changed (1): README.md +4 -51
README.md CHANGED
@@ -12,13 +12,9 @@ short_description: Data Extraction from image using ocr
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
- OCR and Text Search Web Application
- This web application allows users to upload an image containing text in Hindi and English, perform Optical Character Recognition (OCR) on the image, and search for keywords within the extracted text.
- Features
-
  # OCR Text Extraction Tool
 
- This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, you can search for specific keywords within the extracted text.
+ This is a simple OCR (Optical Character Recognition) tool implemented using Streamlit. It allows users to upload an image and extract text from it using Tesseract OCR. Additionally, users can search for specific keywords within the extracted text.
 
  ## How to Use
  1. Upload an image file (JPG, JPEG, PNG).
@@ -29,54 +25,11 @@ This is a simple OCR (Optical Character Recognition) tool implemented using Stre
  - Streamlit
  - OpenCV
  - Tesseract OCR
- - Numpy
- - Pillow
-
- ## Deployment on Hugging Face Spaces
- This application is designed to be deployed on Hugging Face Spaces. To deploy:
-
- Create a new Space on Hugging Face:
-
- Go to https://huggingface.co/spaces
- Click on "Create new Space"
- Choose "Streamlit" as the SDK
- Give your Space a name and set visibility settings
-
-
- Upload the following files to your Space:
-
- app.py (main application script)
- requirements.txt
-
-
- Hugging Face will automatically detect the Streamlit app and deploy it.
-
- Local Development
- If you want to run the application locally for development:
-
- Clone this repository:
- git clone https://github.com/1PD-IS-NO-1/ocr-web-app.git
- cd ocr-web-app
-
- Create a virtual environment and activate it:
- python -m venv venv
- source venv/bin/activate # On Windows, use `venv\Scripts\activate`
-
- Install the required dependencies:
- pip install -r requirements.txt
-
- Run the Streamlit app:
- streamlit run app.py
-
-
- Usage
 
- Upload an image containing text in Hindi and English.
- Click the "Perform OCR" button to extract text from the image.
- Enter keywords in the search box to find them within the extracted text.
- View the search results with highlighted context.
+ ## Deployment
+ This application is deployed on Hugging Face Spaces.
 
- Notes
+ ## Notes
  :-In the assignment we are using OCR so it can not Extract HANDWRITTEN DATA PROPERLY FROM THE IMAGE SO FOR EXTRACTING HANDWRITTEN DATA WE CAN USE ICR(INTELLIGENT CHARACTER RECOGNITION) WHICH CAN GIVE ACCURACY LIKE 98-100%.
  :-The application uses the TrOCR large model, which provides good results for mixed Hindi and English text.
  :-This is a prototype and may require further optimization for production use.
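The README says users can search the extracted text for keywords. The removed Usage section described showing the results "with highlighted context". The app.py source is not part of this commit, so the following is only a minimal sketch of how such a search could work. It assumes the OCR step has already produced a plain string (for example, Tesseract output).

Everything in the sketch is a hypothetical illustration, not the app's actual code:
- the `search_keywords` function and `Match` type,
- the `context` window size,
- the `**` bold highlighting.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Match:
    """One keyword hit (hypothetical type): offsets plus a highlighted snippet."""
    keyword: str
    start: int
    end: int
    snippet: str


def search_keywords(text: str, keywords: list[str], context: int = 20) -> list[Match]:
    """Find case-insensitive occurrences of each keyword in OCR output.

    Assumption: `text` is the plain string returned by the OCR step.
    Each hit carries up to `context` characters on either side, with the
    matched text wrapped in ** so it renders bold as markdown.
    """
    hits: list[Match] = []
    for raw in keywords:
        kw = raw.strip()
        if not kw:
            continue  # skip empty entries, e.g. a blank search box
        # re.escape makes characters such as '.' or '(' match literally.
        for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE):
            left = max(0, m.start() - context)
            right = min(len(text), m.end() + context)
            snippet = text[left:m.start()] + "**" + m.group(0) + "**" + text[m.end():right]
            # OCR output is multi-line; flatten newlines so each snippet is one line.
            hits.append(Match(kw, m.start(), m.end(), snippet.replace("\n", " ")))
    # Report hits in the order they appear in the text, not in keyword order.
    hits.sort(key=lambda h: h.start)
    return hits


if __name__ == "__main__":
    ocr_text = "Invoice No. 42\nनाम: राम\nTotal amount: 500"
    for hit in search_keywords(ocr_text, ["total", "नाम"], context=5):
        print(hit.keyword, hit.start, hit.snippet)
```

Each keyword goes through `re.escape`, so punctuation in a search term such as `No.` is matched literally. Devanagari has no letter case, so `re.IGNORECASE` only changes how the Latin-script (English) part of mixed Hindi and English text is matched.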