Commit · ce2c75c
1 Parent(s): 6fd5cfa
update readme
README.md
CHANGED
@@ -0,0 +1,69 @@
# OCR Web Application

## Project Overview
This is a **web-based Optical Character Recognition (OCR) application** built with Streamlit. The app supports English and Hindi: users upload an image, and the app extracts its text using GOT OCR 2.0 for English or EasyOCR for Hindi.

## How the Application Works
1. Choose Language: Select either English or Hindi, following the instructions in the sidebar.
2. Upload Image: Use the file uploader to provide an image in JPG, JPEG, or PNG format.
3. Text Extraction: For English, the app extracts text with the GOT OCR 2.0 model; for Hindi, it uses EasyOCR.
4. Keyword Search: After extraction, you can search the extracted text for specific keywords. Matching keywords are highlighted, and any keywords that are not found are listed in a warning message.
5. Reset: If needed, reset the session and upload a new image to start over.
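
The keyword search in step 4 can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function name `search_keywords`, the case-insensitive substring matching, and the use of `<mark>` tags for highlighting are all assumptions made for the example.

```python
import re


def search_keywords(text, keywords):
    """Highlight keywords found in `text` and collect the ones that are missing.

    Hypothetical helper: returns (highlighted_text, missing_keywords).
    Matching is case-insensitive and substring-based.
    """
    # Drop empty entries such as those produced by a trailing comma.
    cleaned = [k.strip() for k in keywords if k.strip()]

    found, missing = [], []
    for keyword in cleaned:
        if re.search(re.escape(keyword), text, flags=re.IGNORECASE):
            found.append(keyword)
        else:
            missing.append(keyword)

    if not found:
        return text, missing

    # Substitute in a single pass so earlier highlights are never re-matched;
    # longer keywords come first so they win over their own substrings.
    alternation = "|".join(
        re.escape(k) for k in sorted(found, key=len, reverse=True)
    )
    highlighted = re.sub(
        f"({alternation})", r"<mark>\1</mark>", text, flags=re.IGNORECASE
    )
    return highlighted, missing
```

In a Streamlit app, the highlighted string could be rendered with `st.markdown(highlighted, unsafe_allow_html=True)` and the missing keywords reported with `st.warning(...)`. Whether `app.py` does exactly this is not shown in the README.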

## Installation and Setup

### Prerequisites:
- **Python 3.8 or higher**
- Required libraries listed in `requirements.txt`

### Installation Steps:
1. **Clone the repository:**
   ```bash
   git clone https://github.com/Trisandhyadevi/OCR.git
   ```

2. **Navigate to the project directory:**
   ```bash
   cd OCR
   ```

3. **Install the required dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the application:**
   ```bash
   streamlit run app.py
   ```
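
If the app fails to start, it can help to confirm that the interpreter and packages match the prerequisites. The script below is an optional sketch for that check. The import names in `REQUIRED_MODULES` are assumed from the Dependencies section of this README, not read from `requirements.txt`, and the helper names are hypothetical.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages listed under "Dependencies".
# Pillow is imported as "PIL".
REQUIRED_MODULES = ["streamlit", "transformers", "easyocr", "torchvision", "PIL"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter meets the README's minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or higher is required.")
    absent = missing_modules()
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All listed packages were found.")
```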

## Description

This web application converts images to text using the GOT OCR 2.0 model. The key features of the model are summarized below.

## GOT OCR 2.0 Model Overview

The GOT OCR 2.0 model is a state-of-the-art OCR system designed for accurate text extraction from images. Key features include:

- **Multi-task Learning**: The model supports various tasks beyond OCR, including layout analysis and object detection, making it versatile for diverse text recognition needs.
- **End-to-End Pipeline**: It processes entire images, identifying and extracting text without additional preprocessing steps.

Note: The model does not currently support all languages. Languages not included in the pre-trained model require fine-tuning. For more information on fine-tuning, visit the [GOT OCR 2.0 Fine-tuning Guide](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/?tab=readme-ov-file#fine-tune).

For more technical details about the model architecture and usage, visit the [GOT OCR 2.0 Model Documentation](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/?tab=readme-ov-file#general-ocr-theory-towards-ocr-20-via-a-unified-end-to-end-model).
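
The language-based routing described in this README (GOT OCR 2.0 for English, EasyOCR for Hindi) might look roughly like the sketch below. It is not taken from `app.py`. The checkpoint name `ucaslcl/GOT-OCR2_0`, the loading arguments, and the `model.chat(...)` call are assumptions based on the publicly available model card; `chat` comes from the model's remote code, not from Transformers itself. The function names and the `BACKENDS` mapping are hypothetical.

```python
def ocr_english_got(image_path):
    """Extract English text with GOT OCR 2.0 (assumes a CUDA-capable GPU)."""
    # Imported lazily because Transformers and the model weights are heavy.
    from transformers import AutoModel, AutoTokenizer

    model_id = "ucaslcl/GOT-OCR2_0"  # assumed checkpoint; the README names none
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="cuda",
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    model = model.eval().cuda()
    # `chat` is defined by the checkpoint's remote code (plain-text OCR mode).
    return model.chat(tokenizer, image_path, ocr_type="ocr")


def ocr_hindi_easyocr(image_path):
    """Extract Hindi text with EasyOCR and join the detected fragments."""
    import easyocr  # lazy import for the same reason as above

    reader = easyocr.Reader(["hi", "en"])
    # detail=0 returns only the recognized strings, without boxes or scores.
    return " ".join(reader.readtext(image_path, detail=0))


# Hypothetical mapping from the language chosen in the UI to a backend.
BACKENDS = {"English": ocr_english_got, "Hindi": ocr_hindi_easyocr}


def extract_text(image_path, language, backends=BACKENDS):
    """Dispatch to the OCR backend registered for `language`."""
    try:
        backend = backends[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
    return backend(image_path)
```

As written, each call reloads the model, which is slow. In a Streamlit app the loading step would usually be wrapped in a cached function (for example with `st.cache_resource`) so that it runs once per process.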

## Deployment
The application can be deployed to a cloud platform (Hugging Face Spaces).

## Folder Structure
```bash
.
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
└── README.md           # Project documentation
```

## Dependencies
1. Streamlit: Web framework used to build the interactive interface.
2. Transformers: Loads the GOT OCR 2.0 model.
3. EasyOCR: Performs text extraction for Hindi.
4. Torchvision: Handles image transformations.
5. Pillow: Image processing library.
app.py
CHANGED
@@ -75,6 +75,15 @@ if 'reset' not in st.session_state:
 
 if 'language' not in st.session_state:
     st.session_state.language = False
+
+with st.sidebar:
+    st.header("Instructions")
+    st.write("1. Choose a language (English or Hindi).")
+    st.write("2. Upload an image in JPG, PNG, or JPEG format.")
+    st.write("3. The app will extract text from the image using OCR.")
+    st.write("4. Enter keywords to search within the extracted text.")
+    st.write("5. If needed, click 'Reset' to upload a new image.")
+
 
 st.header("Optical Character Recognition ")
 col1, col2 = st.columns(2)
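
The context lines of this hunk initialize keys in `st.session_state` only when they are absent, which is what lets values survive Streamlit reruns. A generic sketch of that pattern, together with the reset step the sidebar mentions, is given below. The helper names and the default for `reset` are hypothetical; only `language = False` appears in the diff. The helpers work on any dict-like object, so they can be tried without Streamlit installed.

```python
# Defaults for the session keys seen in the hunk. The value for "reset" is an
# assumption; the diff only shows that the key is checked.
DEFAULTS = {"reset": False, "language": False}


def init_session_defaults(state, defaults=DEFAULTS):
    """Set each default only if the key is not already present in `state`."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value
    return state


def reset_session(state, defaults=DEFAULTS):
    """Overwrite the tracked keys with their defaults to start over."""
    for key, value in defaults.items():
        state[key] = value
    return state
```

In the app these would be called with `st.session_state` as `state`, since it supports the same membership test and item assignment as a dictionary.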