# Image to Text Response with RAG

This project is a web application that lets users upload an image, processes it to extract a description, and then answers questions about the image in a chat interface. It uses the Salesforce BLIP model for image captioning and OpenAI's GPT-3.5-turbo for generating responses to user queries.

## Description

This project demonstrates how advanced machine learning models can be combined for image processing and natural language generation. By pairing a state-of-the-art captioning model with a conversational model, it provides an interactive way to explore and question the content of an image.

### Technologies Used

- **Streamlit**: For building the interactive web interface.
- **Salesforce BLIP Model**: For generating image captions (see the sketch below).
- **OpenAI GPT-3.5-turbo**: For generating responses to user questions.
- **Pillow**: For image processing.
- **Python Dotenv**: For managing environment variables.
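
A minimal sketch of the captioning step, assuming the model is loaded through Hugging Face `transformers` (the checkpoint name `Salesforce/blip-image-captioning-base` and the helper `caption_image` are illustrative, not necessarily the app's exact code):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the BLIP captioning checkpoint once at startup.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(path: str) -> str:
    """Return a short natural-language caption for the image at `path`."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Example: print(caption_image("photo.jpg"))
```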

## How to Run

1. **Clone the repository**:
   ```sh
   git clone https://github.com/your-username/your-repo-name.git
   cd your-repo-name
   ```

2. **Create a virtual environment**:
   ```sh
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. **Install the required packages**:
   ```sh
   pip install -r requirements.txt
   ```

4. **Set up your OpenAI API key** (see the loading sketch after these steps):
   - Create a `.env` file in the root directory of the project.
   - Add your OpenAI API key to the `.env` file:
     ```env
     OPENAI_API_KEY=your_openai_api_key
     ```

5. **Run the Streamlit application**:
   ```sh
   streamlit run app.py
   ```

6. **Open the application**:
   - Open your web browser and go to `http://localhost:8501`.
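
A hedged sketch of how the key from step 4 can be picked up at startup with `python-dotenv`, assuming the current `openai` SDK (`app.py` may handle this differently):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from the .env file into the environment

# OpenAI() would also find the key in the environment on its own;
# passing it explicitly just makes the dependency visible.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```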

## Usage

1. Upload an image in JPG, JPEG, or PNG format.
2. Wait for the image to be processed.
3. Ask questions about the image in the chat interface.
4. View the responses generated by the application based on the image details (a sketch of this answer step follows).
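
Conceptually, the answer step grounds the language model in the extracted image details. A minimal sketch, assuming the current `openai` SDK (the prompt wording and the helper `answer_question` are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer_question(caption: str, question: str) -> str:
    """Answer a question about the image, using its caption as context."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"You answer questions about an image. Image details: {caption}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Example: print(answer_question("a dog playing in a park", "What animal is shown?"))
```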

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.