shukdevdatta123 committed
Commit 8faefa1 · verified · 1 Parent(s): f24d29c

Upload 8 files

Files changed (8)
  1. README.md +172 -14
  2. abc.txt +118 -0
  3. abc2.txt +299 -0
  4. abc3.txt +523 -0
  5. app.py +522 -0
  6. generate_answer.py +95 -0
  7. helpers.py +42 -0
  8. requirements.txt +3 -0
README.md CHANGED
@@ -1,14 +1,172 @@
- ---
- title: Multi Modal O1 Chatbot
- emoji: 🔥
- colorFrom: green
- colorTo: blue
- sdk: gradio
- sdk_version: 5.32.0
- app_file: app.py
- pinned: false
- license: mit
- short_description: Omni-1 and Omni-3 Chatbot
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---
title: Multi Modal Omni Chatbot
emoji: 🐠
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
license: mit
short_description: A chatbot that supports text, image, voice and pdf chat.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---

# Building a Multimodal Chatbot with Gradio and OpenAI

In recent years, the field of artificial intelligence (AI) has seen an exciting leap in multimodal capabilities. Multimodal systems can understand multiple types of input, like text and images, and use them to provide richer, more dynamic responses. One such example is a multimodal chatbot that can process both text and image inputs using the OpenAI API.

In this article, we'll walk through how to create a multimodal chatbot using **Gradio** and the **OpenAI API** that allows users to input both text and images, interact with the model, and receive insightful responses.

## Key Components

Before we dive into the code, let's break down the core components of this chatbot:

- **Gradio**: A simple, open-source Python library for building UIs for machine learning models. It allows you to quickly create and deploy interfaces for any ML model, including those that take images, text, or audio as input.

- **OpenAI API**: This is the engine behind our chatbot. OpenAI provides chat models such as `gpt-3.5` and `gpt-4`, as well as reasoning models like `o1`, which also accepts image input, and `o3-mini`, which is text-only. This app uses the latter two.

- **Python and PIL**: To handle image preprocessing, we use Python's `PIL` (the Python Imaging Library, installed today via the Pillow package) to convert uploaded images into a format that can be passed to the OpenAI model.

## The Chatbot Overview

The chatbot can take two main types of input:
1. **Text Input**: Ask a question or give a prompt to the model.
2. **Image Input**: Upload an image, and the model will interpret the image and provide a response based on its content.

The interface lets the user adjust two main settings:
- **Reasoning Effort**: Controls how complex or detailed the assistant's answers should be. The options are `low`, `medium`, and `high`.
- **Model Choice**: Users can select between two models: `o1` (which handles image input) and `o3-mini` (text input only).

The interface is simple, intuitive, and interactive, with the chat history displayed below the inputs.

## Step-by-Step Code Explanation

### 1. Set Up Gradio UI

Gradio makes it easy to create beautiful interfaces for your AI models. We start by defining a custom interface with the following components:

- **Textbox for OpenAI API Key**: Users provide their OpenAI API key to authenticate their request.
- **Image Upload and Text Input Fields**: Users can choose to upload an image or input text.
- **Dropdowns for Reasoning Effort and Model Selection**: Choose the complexity of the responses and the model to use.
- **Submit and Clear Buttons**: These trigger the logic to process user inputs and clear chat history, respectively.

```python
with gr.Blocks(css=custom_css) as demo:
    gr.Markdown("""
    <div class="gradio-header">
        <h1>Multimodal Chatbot (Text + Image)</h1>
        <h3>Interact with a chatbot using text or image inputs</h3>
    </div>
    """)

    # User inputs and chat history
    openai_api_key = gr.Textbox(label="Enter OpenAI API Key", type="password", placeholder="sk-...", interactive=True)
    image_input = gr.Image(label="Upload an Image", type="pil")
    input_text = gr.Textbox(label="Enter Text Question", placeholder="Ask a question or provide text", lines=2)

    # Reasoning effort and model selection
    reasoning_effort = gr.Dropdown(label="Reasoning Effort", choices=["low", "medium", "high"], value="medium")
    model_choice = gr.Dropdown(label="Select Model", choices=["o1", "o3-mini"], value="o1")

    submit_btn = gr.Button("Ask!", elem_id="submit-btn")
    clear_btn = gr.Button("Clear History", elem_id="clear-history")

    # Chat history display
    chat_history = gr.Chatbot()
```
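
In the accompanying `app.py`, these components live inside a `create_interface()` helper that builds the Blocks app and returns it; that is the function the launch step in section 7 calls. The skeleton looks like this (the event wiring from the later sections goes where the ellipsis is):

```python
def create_interface():
    # Build the UI inside a Blocks context and return the app to the caller.
    with gr.Blocks(css=custom_css) as demo:
        ...  # components, chat history display, and button wiring shown in this article
    return demo
```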

### 2. Handle Image and Text Inputs

The function `generate_response` processes both image and text inputs by sending them to OpenAI's API. If an image is uploaded, it gets converted into a **base64 string** so it can be sent as part of the prompt.

For text inputs, the prompt is directly passed to the model.

```python
def generate_response(input_text, image, openai_api_key, reasoning_effort="medium", model_choice="o1"):
    openai.api_key = openai_api_key

    if image:
        image_info = get_base64_string_from_image(image)
        input_text = f"data:image/png;base64,{image_info}"

    if model_choice == "o1":
        messages = [{"role": "user", "content": [{"type": "image_url", "image_url": {"url": input_text}}]}]
    elif model_choice == "o3-mini":
        messages = [{"role": "user", "content": [{"type": "text", "text": input_text}]}]

    # API request
    response = openai.ChatCompletion.create(
        model=model_choice,
        messages=messages,
        reasoning_effort=reasoning_effort,
        max_completion_tokens=2000
    )
    return response["choices"][0]["message"]["content"]
```
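
The snippet above targets the pre-1.0 `openai` Python SDK (`openai.ChatCompletion.create`). On `openai>=1.0` the equivalent call goes through a client object; a minimal sketch, assuming the `reasoning_effort` parameter is available for the o-series model you select (the key and data URL are placeholders):

```python
# Sketch of the same request with the openai>=1.0 SDK.
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # placeholder key
response = client.chat.completions.create(
    model="o1",
    messages=[{
        "role": "user",
        "content": [{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}],
    }],
    reasoning_effort="medium",       # supported by o-series reasoning models
    max_completion_tokens=2000,
)
print(response.choices[0].message.content)
```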

### 3. Image-to-Base64 Conversion

To ensure the image is properly formatted, we convert it into a **base64** string. This string can then be embedded directly into the OpenAI request. This conversion is handled by the `get_base64_string_from_image` function.

```python
def get_base64_string_from_image(pil_image):
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    img_bytes = buffered.getvalue()
    base64_str = base64.b64encode(img_bytes).decode("utf-8")
    return base64_str
```
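
To sanity-check the conversion, you can round-trip a generated test image: the decoded bytes should still begin with the PNG signature. A standalone sketch (it repeats the helper so it runs on its own):

```python
import base64
import io

from PIL import Image

def get_base64_string_from_image(pil_image):
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

img = Image.new("RGB", (64, 64), color="red")  # stand-in for an uploaded image
b64 = get_base64_string_from_image(img)
data_url = f"data:image/png;base64,{b64}"
# PNG files begin with the 8-byte signature \x89PNG\r\n\x1a\n.
assert base64.b64decode(b64).startswith(b"\x89PNG\r\n\x1a\n")
```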

### 4. Chat History and Interaction

The chat history is stored and displayed using Gradio's `gr.Chatbot`. Each time the user submits a question or image, the conversation history is updated, showing both user and assistant messages in an easy-to-read format.

```python
def chatbot(input_text, image, openai_api_key, reasoning_effort, model_choice, history=[]):
    response = generate_response(input_text, image, openai_api_key, reasoning_effort, model_choice)
    history.append((f"User: {input_text}", f"Assistant: {response}"))
    return "", history
```
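
One detail worth flagging: a mutable default like `history=[]` is shared across calls in Python, which is why the later `app.py` revision switches to an explicit `None` check. A sketch of the safer pattern:

```python
def chatbot(input_text, image, openai_api_key, reasoning_effort, model_choice, history=None):
    # Avoid the shared-mutable-default pitfall: create a fresh list per call.
    if history is None:
        history = []
    response = generate_response(input_text, image, openai_api_key, reasoning_effort, model_choice)
    history.append((f"User: {input_text}", f"Assistant: {response}"))
    return "", history
```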

### 5. Clear History Function

To reset the conversation, we include a simple function that clears the chat history when the "Clear History" button is clicked.

```python
def clear_history():
    return "", []
```
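
The article does not show how `chatbot` and `clear_history` are attached to the buttons. One way to wire them, inside the `gr.Blocks` context from section 1 (a sketch; `clear_history` returns an empty string for the textbox and an empty list for the chat, in that order):

```python
# Route the inputs into chatbot() and send its two return values back to the UI.
submit_btn.click(
    fn=chatbot,
    inputs=[input_text, image_input, openai_api_key, reasoning_effort, model_choice, chat_history],
    outputs=[input_text, chat_history],
)
# Reset both the textbox and the chat display.
clear_btn.click(fn=clear_history, inputs=[], outputs=[input_text, chat_history])
```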

### 6. Custom CSS for Styling

To ensure a visually appealing interface, custom CSS is applied. The design includes animations for chat messages and custom button styles to make the interaction smoother.

```css
/* Custom CSS for the chat interface */
.gradio-container { ... }
.gradio-header { ... }
.gradio-chatbot { ... }
```

### 7. Launch the Interface

Finally, we call the `create_interface()` function (see the skeleton in section 1) to launch the Gradio interface. This allows users to start interacting with the chatbot by uploading images, entering text, and receiving responses based on the selected model and reasoning effort.

```python
if __name__ == "__main__":
    demo = create_interface()
    demo.launch()
```
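
`demo.launch()` serves the app on a local port. Two optional tweaks that are often useful for a Space like this (both standard Gradio calls): enable the request queue so concurrent users wait instead of erroring, and pass `share=True` when running locally to get a temporary public URL:

```python
if __name__ == "__main__":
    demo = create_interface()
    demo.queue()               # serialize concurrent requests through a queue
    demo.launch(share=True)    # share=True also exposes a temporary public link
```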

## Conclusion

This multimodal chatbot can handle both text and image inputs, offering a rich conversational experience. By combining **Gradio** for building intuitive UIs with **OpenAI's models** for natural language and image understanding, this application demonstrates how to integrate multiple forms of input into a single, easy-to-use interface.

Feel free to try it out yourself and experiment with different settings, including reasoning effort and model selection. Whether you're building a customer support bot or an image-based query system, this framework provides a flexible foundation for multimodal applications.

---
abc.txt ADDED
@@ -0,0 +1,118 @@
import gradio as gr
import openai
import base64
from PIL import Image
import io

# Function to send the request to OpenAI API with an image or text input
def generate_response(input_text, image, openai_api_key, reasoning_effort="medium", model_choice="o1"):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    # Process the input depending on whether it's text or an image
    if image:
        # Convert the image to base64 string
        image_info = get_base64_string_from_image(image)
        input_text = f"data:image/png;base64,{image_info}"

    # Prepare the messages for OpenAI API
    if model_choice == "o1":
        messages = [
            {"role": "user", "content": [{"type": "image_url", "image_url": {"url": input_text}}]}
        ]
    elif model_choice == "o3-mini":
        messages = [
            {"role": "user", "content": [{"type": "text", "text": input_text}]}
        ]

    try:
        # Call OpenAI API with the selected model
        response = openai.ChatCompletion.create(
            model=model_choice,  # Dynamically choose the model (o1 or o3-mini)
            messages=messages,
            reasoning_effort=reasoning_effort,  # Set reasoning_effort for the response
            max_completion_tokens=2000  # Limit response tokens to 2000
        )

        return response["choices"][0]["message"]["content"]
    except Exception as e:
        return f"Error calling OpenAI API: {str(e)}"

# Function to convert an uploaded image to a base64 string
def get_base64_string_from_image(pil_image):
    # Convert PIL Image to bytes
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    img_bytes = buffered.getvalue()
    base64_str = base64.b64encode(img_bytes).decode("utf-8")
    return base64_str

# The function that will be used by Gradio interface
def chatbot(input_text, image, openai_api_key, reasoning_effort, model_choice, history=[]):
    response = generate_response(input_text, image, openai_api_key, reasoning_effort, model_choice)

    # Append the response to the history
    history.append((f"User: {input_text}", f"Assistant: {response}"))

    return "", history

# Function to clear the chat history
def clear_history():
    return "", []

# Gradio interface setup
def create_interface():
    with gr.Blocks() as demo:
        gr.Markdown("# Multimodal Chatbot (Text + Image)")

        # Add a description after the title
        gr.Markdown("""
        ### Description:
        This is a multimodal chatbot that can handle both text and image inputs.
        - You can ask questions or provide text, and the assistant will respond.
        - You can also upload an image, and the assistant will process it and answer questions about the image.
        - Enter your OpenAI API key to start interacting with the model.
        - You can use the 'Clear History' button to remove the conversation history.
        - "o1" is for image chat and "o3-mini" is for text chat.
        ### Reasoning Effort:
        The reasoning effort controls how complex or detailed the assistant's answers should be.
        - **Low**: Provides quick, concise answers with minimal reasoning or details.
        - **Medium**: Offers a balanced response with a reasonable level of detail and thought.
        - **High**: Produces more detailed, analytical, or thoughtful responses, requiring deeper reasoning.
        """)

        with gr.Row():
            openai_api_key = gr.Textbox(label="Enter OpenAI API Key", type="password", placeholder="sk-...", interactive=True)

        with gr.Row():
            image_input = gr.Image(label="Upload an Image", type="pil")  # Image upload input
            input_text = gr.Textbox(label="Enter Text Question", placeholder="Ask a question or provide text", lines=2)

        with gr.Row():
            reasoning_effort = gr.Dropdown(
                label="Reasoning Effort",
                choices=["low", "medium", "high"],
                value="medium"
            )
            model_choice = gr.Dropdown(
                label="Select Model",
                choices=["o1", "o3-mini"],
                value="o1"  # Default to 'o1' for image-related tasks
            )
            submit_btn = gr.Button("Send")
            clear_btn = gr.Button("Clear History")

        chat_history = gr.Chatbot()

        # Button interactions
        submit_btn.click(fn=chatbot, inputs=[input_text, image_input, openai_api_key, reasoning_effort, model_choice, chat_history], outputs=[input_text, chat_history])
        clear_btn.click(fn=clear_history, inputs=[], outputs=[chat_history, chat_history])

    return demo

# Run the interface
if __name__ == "__main__":
    demo = create_interface()
    demo.launch()
abc2.txt ADDED
@@ -0,0 +1,299 @@
import gradio as gr
import openai
import base64
from PIL import Image
import io

# Function to send the request to OpenAI API with an image or text input
def generate_response(input_text, image, openai_api_key, reasoning_effort="medium", model_choice="o1"):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    # Process the input depending on whether it's text or an image
    if image:
        # Convert the image to base64 string
        image_info = get_base64_string_from_image(image)
        input_text = f"data:image/png;base64,{image_info}"

    # Prepare the messages for OpenAI API
    if model_choice == "o1":
        if image:
            messages = [
                {"role": "user", "content": [{"type": "image_url", "image_url": {"url": input_text}}]}
            ]
        else:
            messages = [
                {"role": "user", "content": [{"type": "text", "text": input_text}]}
            ]
    elif model_choice == "o3-mini":
        messages = [
            {"role": "user", "content": [{"type": "text", "text": input_text}]}
        ]

    try:
        # Call OpenAI API with the selected model
        response = openai.ChatCompletion.create(
            model=model_choice,  # Dynamically choose the model (o1 or o3-mini)
            messages=messages,
            reasoning_effort=reasoning_effort,  # Set reasoning_effort for the response
            max_completion_tokens=2000  # Limit response tokens to 2000
        )

        return response["choices"][0]["message"]["content"]
    except Exception as e:
        return f"Error calling OpenAI API: {str(e)}"

# Function to convert an uploaded image to a base64 string
def get_base64_string_from_image(pil_image):
    # Convert PIL Image to bytes
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    img_bytes = buffered.getvalue()
    base64_str = base64.b64encode(img_bytes).decode("utf-8")
    return base64_str

# Function to transcribe audio to text using OpenAI Whisper API
def transcribe_audio(audio, openai_api_key):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    try:
        # Open the audio file and pass it as a file object
        with open(audio, 'rb') as audio_file:
            audio_file_content = audio_file.read()

        # Use the correct transcription API call
        audio_file_obj = io.BytesIO(audio_file_content)
        audio_file_obj.name = 'audio.wav'  # Set a name for the file object (as OpenAI expects it)

        # Transcribe the audio to text using OpenAI's whisper model
        audio_file_transcription = openai.Audio.transcribe(file=audio_file_obj, model="whisper-1")
        return audio_file_transcription['text']
    except Exception as e:
        return f"Error transcribing audio: {str(e)}"

# The function that will be used by Gradio interface
def chatbot(input_text, image, audio, openai_api_key, reasoning_effort, model_choice, history=[]):
    # If there's audio, transcribe it to text
    if audio:
        input_text = transcribe_audio(audio, openai_api_key)

    response = generate_response(input_text, image, openai_api_key, reasoning_effort, model_choice)

    # Append the response to the history
    history.append((f"User: {input_text}", f"Assistant: {response}"))

    return "", history

# Function to clear the chat history
def clear_history():
    return "", []

# Custom CSS styles with animations and button colors
custom_css = """
/* General body styles */
.gradio-container {
    font-family: 'Arial', sans-serif;
    background-color: #f8f9fa;
    color: #333;
}
/* Header styles */
.gradio-header {
    background-color: #007bff;
    color: white;
    padding: 20px;
    text-align: center;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
    animation: fadeIn 1s ease-out;
}
.gradio-header h1 {
    font-size: 2.5rem;
}
.gradio-header h3 {
    font-size: 1.2rem;
    margin-top: 10px;
}
/* Chatbot container styles */
.gradio-chatbot {
    background-color: #fff;
    border-radius: 10px;
    padding: 20px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
    max-height: 500px;
    overflow-y: auto;
    animation: fadeIn 2s ease-out;
}
/* Input field styles */
.gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio {
    border-radius: 8px;
    border: 2px solid #ccc;
    padding: 10px;
    margin-bottom: 10px;
    width: 100%;
    font-size: 1rem;
    transition: all 0.3s ease;
}
.gradio-textbox:focus, .gradio-dropdown:focus, .gradio-image:focus, .gradio-audio:focus {
    border-color: #007bff;
}
/* Button styles */
/* Send Button: Sky Blue */
#submit-btn {
    background-color: #00aaff; /* Sky blue */
    color: white;
    border: none;
    border-radius: 8px;
    padding: 10px 19px;
    font-size: 1.1rem;
    cursor: pointer;
    transition: all 0.3s ease;
    margin-left: auto;
    margin-right: auto;
    display: block;
    margin-top: 10px;
}
#submit-btn:hover {
    background-color: #0099cc; /* Slightly darker blue */
}
#submit-btn:active {
    transform: scale(0.95);
}
#clear-history {
    background-color: #f04e4e; /* Slightly darker red */
    color: white;
    border: none;
    border-radius: 8px;
    padding: 10px 13px;
    font-size: 1.1rem;
    cursor: pointer;
    transition: all 0.3s ease;
    margin-top: 10px;
}
#clear-history:hover {
    background-color: #f5a4a4; /* Light red */
}
#clear-history:active {
    transform: scale(0.95);
}
/* Chat history styles */
.gradio-chatbot .message {
    margin-bottom: 10px;
}
.gradio-chatbot .user {
    background-color: #007bff;
    color: white;
    padding: 10px;
    border-radius: 12px;
    max-width: 70%;
    animation: slideInUser 0.5s ease-out;
}
.gradio-chatbot .assistant {
    background-color: #f1f1f1;
    color: #333;
    padding: 10px;
    border-radius: 12px;
    max-width: 70%;
    margin-left: auto;
    animation: slideInAssistant 0.5s ease-out;
}
/* Animation keyframes */
@keyframes fadeIn {
    0% { opacity: 0; }
    100% { opacity: 1; }
}
@keyframes slideInUser {
    0% { transform: translateX(-100%); }
    100% { transform: translateX(0); }
}
@keyframes slideInAssistant {
    0% { transform: translateX(100%); }
    100% { transform: translateX(0); }
}
/* Mobile responsiveness */
@media (max-width: 768px) {
    .gradio-header h1 {
        font-size: 1.8rem;
    }
    .gradio-header h3 {
        font-size: 1rem;
    }
    .gradio-chatbot {
        max-height: 400px;
    }
    .gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio {
        width: 100%;
    }
    #submit-btn, #clear-history {
        width: 100%;
        margin-left: 0;
    }
}
"""

# Gradio interface setup
def create_interface():
    with gr.Blocks(css=custom_css) as demo:
        gr.Markdown("""
        <div class="gradio-header">
            <h1>Multimodal Chatbot (Text + Image + Voice)</h1>
            <h3>Interact with a chatbot using text, image, or voice inputs</h3>
        </div>
        """)

        # Add a description with an expandable accordion
        with gr.Accordion("Click to expand for details", open=False):
            gr.Markdown("""
            ### Description:
            This is a multimodal chatbot that can handle text, image, and voice inputs.
            - You can ask questions or provide text, and the assistant will respond.
            - You can also upload an image, and the assistant will process it and answer questions about the image.
            - Voice input is supported: You can upload or record an audio file, and it will be transcribed to text and sent to the assistant.
            - Enter your OpenAI API key to start interacting with the model.
            - You can use the 'Clear History' button to remove the conversation history.
            - "o1" is for image chat and "o3-mini" is for text chat.
            ### Reasoning Effort:
            The reasoning effort controls how complex or detailed the assistant's answers should be.
            - **Low**: Provides quick, concise answers with minimal reasoning or details.
            - **Medium**: Offers a balanced response with a reasonable level of detail and thought.
            - **High**: Produces more detailed, analytical, or thoughtful responses, requiring deeper reasoning.
            """)

        with gr.Row():
            openai_api_key = gr.Textbox(label="Enter OpenAI API Key", type="password", placeholder="sk-...", interactive=True)

        with gr.Row():
            image_input = gr.Image(label="Upload an Image", type="pil")  # Image upload input
            input_text = gr.Textbox(label="Enter Text Question", placeholder="Ask a question or provide text", lines=2)
            audio_input = gr.Audio(label="Upload or Record Audio", type="filepath")  # Audio upload or record input (using filepath)

        with gr.Row():
            reasoning_effort = gr.Dropdown(
                label="Reasoning Effort",
                choices=["low", "medium", "high"],
                value="medium"
            )
            model_choice = gr.Dropdown(
                label="Select Model",
                choices=["o1", "o3-mini"],
                value="o1"  # Default to 'o1' for image-related tasks
            )
            submit_btn = gr.Button("Ask!", elem_id="submit-btn")
            clear_btn = gr.Button("Clear History", elem_id="clear-history")

        chat_history = gr.Chatbot()

        # Button interactions
        submit_btn.click(fn=chatbot, inputs=[input_text, image_input, audio_input, openai_api_key, reasoning_effort, model_choice, chat_history], outputs=[input_text, chat_history])
        clear_btn.click(fn=clear_history, inputs=[], outputs=[chat_history, chat_history])

    return demo

# Run the interface
if __name__ == "__main__":
    demo = create_interface()
    demo.launch()
abc3.txt ADDED
@@ -0,0 +1,523 @@
import gradio as gr
import openai
import base64
from PIL import Image
import io
import os
import tempfile
import fitz  # PyMuPDF for PDF handling

# Function to extract text from PDF files
def extract_text_from_pdf(pdf_file):
    try:
        text = ""
        pdf_document = fitz.open(pdf_file)

        for page_num in range(len(pdf_document)):
            page = pdf_document[page_num]
            text += page.get_text()

        pdf_document.close()
        return text
    except Exception as e:
        return f"Error extracting text from PDF: {str(e)}"

# Function to generate MCQ quiz from PDF content
def generate_mcq_quiz(pdf_content, num_questions, openai_api_key, model_choice):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    # Limit content length to avoid token limits
    limited_content = pdf_content[:8000] if len(pdf_content) > 8000 else pdf_content

    prompt = f"""Based on the following document content, generate {num_questions} multiple-choice quiz questions.
For each question:
1. Create a clear question based on key concepts in the document
2. Provide 4 possible answers (A, B, C, D)
3. Indicate the correct answer
4. Briefly explain why the answer is correct
Format the output clearly with each question numbered and separated.
Document content:
{limited_content}
"""

    try:
        messages = [
            {"role": "user", "content": prompt}
        ]

        response = openai.ChatCompletion.create(
            model=model_choice,
            messages=messages
        )

        return response.choices[0].message.content
    except Exception as e:
        return f"Error generating quiz: {str(e)}"

# Function to send the request to OpenAI API with an image, text or PDF input
def generate_response(input_text, image, pdf_content, openai_api_key, reasoning_effort="medium", model_choice="o1"):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    # Process the input depending on whether it's text, image, or a PDF-related query
    if pdf_content and input_text:
        # For PDF queries, we combine the PDF content with the user's question
        prompt = f"Based on the following document content, please answer this question: '{input_text}'\n\nDocument content:\n{pdf_content}"
        input_content = prompt
    elif image:
        # Convert the image to base64 string
        image_info = get_base64_string_from_image(image)
        input_content = f"data:image/png;base64,{image_info}"
    else:
        # Plain text input
        input_content = input_text

    # Prepare the messages for OpenAI API
    if model_choice == "o1":
        if image and not pdf_content:
            messages = [
                {"role": "user", "content": [{"type": "image_url", "image_url": {"url": input_content}}]}
            ]
        else:
            messages = [
                {"role": "user", "content": input_content}
            ]
    elif model_choice == "o3-mini":
        messages = [
            {"role": "user", "content": input_content}
        ]

    try:
        # Call OpenAI API with the selected model
        response = openai.ChatCompletion.create(
            model=model_choice,
            messages=messages,
            max_completion_tokens=2000
        )

        return response.choices[0].message.content
    except Exception as e:
        return f"Error calling OpenAI API: {str(e)}"

# Function to convert an uploaded image to a base64 string
def get_base64_string_from_image(pil_image):
    # Convert PIL Image to bytes
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    img_bytes = buffered.getvalue()
    base64_str = base64.b64encode(img_bytes).decode("utf-8")
    return base64_str

# Function to transcribe audio to text using OpenAI Whisper API
def transcribe_audio(audio, openai_api_key):
    if not openai_api_key:
        return "Error: No API key provided."

    openai.api_key = openai_api_key

    try:
        # Open the audio file and pass it as a file object
        with open(audio, 'rb') as audio_file:
            audio_file_content = audio_file.read()

        # Use the correct transcription API call
        audio_file_obj = io.BytesIO(audio_file_content)
        audio_file_obj.name = 'audio.wav'  # Set a name for the file object (as OpenAI expects it)

        # Transcribe the audio to text using OpenAI's whisper model
        audio_file_transcription = openai.Audio.transcribe(file=audio_file_obj, model="whisper-1")
        return audio_file_transcription.text
    except Exception as e:
        return f"Error transcribing audio: {str(e)}"

# The function that will be used by Gradio interface
def chatbot(input_text, image, audio, pdf_file, openai_api_key, reasoning_effort, model_choice, pdf_content, num_quiz_questions, pdf_quiz_mode, history):
    if history is None:
        history = []

    # If there's audio, transcribe it to text
    if audio:
        input_text = transcribe_audio(audio, openai_api_key)

    # If a new PDF is uploaded, extract its text
    new_pdf_content = pdf_content
    if pdf_file is not None:
        new_pdf_content = extract_text_from_pdf(pdf_file)

    # Check if we're in PDF quiz mode
    if pdf_quiz_mode:
        if new_pdf_content:
            # Generate MCQ quiz questions
            quiz_response = generate_mcq_quiz(new_pdf_content, int(num_quiz_questions), openai_api_key, model_choice)
            history.append((f"User: [Uploaded PDF for Quiz - {int(num_quiz_questions)} questions]", f"Assistant: {quiz_response}"))
        else:
            history.append(("User: [Attempted to generate quiz without PDF]", "Assistant: Please upload a PDF file to generate quiz questions."))
    else:
        # Regular chat mode - generate the response
        response = generate_response(input_text, image, new_pdf_content, openai_api_key, reasoning_effort, model_choice)

        # Append the response to the history
        if input_text:
            history.append((f"User: {input_text}", f"Assistant: {response}"))
        elif image is not None:
            history.append((f"User: [Uploaded image]", f"Assistant: {response}"))
        elif pdf_file is not None:
            history.append((f"User: [Uploaded PDF]", f"Assistant: {response}"))
        else:
            history.append((f"User: [No input provided]", f"Assistant: Please provide some input (text, image, or PDF) for me to respond to."))

    return "", None, None, None, new_pdf_content, history

# Function to clear the chat history and PDF content
def clear_history():
    return "", None, None, None, "", []

# Function to process a newly uploaded PDF
def process_pdf(pdf_file):
    if pdf_file is None:
        return ""
    return extract_text_from_pdf(pdf_file)

# Function to update visible components based on input type selection
def update_input_type(choice):
    if choice == "Text":
        return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
    elif choice == "Image":
        return gr.update(visible=True), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
    elif choice == "Voice":
        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
    elif choice == "PDF":
        return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(value=False)
    elif choice == "PDF(QUIZ)":
        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=True), gr.update(value=True)

# Custom CSS styles with animations and button colors
custom_css = """
/* General body styles */
.gradio-container {
    font-family: 'Arial', sans-serif;
    background-color: #f8f9fa;
    color: #333;
}
/* Header styles */
.gradio-header {
    background-color: #007bff;
    color: white;
    padding: 20px;
    text-align: center;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
    animation: fadeIn 1s ease-out;
}
.gradio-header h1 {
    font-size: 2.5rem;
}
.gradio-header h3 {
    font-size: 1.2rem;
    margin-top: 10px;
}
/* Chatbot container styles */
.gradio-chatbot {
    background-color: #fff;
    border-radius: 10px;
    padding: 20px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
    max-height: 500px;
    overflow-y: auto;
    animation: fadeIn 2s ease-out;
}
/* Input field styles */
.gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio, .gradio-file, .gradio-slider {
    border-radius: 8px;
    border: 2px solid #ccc;
    padding: 10px;
    margin-bottom: 10px;
    width: 100%;
    font-size: 1rem;
    transition: all 0.3s ease;
}
.gradio-textbox:focus, .gradio-dropdown:focus, .gradio-image:focus, .gradio-audio:focus, .gradio-file:focus, .gradio-slider:focus {
    border-color: #007bff;
}
/* Button styles */
/* Send Button: Sky Blue */
#submit-btn {
    background-color: #00aaff; /* Sky blue */
    color: white;
    border: none;
    border-radius: 8px;
    padding: 10px 19px;
    font-size: 1.1rem;
    cursor: pointer;
    transition: all 0.3s ease;
    margin-left: auto;
    margin-right: auto;
    display: block;
    margin-top: 10px;
}
#submit-btn:hover {
    background-color: #0099cc; /* Slightly darker blue */
}
#submit-btn:active {
    transform: scale(0.95);
}
#clear-history {
    background-color: #f04e4e; /* Slightly darker red */
    color: white;
    border: none;
    border-radius: 8px;
    padding: 10px 13px;
    font-size: 1.1rem;
    cursor: pointer;
    transition: all 0.3s ease;
    margin-top: 10px;
}
#clear-history:hover {
    background-color: #f5a4a4; /* Light red */
}
#clear-history:active {
    transform: scale(0.95);
}
/* Input type selector buttons */
#input-type-group {
    display: flex;
    justify-content: center;
    gap: 10px;
    margin-bottom: 20px;
}
.input-type-btn {
    background-color: #6c757d;
    color: white;
    border: none;
    border-radius: 8px;
    padding: 10px 15px;
    font-size: 1rem;
    cursor: pointer;
    transition: all 0.3s ease;
}
.input-type-btn.selected {
    background-color: #007bff;
}
.input-type-btn:hover {
    background-color: #5a6268;
}
/* Chat history styles */
.gradio-chatbot .message {
    margin-bottom: 10px;
}
.gradio-chatbot .user {
    background-color: #007bff;
    color: white;
    padding: 10px;
    border-radius: 12px;
    max-width: 70%;
    animation: slideInUser 0.5s ease-out;
}
.gradio-chatbot .assistant {
    background-color: #f1f1f1;
    color: #333;
    padding: 10px;
    border-radius: 12px;
    max-width: 70%;
    margin-left: auto;
    animation: slideInAssistant 0.5s ease-out;
}
/* Animation keyframes */
@keyframes fadeIn {
    0% { opacity: 0; }
    100% { opacity: 1; }
}
@keyframes slideInUser {
    0% { transform: translateX(-100%); }
    100% { transform: translateX(0); }
}
@keyframes slideInAssistant {
    0% { transform: translateX(100%); }
    100% { transform: translateX(0); }
}
/* Mobile responsiveness */
@media (max-width: 768px) {
    .gradio-header h1 {
        font-size: 1.8rem;
    }
    .gradio-header h3 {
        font-size: 1rem;
    }
    .gradio-chatbot {
        max-height: 400px;
    }
    .gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio, .gradio-file, .gradio-slider {
        width: 100%;
    }
    #submit-btn, #clear-history {
        width: 100%;
        margin-left: 0;
    }
}
"""

# Gradio interface setup
def create_interface():
    with gr.Blocks(css=custom_css) as demo:
        gr.Markdown("""
        <div class="gradio-header">
            <h1>Multimodal Chatbot (Text + Image + Voice + PDF + Quiz)</h1>
            <h3>Interact with a chatbot using text, image, voice, or PDF inputs</h3>
        </div>
        """)

        # Add a description with an expandable accordion
        with gr.Accordion("Click to expand for details", open=False):
            gr.Markdown("""
            ### Description:
            This is a multimodal chatbot that can handle text, image, voice, PDF inputs, and generate quizzes from PDFs.
            - You can ask questions or provide text, and the assistant will respond.
            - You can upload an image, and the assistant will process it and answer questions about the image.
            - Voice input is supported: You can upload or record an audio file, and it will be transcribed to text and sent to the assistant.
            - PDF support: Upload a PDF and ask questions about its content.
            - PDF Quiz: Upload a PDF and specify how many MCQ questions you want generated based on the content.
            - Enter your OpenAI API key to start interacting with the model.
            - You can use the 'Clear History' button to remove the conversation history.
            - "o1" is for image, voice, PDF and text chat and "o3-mini" is for text, PDF and voice chat only.
            ### Reasoning Effort:
            The reasoning effort controls how complex or detailed the assistant's answers should be.
            - **Low**: Provides quick, concise answers with minimal reasoning or details.
            - **Medium**: Offers a balanced response with a reasonable level of detail and thought.
            - **High**: Produces more detailed, analytical, or thoughtful responses, requiring deeper reasoning.
            """)

        # Store PDF content as a state variable
        pdf_content = gr.State("")

        with gr.Row():
            openai_api_key = gr.Textbox(label="Enter OpenAI API Key", type="password", placeholder="sk-...", interactive=True)

        # Input type selector
        with gr.Row():
            input_type = gr.Radio(
                ["Text", "Image", "Voice", "PDF", "PDF(QUIZ)"],
                label="Choose Input Type",
                value="Text"
            )

        # Create the input components (initially text is visible, others are hidden)
        with gr.Row():
            # Text input
            input_text = gr.Textbox(
                label="Enter Text Question",
                placeholder="Ask a question or provide text",
                lines=2,
                visible=True
            )

            # Image input
            image_input = gr.Image(
                label="Upload an Image",
                type="pil",
                visible=False
            )

            # Audio input
            audio_input = gr.Audio(
                label="Upload or Record Audio",
                type="filepath",
                visible=False
            )

            # PDF input
            pdf_input = gr.File(
                label="Upload your PDF",
                file_types=[".pdf"],
                visible=False
            )

            # Quiz specific components
            quiz_questions_slider = gr.Slider(
                minimum=1,
                maximum=20,
                value=5,
                step=1,
                label="Number of Quiz Questions",
                visible=False
            )

            # Hidden state for quiz mode
            quiz_mode = gr.Checkbox(
                label="Quiz Mode",
                visible=False,
                value=False
            )

        with gr.Row():
            reasoning_effort = gr.Dropdown(
                label="Reasoning Effort",
                choices=["low", "medium", "high"],
                value="medium"
            )
            model_choice = gr.Dropdown(
                label="Select Model",
                choices=["o1", "o3-mini"],
                value="o1"  # Default to 'o1' for image-related tasks
            )
            submit_btn = gr.Button("Ask!", elem_id="submit-btn")
            clear_btn = gr.Button("Clear History", elem_id="clear-history")

        chat_history = gr.Chatbot()

        # Connect the input type selector to the update function
        input_type.change(
            fn=update_input_type,
            inputs=[input_type],
            outputs=[input_text, image_input, audio_input, pdf_input, quiz_questions_slider, quiz_mode]
        )

        # Process PDF when uploaded
        pdf_input.change(
            fn=process_pdf,
            inputs=[pdf_input],
            outputs=[pdf_content]
        )

        # Button interactions
        submit_btn.click(
            fn=chatbot,
            inputs=[
                input_text,
                image_input,
                audio_input,
                pdf_input,
                openai_api_key,
                reasoning_effort,
                model_choice,
                pdf_content,
                quiz_questions_slider,
                quiz_mode,
                chat_history
            ],
            outputs=[
                input_text,
                image_input,
                audio_input,
                pdf_input,
                pdf_content,
                chat_history
            ]
        )

        clear_btn.click(
            fn=clear_history,
            inputs=[],
            outputs=[input_text, image_input, audio_input, pdf_input, pdf_content, chat_history]
        )

    return demo

# Run the interface
if __name__ == "__main__":
    demo = create_interface()
    demo.launch()
app.py ADDED
@@ -0,0 +1,522 @@
1
+ import gradio as gr
2
+ import openai
3
+ import base64
4
+ from PIL import Image
5
+ import io
6
+ import os
7
+ import tempfile
8
+ import fitz # PyMuPDF for PDF handling
9
+
10
+ # Function to extract text from PDF files
11
+ def extract_text_from_pdf(pdf_file):
12
+ try:
13
+ text = ""
14
+ pdf_document = fitz.open(pdf_file)
15
+
16
+ for page_num in range(len(pdf_document)):
17
+ page = pdf_document[page_num]
18
+ text += page.get_text()
19
+
20
+ pdf_document.close()
21
+ return text
22
+ except Exception as e:
23
+ return f"Error extracting text from PDF: {str(e)}"
24
+
25
+ # Function to generate MCQ quiz from PDF content
26
+ def generate_mcq_quiz(pdf_content, num_questions, openai_api_key, model_choice):
27
+ if not openai_api_key:
28
+ return "Error: No API key provided."
29
+
30
+ openai.api_key = openai_api_key
31
+
32
+ # Limit content length to avoid token limits
33
+ limited_content = pdf_content[:8000] if len(pdf_content) > 8000 else pdf_content
34
+
35
+ prompt = f"""Based on the following document content, generate {num_questions} multiple-choice quiz questions.
36
+ For each question:
37
+ 1. Create a clear question based on key concepts in the document
38
+ 2. Provide 4 possible answers (A, B, C, D)
39
+ 3. Indicate the correct answer
40
+ 4. Briefly explain why the answer is correct
41
+
42
+ Format the output clearly with each question numbered and separated.
43
+
44
+ Document content:
45
+ {limited_content}
46
+ """
47
+
48
+ try:
49
+ messages = [
50
+ {"role": "user", "content": prompt}
51
+ ]
52
+
53
+ response = openai.ChatCompletion.create(
54
+ model=model_choice,
55
+ messages=messages
56
+ )
57
+
58
+ return response.choices[0].message.content
59
+ except Exception as e:
60
+ return f"Error generating quiz: {str(e)}"
61
+
62
+ # Function to send the request to OpenAI API with an image, text or PDF input
63
+ def generate_response(input_text, image, pdf_content, openai_api_key, reasoning_effort="medium", model_choice="o1"):
64
+ if not openai_api_key:
65
+ return "Error: No API key provided."
66
+
67
+ openai.api_key = openai_api_key
68
+
69
+ # Process the input depending on whether it's text, image, or a PDF-related query
70
+ if pdf_content and input_text:
71
+ # For PDF queries, we combine the PDF content with the user's question
72
+ prompt = f"Based on the following document content, please answer this question: '{input_text}'\n\nDocument content:\n{pdf_content}"
73
+ input_content = prompt
74
+ elif image:
75
+ # Convert the image to base64 string
76
+ image_info = get_base64_string_from_image(image)
77
+ input_content = f"data:image/png;base64,{image_info}"
78
+ else:
79
+ # Plain text input
80
+ input_content = input_text
81
+
82
+ # Prepare the messages for OpenAI API
83
+ if model_choice == "o1":
84
+ if image and not pdf_content:
85
+ messages = [
86
+ {"role": "user", "content": [{"type": "image_url", "image_url": {"url": input_content}}]}
87
+ ]
88
+ else:
89
+ messages = [
90
+ {"role": "user", "content": input_content}
91
+ ]
92
+ elif model_choice == "o3-mini":
93
+ messages = [
94
+ {"role": "user", "content": input_content}
95
+ ]
96
+
97
+ try:
98
+ # Call OpenAI API with the selected model
99
+ response = openai.ChatCompletion.create(
100
+ model=model_choice,
101
+ messages=messages,
102
+ max_completion_tokens=2000
103
+ )
104
+
105
+ return response.choices[0].message.content
106
+ except Exception as e:
107
+ return f"Error calling OpenAI API: {str(e)}"
108
+
109
+ # Function to convert an uploaded image to a base64 string
110
+ def get_base64_string_from_image(pil_image):
111
+ # Convert PIL Image to bytes
112
+ buffered = io.BytesIO()
113
+ pil_image.save(buffered, format="PNG")
114
+ img_bytes = buffered.getvalue()
115
+ base64_str = base64.b64encode(img_bytes).decode("utf-8")
116
+ return base64_str
117
+
118
+ # Function to transcribe audio to text using OpenAI Whisper API
119
+ def transcribe_audio(audio, openai_api_key):
120
+ if not openai_api_key:
121
+ return "Error: No API key provided."
122
+
123
+ openai.api_key = openai_api_key
124
+
125
+ try:
126
+ # Open the audio file and pass it as a file object
127
+ with open(audio, 'rb') as audio_file:
128
+ audio_file_content = audio_file.read()
129
+
130
+ # Use the correct transcription API call
131
+ audio_file_obj = io.BytesIO(audio_file_content)
132
+ audio_file_obj.name = 'audio.wav' # Set a name for the file object (as OpenAI expects it)
133
+
134
+ # Transcribe the audio to text using OpenAI's whisper model
135
+ audio_file_transcription = openai.Audio.transcribe(file=audio_file_obj, model="whisper-1")
136
+ return audio_file_transcription.text
137
+ except Exception as e:
138
+ return f"Error transcribing audio: {str(e)}"
139
+
140
+ # The function that will be used by Gradio interface
141
+ def chatbot(input_text, image, audio, pdf_file, openai_api_key, reasoning_effort, model_choice, pdf_content, num_quiz_questions, pdf_quiz_mode, history):
142
+ if history is None:
143
+ history = []
144
+
145
+ # If there's audio, transcribe it to text
146
+ if audio:
147
+ input_text = transcribe_audio(audio, openai_api_key)
148
+
149
+ # If a new PDF is uploaded, extract its text
150
+ new_pdf_content = pdf_content
151
+ if pdf_file is not None:
152
+ new_pdf_content = extract_text_from_pdf(pdf_file)
153
+
154
+ # Check if we're in PDF quiz mode
155
+ if pdf_quiz_mode:
156
+ if new_pdf_content:
157
+ # Generate MCQ quiz questions
158
+ quiz_response = generate_mcq_quiz(new_pdf_content, int(num_quiz_questions), openai_api_key, model_choice)
159
+ history.append((f"👤: [Uploaded PDF for Quiz - {int(num_quiz_questions)} questions]", f"🤖: {quiz_response}"))
160
+ else:
161
+ history.append(("👤: [Attempted to generate quiz without PDF]", "🤖: Please upload a PDF file to generate quiz questions."))
162
+ else:
163
+ # Regular chat mode - generate the response
164
+ response = generate_response(input_text, image, new_pdf_content, openai_api_key, reasoning_effort, model_choice)
165
+
166
+ # Append the response to the history
167
+ if input_text:
168
+ history.append((f"👤: {input_text}", f"🤖: {response}"))
169
+ elif image is not None:
170
+ history.append((f"👤: [Uploaded image]", f"🤖: {response}"))
171
+ elif pdf_file is not None:
172
+ history.append((f"👤: [Uploaded PDF]", f"🤖: {response}"))
173
+ else:
174
+ history.append((f"👤: [No input provided]", f"🤖: Please provide some input (text, image, or PDF) for me to respond to."))
175
+
176
+ return "", None, None, None, new_pdf_content, history
177
+
178
+ # Function to clear the chat history and PDF content
179
+ def clear_history():
180
+ return "", None, None, None, "", []
181
+
182
+ # Function to process a newly uploaded PDF
183
+ def process_pdf(pdf_file):
184
+ if pdf_file is None:
185
+ return ""
186
+ return extract_text_from_pdf(pdf_file)
187
+
188
+ # Function to update visible components based on input type selection
189
+ def update_input_type(choice):
190
+ if choice == "Text":
191
+ return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
192
+ elif choice == "Image":
193
+ return gr.update(visible=True), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
194
+ elif choice == "Voice":
195
+ return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(value=False)
196
+ elif choice == "PDF":
197
+ return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=False), gr.update(value=False)
198
+ elif choice == "PDF(QUIZ)":
199
+ return gr.update(visible=False), gr.update(visible=False), gr.update(visible=False), gr.update(visible=True), gr.update(visible=True), gr.update(value=True)
200
+
201
+ # Custom CSS styles with animations and button colors
202
+ custom_css = """
203
+ /* General body styles */
204
+ .gradio-container {
205
+ font-family: 'Arial', sans-serif;
206
+ background-color: #f0f4f8; /* Lighter blue-gray background */
207
+ color: #2d3748;;
208
+ }
209
+ /* Header styles */
210
+ .gradio-header {
211
+ background: linear-gradient(135deg, #4a00e0 0%, #8e2de2 100%); /* Purple gradient */
212
+ color: white;
213
+ padding: 20px;
214
+ text-align: center;
215
+ border-radius: 8px;
216
+ box-shadow: 0 4px 15px rgba(0, 0, 0, 0.2);
217
+ animation: fadeIn 1s ease-out;
218
+ }
219
+ .gradio-header h1 {
220
+ font-size: 2.5rem;
221
+ }
222
+ .gradio-header h3 {
223
+ font-size: 1.2rem;
224
+ margin-top: 10px;
225
+ }
226
+ /* Chatbot container styles */
227
+ .gradio-chatbot {
228
+ background-color: #fff;
229
+ border-radius: 10px;
230
+ padding: 20px;
231
+ box-shadow: 0 6px 18px rgba(0, 0, 0, 0.1);
232
+ border-left: 4px solid #4a00e0; /* Accent border */
233
+ }
234
+ /* Input field styles */
235
+ .gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio, .gradio-file, .gradio-slider {
236
+ border-radius: 8px;
237
+ border: 2px solid #e2e8f0;
238
+ background-color: #f8fafc;
239
+ }
240
+ .gradio-textbox:focus, .gradio-dropdown:focus, .gradio-image:focus, .gradio-audio:focus, .gradio-file:focus, .gradio-slider:focus {
241
+ border-color: #8e2de2;
242
+ box-shadow: 0 0 0 3px rgba(142, 45, 226, 0.2);
243
+ }
244
+ /* Button styles */
245
+ /* Send Button: Sky Blue */
246
+ #submit-btn {
247
+ background: linear-gradient(135deg, #4a00e0 0%, #8e2de2 100%); /* Purple gradient */
248
+ color: white;
249
+ border: none;
250
+ border-radius: 8px;
251
+ padding: 10px 19px;
252
+ font-size: 1.1rem;
253
+ cursor: pointer;
254
+ transition: all 0.3s ease;
255
+ margin-left: auto;
256
+ margin-right: auto;
257
+ display: block;
258
+ margin-top: 10px;
259
+ }
260
+ #submit-btn:hover {
261
+ background: linear-gradient(135deg, #5b10f1 0%, #9f3ef3 100%); /* Slightly lighter */
262
+ box-shadow: 0 6px 8px rgba(74, 0, 224, 0.4);
263
+ }
264
+ #submit-btn:active {
265
+ transform: scale(0.95);
266
+ }
267
+ #clear-history {
268
+ background: linear-gradient(135deg, #e53e3e 0%, #f56565 100%); /* Red gradient */
269
+ color: white;
270
+ border: none;
271
+ border-radius: 8px;
272
+ padding: 10px 13px;
273
+ font-size: 1.1rem;
274
+ cursor: pointer;
275
+ transition: all 0.3s ease;
276
+ margin-top: 10px;
277
+ }
278
+ #clear-history:hover {
279
+ background: linear-gradient(135deg, #c53030 0%, #e53e3e 100%); /* Slightly darker red gradient on hover */
280
+ box-shadow: 0 6px 8px rgba(229, 62, 62, 0.4);
281
+ }
282
+ #clear-history:active {
283
+ transform: scale(0.95);
284
+ }
285
+ /* Input type selector buttons */
286
+ #input-type-group {
287
+ display: flex;
288
+ justify-content: center;
289
+ gap: 10px;
290
+ margin-bottom: 20px;
291
+ }
292
+ .input-type-btn {
293
+ background-color: #718096; /* Slate gray */
294
+ color: white;
295
+ border: none;
296
+ border-radius: 8px;
297
+ padding: 10px 15px;
298
+ font-size: 1rem;
299
+ cursor: pointer;
300
+ transition: all 0.3s ease;
301
+ }
302
+ .input-type-btn.selected {
303
+ background-color: linear-gradient(135deg, #4a00e0 0%, #8e2de2 100%); /* Purple gradient */
304
+ }
305
+ .input-type-btn:hover {
306
+ background-color: #4a5568; /* Darker slate */
307
+ }
308
+ /* Chat history styles */
309
+ .gradio-chatbot .message {
310
+ margin-bottom: 10px;
311
+ }
312
+ .gradio-chatbot .user {
313
+ background-color: linear-gradient(135deg, #4a00e0 0%, #8e2de2 100%); /* Purple gradient */
314
+ color: white;
315
+     padding: 10px;
+     border-radius: 12px;
+     max-width: 70%;
+     animation: slideInUser 0.5s ease-out;
+ }
+ .gradio-chatbot .assistant {
+     background-color: #f0f4f8; /* Light blue-gray */
+     color: #2d3748;
+     padding: 10px;
+     border-radius: 12px;
+     max-width: 70%;
+     margin-left: auto;
+     animation: slideInAssistant 0.5s ease-out;
+ }
+ /* Animation keyframes */
+ @keyframes fadeIn {
+     0% { opacity: 0; }
+     100% { opacity: 1; }
+ }
+ @keyframes slideInUser {
+     0% { transform: translateX(-100%); }
+     100% { transform: translateX(0); }
+ }
+ @keyframes slideInAssistant {
+     0% { transform: translateX(100%); }
+     100% { transform: translateX(0); }
+ }
+ /* Mobile responsiveness */
+ @media (max-width: 768px) {
+     .gradio-header h1 {
+         font-size: 1.8rem;
+     }
+     .gradio-header h3 {
+         font-size: 1rem;
+     }
+     .gradio-chatbot {
+         max-height: 400px;
+     }
+     .gradio-textbox, .gradio-dropdown, .gradio-image, .gradio-audio, .gradio-file, .gradio-slider {
+         width: 100%;
+     }
+     #submit-btn, #clear-history {
+         width: 100%;
+         margin-left: 0;
+     }
+ }
+ """
+
+ # Gradio interface setup
+ def create_interface():
+     with gr.Blocks(css=custom_css) as demo:
+         gr.Markdown("""
+         <div class="gradio-header">
+             <h1>Multimodal Chatbot (Text + Image + Voice + PDF + Quiz)</h1>
+             <h3>Interact with a chatbot using text, image, voice, or PDF inputs</h3>
+         </div>
+         """)
+
+         # Add a description with an expandable accordion
+         with gr.Accordion("Click to expand for details", open=False):
+             gr.Markdown("""
+             ### Description:
+             This multimodal chatbot handles text, image, voice, and PDF inputs, and can generate quizzes from PDFs.
+             - Ask a question or provide text, and the assistant will respond.
+             - Upload an image, and the assistant will analyze it and answer questions about its content.
+             - Voice input is supported: upload or record an audio file, and it will be transcribed to text and sent to the assistant.
+             - PDF support: upload a PDF and ask questions about its content.
+             - PDF quiz: upload a PDF and specify how many MCQ questions to generate from its content.
+             - Enter your OpenAI API key to start interacting with the model.
+             - Use the 'Clear History' button to remove the conversation history.
+             - "o1" supports image, voice, PDF, and text chat; "o3-mini" supports text, PDF, and voice chat only.
+             ### Reasoning Effort:
+             The reasoning effort controls how complex or detailed the assistant's answers should be.
+             - **Low**: Quick, concise answers with minimal reasoning or detail.
+             - **Medium**: A balanced response with a reasonable level of detail and thought.
+             - **High**: More detailed, analytical, and thoughtful responses that require deeper reasoning.
+             """)
+
+         # Store PDF content as a state variable
+         pdf_content = gr.State("")
+
+         with gr.Row():
+             openai_api_key = gr.Textbox(label="Enter OpenAI API Key", type="password", placeholder="sk-...", interactive=True)
+
+         # Input type selector
+         with gr.Row():
+             input_type = gr.Radio(
+                 ["Text", "Image", "Voice", "PDF", "PDF(QUIZ)"],
+                 label="Choose Input Type",
+                 value="Text"
+             )
+
+         # Create the input components (initially text is visible, others are hidden)
+         with gr.Row():
+             # Text input
+             input_text = gr.Textbox(
+                 label="Enter Text Question",
+                 placeholder="Ask a question or provide text",
+                 lines=2,
+                 visible=True
+             )
+
+             # Image input
+             image_input = gr.Image(
+                 label="Upload an Image",
+                 type="pil",
+                 visible=False
+             )
+
+             # Audio input
+             audio_input = gr.Audio(
+                 label="Upload or Record Audio",
+                 type="filepath",
+                 visible=False
+             )
+
+             # PDF input
+             pdf_input = gr.File(
+                 label="Upload your PDF",
+                 file_types=[".pdf"],
+                 visible=False
+             )
+
+             # Quiz specific components
+             quiz_questions_slider = gr.Slider(
+                 minimum=1,
+                 maximum=20,
+                 value=5,
+                 step=1,
+                 label="Number of Quiz Questions",
+                 visible=False
+             )
+
+             # Hidden state for quiz mode
+             quiz_mode = gr.Checkbox(
+                 label="Quiz Mode",
+                 visible=False,
+                 value=False
+             )
+
+         with gr.Row():
+             reasoning_effort = gr.Dropdown(
+                 label="Reasoning Effort",
+                 choices=["low", "medium", "high"],
+                 value="medium"
+             )
+             model_choice = gr.Dropdown(
+                 label="Select Model",
+                 choices=["o1", "o3-mini"],
+                 value="o1"  # Default to 'o1' for image-related tasks
+             )
+             submit_btn = gr.Button("Ask!", elem_id="submit-btn")
+             clear_btn = gr.Button("Clear History", elem_id="clear-history")
+
+         chat_history = gr.Chatbot()
+
+         # Connect the input type selector to the update function
+         input_type.change(
+             fn=update_input_type,
+             inputs=[input_type],
+             outputs=[input_text, image_input, audio_input, pdf_input, quiz_questions_slider, quiz_mode]
+         )
+
+         # Process PDF when uploaded
+         pdf_input.change(
+             fn=process_pdf,
+             inputs=[pdf_input],
+             outputs=[pdf_content]
+         )
+
+         # Button interactions
+         submit_btn.click(
+             fn=chatbot,
+             inputs=[
+                 input_text,
+                 image_input,
+                 audio_input,
+                 pdf_input,
+                 openai_api_key,
+                 reasoning_effort,
+                 model_choice,
+                 pdf_content,
+                 quiz_questions_slider,
+                 quiz_mode,
+                 chat_history
+             ],
+             outputs=[
+                 input_text,
+                 image_input,
+                 audio_input,
+                 pdf_input,
+                 pdf_content,
+                 chat_history
+             ]
+         )
+
+         clear_btn.click(
+             fn=clear_history,
+             inputs=[],
+             outputs=[input_text, image_input, audio_input, pdf_input, pdf_content, chat_history]
+         )
+
+     return demo
+
+ # Run the interface
+ if __name__ == "__main__":
+     demo = create_interface()
+     demo.launch()
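
The wiring above references four callbacks — `update_input_type`, `process_pdf`, `chatbot`, and `clear_history` — that are defined earlier in `app.py` and not shown in this excerpt. To illustrate the pattern, here is a minimal sketch of what an input-type switcher like `update_input_type` can look like; the component order matches the `outputs` list above, but the body is illustrative rather than the app's exact code:

```python
import gradio as gr

def update_input_type(choice):
    """Toggle component visibility so only the selected input type is shown."""
    return (
        gr.update(visible=(choice == "Text")),                # input_text
        gr.update(visible=(choice == "Image")),               # image_input
        gr.update(visible=(choice == "Voice")),               # audio_input
        gr.update(visible=(choice in ("PDF", "PDF(QUIZ)"))),  # pdf_input
        gr.update(visible=(choice == "PDF(QUIZ)")),           # quiz_questions_slider
        gr.update(value=(choice == "PDF(QUIZ)")),             # quiz_mode
    )
```

Because `gr.update` only patches the listed properties, hidden components keep their values, which is what lets a single submit handler receive all of them regardless of the selected mode.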
generate_answer.py ADDED
@@ -0,0 +1,95 @@
+ import os
+ from glob import glob
+ import openai
+ from dotenv import load_dotenv
+
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.vectorstores import Chroma
+ from langchain.document_loaders import PyPDFLoader
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+ from langchain_community.chat_models import ChatOpenAI
+ from langchain.chains import RetrievalQA
+ from langchain.memory import ConversationBufferMemory
+
+ load_dotenv()
+ api_key = os.getenv("OPENAI_API_KEY")
+ openai.api_key = api_key
+
+ # Helper function to validate response completeness
+ def is_response_complete(response: str) -> bool:
+     stripped = response.strip()
+     return bool(stripped) and stripped[-1] in ".!?"
+
+ # Retry mechanism for incomplete responses
+ def retry_response(messages):
+     response = openai.ChatCompletion.create(
+         model="gpt-4o-mini",
+         messages=messages
+     ).choices[0].message['content']
+     if not is_response_complete(response):
+         response += " This is the end of the response. Please let me know if you need further clarification."
+     return response
+
+ def base_model_chatbot(messages):
+     system_message = [
+         {"role": "system", "content": "You are a helpful AI chatbot that provides clear, complete, and coherent responses to User's questions. Ensure your answers are in full sentences and complete the thought or idea."}
+     ]
+     messages = system_message + messages
+     response = openai.ChatCompletion.create(
+         model="gpt-4o-mini",
+         messages=messages
+     ).choices[0].message['content']
+     # Validate response completeness
+     if not is_response_complete(response):
+         response = retry_response(messages)
+     return response
+
+ class VectorDB:
+     """Class to manage document loading and vector database creation."""
+
+     def __init__(self, docs_directory: str):
+         self.docs_directory = docs_directory
+
+     def create_vector_db(self):
+         text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
+
+         files = glob(os.path.join(self.docs_directory, "*.pdf"))
+
+         loadPDFs = [PyPDFLoader(pdf_file) for pdf_file in files]
+
+         pdf_docs = list()
+         for loader in loadPDFs:
+             pdf_docs.extend(loader.load())
+         chunks = text_splitter.split_documents(pdf_docs)
+
+         return Chroma.from_documents(chunks, OpenAIEmbeddings())
+
+ class ConversationalRetrievalChain:
+     """Class to manage the QA chain setup."""
+
+     def __init__(self, model_name="gpt-3.5-turbo", temperature=0):
+         self.model_name = model_name
+         self.temperature = temperature
+
+     def create_chain(self):
+         # ChatOpenAI takes no `system_prompt` argument; grounding instructions
+         # belong in the chain's prompt rather than in the model constructor.
+         model = ChatOpenAI(
+             model_name=self.model_name,
+             temperature=self.temperature
+         )
+         memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
+         vector_db = VectorDB('docs/')
+         retriever = vector_db.create_vector_db().as_retriever(search_type="similarity", search_kwargs={"k": 2})
+         return RetrievalQA.from_chain_type(
+             llm=model,
+             retriever=retriever,
+             memory=memory,
+         )
+
+ def with_pdf_chatbot(messages):
+     query = messages[-1]['content'].strip()
+     qa_chain = ConversationalRetrievalChain().create_chain()
+     result = qa_chain({"query": query})
+     if not is_response_complete(result['result']):
+         result['result'] += " This is the end of the response. Let me know if you need further clarification."
+     return result['result']
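
Both entry points consume OpenAI-style message lists, so they can sit behind any chat UI. A quick usage sketch, assuming `OPENAI_API_KEY` is set in the environment and `docs/` contains at least one PDF for the retrieval path:

```python
from generate_answer import base_model_chatbot, with_pdf_chatbot

# Plain chat completion with completeness checking
history = [{"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}]
print(base_model_chatbot(history))

# Retrieval-augmented answer grounded in the PDFs under docs/
print(with_pdf_chatbot([{"role": "user", "content": "Summarize the uploaded document."}]))
```

Note that `with_pdf_chatbot` rebuilds the Chroma index on every call; caching the chain would be the obvious optimization for repeated queries.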
helpers.py ADDED
@@ -0,0 +1,42 @@
+ import base64
+ import streamlit as st
+ import os
+ import openai
+ from dotenv import load_dotenv
+ from gtts import gTTS
+
+ # Function to accept OpenAI API Key as input from the user
+ def get_api_key():
+     api_key = st.text_input("Enter your OpenAI API Key", type="password")
+     if api_key:
+         openai.api_key = api_key
+         return api_key
+     else:
+         return None
+
+ def speech_to_text(audio_data):
+     """Transcribes audio data to text using OpenAI's API."""
+     with open(audio_data, "rb") as audio_file:
+         transcript = openai.Audio.transcribe(
+             model="whisper-1",
+             file=audio_file
+         )
+     return transcript["text"]
+
+ def text_to_speech(input_text):
+     """Generates a TTS audio file from the input text."""
+     tts = gTTS(text=input_text, lang="en")
+     audio_file_path = "temp_audio_play.mp3"
+     tts.save(audio_file_path)
+     return audio_file_path
+
+ def autoplay_audio(file_path: str):
+     """Embeds the audio file in the page and plays it automatically."""
+     with open(file_path, "rb") as f:
+         data = f.read()
+     b64 = base64.b64encode(data).decode("utf-8")
+     md = f"""
+     <audio autoplay>
+         <source src="data:audio/mp3;base64,{b64}" type="audio/mp3">
+     </audio>
+     """
+     st.markdown(md, unsafe_allow_html=True)
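
Chained together, these helpers give a full voice round trip: transcribe the question, answer it, synthesize the reply, and autoplay it in the page. A hypothetical Streamlit flow (the audio filename and choice of chat function are stand-ins, not the app's actual wiring):

```python
from helpers import get_api_key, speech_to_text, text_to_speech, autoplay_audio
from generate_answer import base_model_chatbot

if get_api_key():
    question = speech_to_text("recording.wav")  # hypothetical uploaded recording
    answer = base_model_chatbot([{"role": "user", "content": question}])
    autoplay_audio(text_to_speech(answer))      # speak the reply back to the user
```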
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ openai==0.28
+ pillow
+ PyMuPDF
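
The pinned list covers only part of what the code imports: `app.py` needs `gradio` (preinstalled on a Gradio Space, which is why it can be omitted here), `helpers.py` pulls in `streamlit`, `gtts`, and `python-dotenv`, and `generate_answer.py` relies on `langchain`, `langchain-community`, and a Chroma backend. A fuller requirements sketch for running outside a Space — an assumption, not the Space's actual manifest:

```text
openai==0.28
pillow
PyMuPDF
gradio
gtts
streamlit
python-dotenv
langchain
langchain-community
chromadb
```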