VarsaGupta committed
Commit e65f5e1 · 1 Parent(s): f4e4932

Upload 11 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ NLP-Based-Chatbot/embeddings_chatbot.csv filter=lfs diff=lfs merge=lfs -text
NLP-Based-Chatbot/.DS_Store ADDED
Binary file (6.15 kB).
 
NLP-Based-Chatbot/Dronealexa.csv ADDED
The diff for this file is too large to render.
 
NLP-Based-Chatbot/NLP_Based_Chatbot/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
NLP-Based-Chatbot/NLP_Based_Chatbot/README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ language: en
+ tags:
+ - chatbot
+ - natural language processing
+ license: apache-2.0
+ ---
+
+ Model Card: NLP-Based Chatbot
+
+ Overview
+
+ The NLP-Based Chatbot is designed to explore Science & Technology topics. It combines semantic search and summarization to provide relevant, concise responses to user queries.
+
+ Model Details
+
+ - Model Name: NLP-Based Chatbot
+ - Model Type: Natural Language Processing (NLP) Chatbot
+ - Framework: Gradio Blocks Interface, spaCy, Transformers
+
+ Components
+
+ 1. Semantic Search
+
+ The chatbot employs semantic search to retrieve relevant information from a preprocessed dataset (Dronealexa.csv). The search is based on a TF-IDF vectorizer and cosine-similarity calculations.
+
+ 2. Summarization
+
+ A summarization pipeline generates concise summaries of the retrieved information, using the Hugging Face Transformers library.
+
+ 3. Custom Embeddings
+
+ The model incorporates custom text embeddings built with spaCy and pre-trained word embeddings. These embeddings enhance the understanding of user queries and contribute to the semantic search.
+
+ 4. Gradio Blocks Interface
+
+ The chatbot's frontend is built with the Gradio Blocks interface, providing an interactive, user-friendly platform for entering queries and receiving responses.
+
+ 5. Model Card Generation
+
+ Model card generation constructs prompts from search results and uses a summarization pipeline to produce model card content.
+
+ Intended Use
+
+ The NLP-Based Chatbot is intended for users interested in exploring Science & Technology topics. It can be used to obtain information from the provided dataset, and users are encouraged to provide feedback for continuous improvement.
+
+ Training Data
+
+ The model is built on a custom dataset (Dronealexa.csv) containing Science & Technology-related information. The dataset has been preprocessed to handle missing values and enable efficient semantic search.
+
+ Evaluation Metrics
+
+ - Semantic Search: TF-IDF vectorizer, cosine similarity
+ - Summarization: Hugging Face Transformers pipeline
+
+ Ethical Considerations
+
+ The chatbot aims to provide accurate and relevant information. However, users are advised to evaluate responses critically and to understand that the model's knowledge is limited to its training data.
+
+ Usage Instructions
+
+ 1. Enter your query in the provided textbox.
+ 2. Click the "Send" button to receive a response.
+ 3. Optionally, submit feedback using the "Submit Feedback" button.
+
+ License
+
+ This model is released under the Apache 2.0 License.
+
+ Contact Information
+
+ For inquiries or issues, please contact [email protected].
+
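The retrieval-plus-summarization flow this card describes is implemented in frontend.py later in this commit; below is a minimal, self-contained sketch of the same TF-IDF + cosine-similarity retrieval (the combined-column construction is simplified here, and Dronealexa.csv is assumed to be present):

```
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Build a TF-IDF index over the combined text of each row, as frontend.py does
df = pd.read_csv("Dronealexa.csv").dropna()
df["combined"] = df.apply(lambda row: "; ".join(row.astype(str)), axis=1)
vectorizer = TfidfVectorizer()
embeddings = vectorizer.fit_transform(df["combined"])

# Retrieve the three most similar rows for a query, then summarize them
query = "drone voice command recognition"
scores = cosine_similarity(vectorizer.transform([query]), embeddings).flatten()
top3 = df.iloc[scores.argsort()[::-1][:3]]["combined"].tolist()

summarizer = pipeline("summarization")
print(summarizer("\n".join(top3), max_length=130, min_length=30,
                 do_sample=False)[0]["summary_text"])
```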
NLP-Based-Chatbot/README.md ADDED
@@ -0,0 +1,48 @@
+ ### NLP-Powered Chatbot to Explore Science and Technologies
+
+ Project URL: https://060d8e72c0a548cb38.gradio.live
+
+ Screenshots:
+ <img width="1552" alt="image" src="https://github.com/dhawansolanki/NLP-Chatbot-Hackaphasia/assets/91565429/d3066564-ae96-44c9-8507-64ee7b4f0a14">
+
+ <img width="1552" alt="image" src="https://github.com/dhawansolanki/NLP-Chatbot-Hackaphasia/assets/91565429/4fe19f0c-8fe5-4f9f-af71-a3e7cd5f3138">
+
+ Tech Stack: Python, NLP, Gradio
+
+ ### How to Run
+
+ 1. Frontend (Gradio):
+    1. Create an environment: python -m venv envname
+    2. Install dependencies: pip install -qU openai==0.28 transformers plotly scikit-learn gradio pandas
+    3. Run: python frontend.py
+ 2. Backend (Python):
+    1. Run: python backend.py
+
+ Dummy Input Data:
+
+ User Input (Dummy):
+
+ ```
+ In the realm of drone technology, a groundbreaking feature has emerged with the integration of an advanced voice command recognition system. This system is meticulously configured to capture and process an audio stream, discerning voice commands from the operator in real-time. What sets this innovation apart is its ability to not only identify the operator's voice within a cacophony of ambient sounds but also to exclusively recognize and act upon the commands issued by the authorized operator. This heightened level of personalization and security is further complemented by the inclusion of a directional camera, which automatically focuses on the operator, providing a continuous and detailed video stream. This video stream not only aids in documentation and monitoring but also plays a pivotal role in disambiguating different voice streams, ensuring that the drone responds with precision in complex and dynamic environments. In essence, this drone system represents a leap forward in human-machine interaction, offering a seamless and efficient mode of control through voice commands.
+ Question:
+ How does the integration of a directional camera enhance the functionality of the drone's voice command recognition system, and what specific benefits does it bring to the disambiguation of voice streams in dynamic operational environments?
+ ```
+
+ Feedback Input (Dummy):
+ ```
+ How does the integration of a directional camera enhance the functionality of the drone's voice command recognition system, and what specific benefits does it bring to the disambiguation of voice streams in dynamic operational environments?
+ ```
+
NLP-Based-Chatbot/backend.py ADDED
@@ -0,0 +1,291 @@
+ import openai
+ import pandas as pd
+ import numpy as np
+ # from openai.embeddings_utils import get_embedding
+ from transformers import GPT2TokenizerFast
+ from tqdm.auto import tqdm
+ import os
+
+ tqdm.pandas()
+
+ import spacy
+
+ # Load the small English spaCy model. Note: en_core_web_sm does not ship
+ # GloVe vectors; a md/lg model is needed for true pre-trained word vectors.
+ import en_core_web_sm
+
+ nlp = en_core_web_sm.load()
+
+ def custom_embedding(text, model_name="text-embedding-ada-002"):
+     # Process the text with spaCy
+     doc = nlp(text)
+
+     # Extract word embeddings and average them to get the text embedding
+     word_embeddings = [token.vector for token in doc if token.has_vector]
+
+     if not word_embeddings:
+         return None  # No embeddings found for any word in the text
+
+     text_embedding = np.mean(word_embeddings, axis=0)
+
+     # Create an OpenAI-style response dictionary
+     response = {
+         "data": [
+             {
+                 "embedding": text_embedding.tolist(),
+                 "index": 0,
+                 "object": "embedding"
+             }
+         ],
+         "model": model_name,
+         "object": "list",
+         "usage": {
+             "prompt_tokens": len(text.split()),
+             "total_tokens": len(text.split())
+         }
+     }
+
+     return response
+
+ # Example usage
+ text = "Rome"
+ response = custom_embedding(text)
+
+ if response is not None:  # custom_embedding returns None when no token has a vector
+     print(f"Custom Embedding for '{text}': {response['data'][0]['embedding']}")
+ else:
+     print(f"No embeddings found for words in '{text}'.")
+
+ print(response)
+
+ # Redefinition: the same helper, now accepting a list of texts
+ # (shadows the single-text version above)
+ def custom_embedding(text_list, model_name="text-embedding-ada-002"):
+     embeddings = []
+
+     for text in text_list:
+         # Process the text with spaCy
+         doc = nlp(text)
+
+         # Extract word embeddings and average them to get the text embedding
+         word_embeddings = [token.vector for token in doc if token.has_vector]
+
+         if not word_embeddings:
+             embeddings.append(None)  # No embeddings found for any word in the text
+         else:
+             text_embedding = np.mean(word_embeddings, axis=0)
+             embeddings.append(text_embedding.tolist())
+
+     # Create a response dictionary
+     response = {
+         "data": [
+             {
+                 "embedding": emb,
+                 "index": idx,
+                 "object": "embedding"
+             }
+             for idx, emb in enumerate(embeddings)
+         ],
+         "model": model_name,
+         "object": "list",
+         "usage": {
+             "prompt_tokens": sum(len(text.split()) for text in text_list),
+             "total_tokens": sum(len(text.split()) for text in text_list)
+         }
+     }
+
+     return response
+
+ # Example usage
+ text = ["She is running", "Fitness is good", "I am hungry", "Basketball is healthy"]
+ response = custom_embedding(text)
+
+ for idx, embedding in enumerate(response["data"]):
+     if embedding["embedding"] is not None:
+         print(f"Custom Embedding for '{text[idx]}': {embedding['embedding']}")
+     else:
+         print(f"No embeddings found for words in '{text[idx]}'.")
+
+ print(response)
+
+ emb1 = response['data'][0]['embedding']
+ emb2 = response['data'][1]['embedding']
+ emb3 = response['data'][2]['embedding']
+ emb4 = response['data'][3]['embedding']
+
+ # Rough similarity via dot products (the vectors are not unit-normalized)
+ print(np.dot(emb1, emb2))
+ print(np.dot(emb2, emb4))
+
+ df = pd.read_csv('Dronealexa.csv')
+ # Reset the index after dropping rows so the positional writes below line up
+ df = df.dropna().reset_index(drop=True)
+ df.info()
+ df.head()
+ df['combined'] = "Title: " + df['Title'].str.strip() + "; URL: " + df['URL'].str.strip() + "; Publication Year: " + df['Publication Year'].astype(str).str.strip() + "; Abstract: " + df['Abstract'].str.strip()
+ df.head()
+
+ tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
+
+ df['n_tokens'] = df.combined.progress_apply(lambda x: len(tokenizer.encode(x)))
+ df = df[df.n_tokens < 8000].reset_index(drop=True)
+ df.info()
+ df.head()
+
+ def get_embeddings(text, model):
+     # Process the text with spaCy
+     doc = model(text)
+
+     # Extract word embeddings and average them to get the text embedding
+     word_embeddings = [token.vector for token in doc if token.has_vector]
+
+     if not word_embeddings:
+         return None  # No embeddings found for any word in the text
+
+     text_embedding = np.mean(word_embeddings, axis=0)
+
+     # Create a response dictionary
+     response = {
+         "data": [
+             {
+                 "embedding": text_embedding.tolist(),
+                 "index": 0,
+                 "object": "embedding"
+             }
+         ],
+         "model": model.meta["name"],
+         "object": "list",
+         "usage": {
+             "prompt_tokens": len(text.split()),
+             "total_tokens": len(doc)
+         }
+     }
+
+     return response
+
+ # Example usage
+ input_text = "Your input text goes here"
+ custom_model = nlp  # You can replace this with any other spaCy model
+
+ # Renamed to avoid conflict with the built-in 'input' function
+ text_to_process = input_text
+
+ response = get_embeddings(text_to_process, custom_model)
+
+ if response is not None:  # get_embeddings returns None when no token has a vector
+     print(f"Custom Embedding for '{text_to_process}': {response['data'][0]['embedding']}")
+ else:
+     print(f"No embeddings found for words in '{text_to_process}'.")
+
+ print(response)
+
+ from tqdm import tqdm
+
+ batch_size = 2000
+ model_name = 'text-embedding-ada-002'
+
+ # Embed the combined text column in batches
+ for i in tqdm(range(0, len(df.combined), batch_size)):
+     # Find the end of the batch
+     i_end = min(i + batch_size, len(df.combined))
+
+     # Get the texts for the current batch
+     batch_text = list(df.combined)[i:i_end]
+
+     # Collect the embeddings for each text in the batch
+     batch_embeddings = []
+     for text in batch_text:
+         response = get_embeddings(text, nlp)
+
+         # Check whether embeddings were found
+         if response and response["data"][0]["embedding"] is not None:
+             batch_embeddings.append(response["data"][0]["embedding"])
+         else:
+             # Handle the case where no embeddings are found for a text
+             batch_embeddings.append(None)
+
+     # Write the embeddings back into the DataFrame (stored as strings)
+     for j in range(i, i_end):
+         df.loc[j, 'ada_vector'] = str(batch_embeddings[j - i])
+
+ df.head()
+ df.info()
+ # Parse the stringified embeddings back into lists; keep them as lists so the
+ # CSV round-trip below stays eval-able (str() of a NumPy array would not be).
+ # Assumes every row received an embedding.
+ df['ada_vector'] = df.ada_vector.progress_apply(eval)
+ df.to_csv('embeddings_chatbot.csv', index=False)
+ df = pd.read_csv('embeddings_chatbot.csv')
+ df['ada_vector'] = df['ada_vector'].progress_apply(eval).progress_apply(np.array)
+
+ user_query = input("Enter query - ")
+
+ query_response = get_embeddings(user_query, nlp)
+
+ if query_response is not None:
+     print(f"Embedding for '{user_query}': {query_response['data'][0]['embedding']}")
+ else:
+     print(f"No embeddings found for words in '{user_query}'.")
+
+ # Reuse the query embedding computed above
+ searchvector = query_response["data"][0]["embedding"]
+
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ # Ensure the 'ada_vector' column contains numeric arrays
+ df['ada_vector'] = df['ada_vector'].apply(lambda x: np.array(x) if isinstance(x, (list, np.ndarray)) else x)
+
+ # Filter out rows where 'ada_vector' is not a valid numeric array
+ valid_rows = df['ada_vector'].apply(lambda x: isinstance(x, np.ndarray))
+
+ # Calculate cosine similarity only for valid rows
+ # (with tqdm you could use progress_apply instead of apply here)
+ df.loc[valid_rows, 'similarities'] = df.loc[valid_rows, 'ada_vector'].apply(
+     lambda x: cosine_similarity([x], [searchvector])[0][0]
+ )
+
+ df.head()
+ result = df.sort_values('similarities', ascending=False).head(3)
+ result.head()
+
+ xc = list(result.combined)
+
+ def construct_prompt(query, xc):
+     # Concatenate the top-3 retrieved passages as context
+     context = ''
+     for i in range(3):
+         context += xc[i] + "\n"
+     header = """Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."\n\nContext:\n"""
+     header += context + "\n\n Q: " + query + "\n A:"
+     return header
+
+ from transformers import pipeline
+
+ summarizer = pipeline("summarization")
+ Fresult = construct_prompt(user_query, xc)
+ summarizer("\n".join(xc), max_length=130, min_length=30, do_sample=False)
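backend.py imports openai and builds a retrieval-grounded prompt (Fresult) but never sends it anywhere; presumably it was meant for a completion endpoint. A sketch of that final step, assuming the openai==0.28-style API pinned in the README (the API key and model choice below are hypothetical):

```
import openai  # openai==0.28-style API, as pinned in the README

openai.api_key = "YOUR_API_KEY"  # hypothetical; no key ships with this repo

# Send the prompt built by construct_prompt above to a completion model
completion = openai.Completion.create(
    model="text-davinci-003",  # hypothetical model choice
    prompt=Fresult,
    max_tokens=200,
    temperature=0.0,
)
print(completion["choices"][0]["text"].strip())
```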
NLP-Based-Chatbot/config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "model": {
+     "repository": "https://github.com/VarsaGupta/NLP_Based_Chatbot.git",
+     "subfolder": "",
+     "files": ["model_card.md", "embeddings_chatbot.csv", "backend.py", "feedback_data.csv", "frontend.py", "README.txt", "Dronealexa.csv", "config.json"]
+   },
+   "card": {
+     "repository": "https://github.com/VarsaGupta/NLP_Based_Chatbot.git",
+     "subfolder": "",
+     "files": ["model_card.md"]
+   }
+ }
NLP-Based-Chatbot/embeddings_chatbot.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7387cb50de3d3db315cb8cac661aeceb160883647e279eb8a06e484b58a85255
+ size 14166418
NLP-Based-Chatbot/feedback_data.csv ADDED
@@ -0,0 +1,2 @@
+ hello,aerial drone companion device is autonomously oriented such that an image capture device faces the first user in response to detecting the first voice command . a second voice command is detected while the image captured device faces a first user . the task signal is transmitted by the computer based on the second voice commands .,when i say "hello" chatbot should say "hi"
+ hai,aerial drone companion device is autonomously oriented such that an image capture device faces the first user in response to detecting the first voice command . a second voice command is detected while the image captured device faces a first user . the task signal is transmitted by the computer based on the second voice commands .,when i say "hai" you should say "hai"
NLP-Based-Chatbot/frontend.py ADDED
@@ -0,0 +1,79 @@
+ import pandas as pd
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.metrics.pairwise import cosine_similarity
+ import gradio as gr
+ from transformers import pipeline
+
+ # Load the CSV file and preprocess it
+ def load_csv_and_preprocess(csv_file):
+     df = pd.read_csv(csv_file)
+     df = df.dropna().head(100000)
+
+     column_names = list(df.columns)
+     df['combined'] = df.apply(lambda x: "Title: " + '; '.join(x[column_names].astype(str)), axis=1)
+     df['combined'] = df['combined'].str.strip()
+
+     vectorizer = TfidfVectorizer()
+     embeddings = vectorizer.fit_transform(df['combined'])
+
+     return df, vectorizer, embeddings
+
+ # Initialize the summarization pipeline once, outside chatbot_response
+ summarizer = pipeline("summarization")
+
+ # Perform semantic search for a given query
+ def semantic_search(df, vectorizer, embeddings, query):
+     search_vector = vectorizer.transform([query])
+     similarities = cosine_similarity(search_vector, embeddings).flatten()
+     df['similarities'] = similarities
+     result = df.sort_values('similarities', ascending=False).head(3)
+
+     return result['combined'].tolist()
+
+ # Define the chatbot response function with summarization
+ def chatbot_response(query, history):
+     if not query.strip():
+         return "", history
+     search_results = semantic_search(df, vectorizer, embeddings, query)
+     # Summarize the search results
+     summary = summarizer("\n".join(search_results), max_length=130, min_length=30, do_sample=False)[0]['summary_text']
+     # Format the summarized response and update the chat history
+     history = f"{history}User: {query}\nBot: {summary}\n\n"
+     return "", history  # Clear the input box after each message, update history
+
+ # Load CSV and preprocess on server startup
+ csv_file_path = "Dronealexa.csv"  # Update this to your CSV file path
+ df, vectorizer, embeddings = load_csv_and_preprocess(csv_file_path)
+
+ # Define a function to handle feedback
+ def handle_feedback(feedback, response, history_box):
+     # Simple logic that prepends the feedback to the response. Note that the
+     # wiring below passes the history textbox for both `response` and
+     # `history_box`, so `response` here is the full chat history. This could
+     # be replaced with more sophisticated logic or model updating.
+     response = f"Based on your feedback ('{feedback}'): {response}"
+     history = history_box + "\nBot: " + response + "\n"
+     return "", history  # Clear the feedback box, update the history
+
+ # Gradio Blocks Interface
+ with gr.Blocks() as blocks_app:
+     gr.Markdown("<h1 style='text-align: center;'>Explore Science & Technology with Chatbot</h1>")
+     history_box = gr.Textbox(label="", value="", interactive=False, lines=20)
+     with gr.Row():
+         query_input = gr.Textbox(show_label=False, placeholder="Type your message here...", lines=1)
+     with gr.Row():
+         send_button = gr.Button("Send")
+
+     send_button.click(
+         fn=chatbot_response,
+         inputs=[query_input, history_box],
+         outputs=[query_input, history_box]
+     )
+     feedback_input = gr.Textbox(show_label=False, placeholder="Type your feedback here...", lines=1)
+     feedback_button = gr.Button("Submit Feedback")
+
+     feedback_button.click(
+         fn=handle_feedback,
+         inputs=[feedback_input, history_box, history_box],
+         outputs=[feedback_input, history_box]  # clear the feedback box, not the query box
+     )
+ blocks_app.launch(share=True)
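The commit ships a feedback_data.csv, but handle_feedback above never persists anything. A small sketch of appending feedback rows in the same three-column layout (query, retrieved passage, feedback text, as inferred from the shipped file):

```
import csv

# Hypothetical helper: append one row in the same layout as the shipped
# feedback_data.csv (query, retrieved passage, feedback text)
def save_feedback(query, passage, feedback, path="feedback_data.csv"):
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([query, passage, feedback])
```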
NLP-Based-Chatbot/model_card.md.txt ADDED
@@ -0,0 +1,80 @@
+ ---
+ language: en
+ tags:
+ - chatbot
+ - natural language processing
+ license: apache-2.0
+ datasets:
+ - Custom Dataset (Dronealexa)
+ ---
+
+ Model Card: NLP-Based Chatbot
+
+ Overview
+
+ The NLP-Based Chatbot is designed to explore Science & Technology topics. It combines semantic search and summarization to provide relevant, concise responses to user queries.
+
+ Model Details
+
+ - **Model Name:** NLP-Based Chatbot
+ - **Model Type:** Natural Language Processing (NLP) Chatbot
+ - **Framework:** Gradio Blocks Interface, spaCy, Transformers
+
+ Components
+
+ 1. Semantic Search
+
+ The chatbot employs semantic search to retrieve relevant information from a preprocessed dataset (Dronealexa.csv). The search is based on a TF-IDF vectorizer and cosine-similarity calculations.
+
+ 2. Summarization
+
+ A summarization pipeline generates concise summaries of the retrieved information, using the Hugging Face Transformers library.
+
+ 3. Custom Embeddings
+
+ The model incorporates custom text embeddings built with spaCy and pre-trained word embeddings. These embeddings enhance the understanding of user queries and contribute to the semantic search.
+
+ 4. Gradio Blocks Interface
+
+ The chatbot's frontend is built with the Gradio Blocks interface, providing an interactive, user-friendly platform for entering queries and receiving responses.
+
+ 5. Model Card Generation
+
+ Model card generation constructs prompts from search results and uses a summarization pipeline to produce model card content.
+
+ Intended Use
+
+ The NLP-Based Chatbot is intended for users interested in exploring Science & Technology topics. It can be used to obtain information from the provided dataset, and users are encouraged to provide feedback for continuous improvement.
+
+ Training Data
+
+ The model is built on a custom dataset (Dronealexa.csv) containing Science & Technology-related information. The dataset has been preprocessed to handle missing values and enable efficient semantic search.
+
+ Evaluation Metrics
+
+ - Semantic Search: TF-IDF vectorizer, cosine similarity
+ - Summarization: Hugging Face Transformers pipeline
+
+ Ethical Considerations
+
+ The chatbot aims to provide accurate and relevant information. However, users are advised to evaluate responses critically and to understand that the model's knowledge is limited to its training data.
+
+ Usage Instructions
+
+ 1. Enter your query in the provided textbox.
+ 2. Click the "Send" button to receive a response.
+ 3. Optionally, submit feedback using the "Submit Feedback" button.
+
+ License
+
+ This model is released under the Apache 2.0 License.
+
+ Contact Information
+
+ For inquiries or issues, please contact [email protected].
+