Commit 07fbd40 by Jingxiang Mo
1 Parent(s): 9f9e047
Update README.md
README.md
CHANGED
@@ -5,10 +5,10 @@ https://rajpurkar.github.io/SQuAD-explorer/
 
 We will use the Stanford Question Answering Dataset (SQuAD) for our machine learning project because it is a large-scale, diverse dataset containing over 100,000 questions and answers. It has been widely used and evaluated by the research community and is well-suited for training and evaluating models for question answering and machine reading comprehension tasks.
 
-Project Goal
+### Project Goal
 Question Answering Model: Building a supervised learning logistic regression model that can answer questions based on the information contained within SQuAD. The model could be trained on the questions and answers in the dataset, and then be used to answer new questions.
 
-Methodology
+### Methodology
 Data Preprocessing
 The data will undergo several preprocessing steps to ensure that it is suitable for the question-answering model. These steps include data cleaning, data transformation, and data encoding.
 
@@ -24,6 +24,6 @@ We have also considered other classification models, including KNN, Naive Bayes,
 Evaluation Metric
 When it comes to the Evaluation Metric we intend to use, since we’re using a classification model, it only makes sense to use a Confusion Matrix. However, we still have to learn how the BLEU score with brevity penalty could help us as it deals with text generation problems which is also what we work on.
 
-Application
+### Application
 We hope to build a web application and provide a user-friendly interface that allows users to input their questions either through voice or text. This will allow for greater accessibility and convenience for users with different preferences.
 The model will then provide its answer via text, which will then be voiced by an API. This will ensure that the user can receive the answer in their preferred format, whether they prefer to hear the answer or read it. The dual output format will also ensure that the bot's answer can be easily shared or recorded, making it more accessible for others to use.
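The Project Goal paragraph in this README proposes a logistic regression model for question answering but does not say how answering is cast as classification. One possible framing, shown here purely as an assumption, is to classify each (question, context sentence) pair as "contains the answer" or not and return the highest-scoring sentence. The feature set, the helper names (`tokens`, `features`, `train`, `answer`) and the toy training pairs below are all hypothetical; real SQuAD answers are text spans, so this is only a rough sketch of the idea.

```python
import math
import re


def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def features(question, sentence):
    """Bias term plus two word-overlap ratios (a hypothetical feature choice)."""
    q, s = set(tokens(question)), set(tokens(sentence))
    overlap = len(q & s)
    return [1.0, overlap / max(len(q), 1), overlap / max(len(s), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    """Predicted probability that the sentence answers the question."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(pairs, labels, epochs=500, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    xs = [features(q, s) for q, s in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = score(w, x) - y  # gradient of the log loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def answer(question, context, w):
    """Return the context sentence the model scores highest."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    return max(sentences, key=lambda s: score(w, features(question, s)))


# Made-up toy data for illustration only; not taken from SQuAD.
pairs = [
    ("Where is the Eiffel Tower?", "The Eiffel Tower is in Paris."),
    ("Where is the Eiffel Tower?", "It was completed in 1889."),
    ("Who wrote Hamlet?", "Hamlet was written by William Shakespeare."),
    ("Who wrote Hamlet?", "The play is set in Denmark."),
]
labels = [1, 0, 1, 0]
weights = train(pairs, labels)
```

Calling `answer("What is the capital of France?", "Berlin is in Germany. Paris is the capital of France.", weights)` would then pick the sentence with the larger word overlap. A model this simple can only select existing sentences; it cannot extract a shorter answer span.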
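The Data Preprocessing paragraph in this README names three steps (cleaning, transformation, encoding) without detailing them. The sketch below shows one way each step might look, assuming the nested `paragraphs` / `qas` / `answers` layout of the published SQuAD JSON files. The function names, the `<unk>` token and the integer-id encoding are illustrative assumptions, not choices stated in the README.

```python
import string


def clean(text):
    """Cleaning: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def transform(article):
    """Transformation: flatten one SQuAD-style article into (question, context, answer) rows."""
    rows = []
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            for ans in qa["answers"]:
                rows.append((qa["question"], paragraph["context"], ans["text"]))
    return rows


def build_vocab(texts):
    """Map each cleaned token to an integer id; id 0 is reserved for unknown tokens."""
    vocab = {"<unk>": 0}
    for text in texts:
        for tok in clean(text).split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text, vocab):
    """Encoding: turn text into a list of token ids, using 0 for unseen tokens."""
    return [vocab.get(tok, 0) for tok in clean(text).split()]


# Hypothetical miniature record in the SQuAD layout, for illustration only.
article = {
    "paragraphs": [
        {
            "context": "The cat sat.",
            "qas": [{"question": "Who sat?", "answers": [{"text": "The cat"}]}],
        }
    ]
}
rows = transform(article)
vocab = build_vocab(context for _, context, _ in rows)
```

With this vocabulary, `encode("The dog sat", vocab)` maps the unseen word `dog` to the unknown id while `the` and `sat` keep their assigned ids.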
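The Evaluation Metric paragraph in this README mentions a confusion matrix and the BLEU score with brevity penalty. As a rough illustration of both, the sketch below counts (actual, predicted) label pairs and computes a unigram-only BLEU variant against a single reference. Restricting BLEU to unigrams and one reference is a simplification made for this example (full BLEU combines several n-gram orders), and the function names are hypothetical.

```python
import math
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(y_true, y_pred))


def brevity_penalty(candidate_len, reference_len):
    """BLEU brevity penalty: 1 if the candidate is longer than the reference,
    otherwise exp(1 - r/c), which penalizes short candidates."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


def bleu1(candidate, reference):
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate token count by its count in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    return brevity_penalty(len(cand), len(ref)) * clipped / len(cand)
```

For example, `bleu1("in paris", "in paris france")` has perfect unigram precision but is scaled down by the brevity penalty because the candidate is shorter than the reference, while `bleu1("the the the", "the cat")` is not penalized for length but loses precision through clipping.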