yitingliii committed on
Commit
6e11957
·
verified ·
1 Parent(s): 1d6d48d

Update README.md

  1. README.md +19 -3
README.md CHANGED
@@ -1,8 +1,12 @@
1
  # SVM Model with TF-IDF
2
  Step by step instruction:
3
  1. install required packages:
4
- <br>Before running the code, install some necessary packages.
5
 
6
  ```python
7
  import nltk
8
  nltk.download('stopwords')
@@ -16,8 +20,20 @@ import pandas as pd
16
  from sklearn.svm import SVC
17
  ```
18
 
19
- 2. Data Cleaning
20
- <br> The next step is to do some data cleaning to ensure the input data's format.
21
 
22
 
23
  ```python
 
1
  # SVM Model with TF-IDF
2
  Step by step instruction:
3
  1. install required packages:
4
+ <br>Before running the code, install the required packages.
5
 
6
+ ```bash
7
+ pip install nltk beautifulsoup4 scikit-learn pandas
8
+ ```
9
+ <br> Download the required NLTK data.
10
  ```python
11
  import nltk
12
  nltk.download('stopwords')
 
20
  from sklearn.svm import SVC
21
  ```
22
 
23
+ 2. File Description
24
+ - config.json: Configuration file for model and dataset parameters.
25
+ - ml.py: Python script containing the machine learning pipeline.
26
+ - model.pkl: Trained SVM model saved as a pickle file.
27
+ - tfidf.pkl: TF-IDF vectorizer saved as a pickle file.
28
+ - README.md: Documentation for the repository.
29
+
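The two pickle files listed above are presumably meant to be used together at inference time. The following is a minimal sketch of how that could look, not the repository's own code. It assumes `model.pkl` holds a fitted classifier exposing `predict`, that `tfidf.pkl` holds a fitted vectorizer exposing `transform`, and that both files sit in the working directory. The helper names `load_artifacts` and `predict_labels` are hypothetical.

```python
import pickle
from pathlib import Path


def load_artifacts(model_path="model.pkl", vectorizer_path="tfidf.pkl"):
    """Unpickle the classifier and the vectorizer from disk.

    Assumption: the default file names match the files described above.
    Only unpickle files from a source you trust.
    """
    with Path(model_path).open("rb") as f:
        model = pickle.load(f)
    with Path(vectorizer_path).open("rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def predict_labels(texts, model, vectorizer):
    """Vectorize already-cleaned texts and return the model's predictions.

    Assumption: `vectorizer` has `transform` and `model` has `predict`,
    as scikit-learn's TfidfVectorizer and SVC do once fitted.
    """
    features = vectorizer.transform(texts)
    return list(model.predict(features))
```

A typical call would be `model, vectorizer = load_artifacts()` followed by `predict_labels(cleaned_texts, model, vectorizer)`, where `cleaned_texts` has gone through the same cleaning as the training data. Unpickling scikit-learn objects generally requires a scikit-learn version compatible with the one used to save them.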
30
+ 3. Data Cleaning
31
+ <br> The clean() function performs data preprocessing to prepare the input data for training. This includes:
32
+ - Removing HTML tags using BeautifulSoup.
33
+ - Removing non-alphanumeric characters and extra spaces.
34
+ - Converting text to lowercase.
35
+ - Removing stopwords using NLTK.
36
+ - Lemmatizing words using WordNetLemmatizer.
37
 
38
 
39
  ```python