CIS5190abcd/svm

yitingliii committed on Dec 13, 2024

Commit 6e11957 (verified) · 1 parent: 1d6d48d

Update README.md
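The README in this commit lists the steps of a `clean()` preprocessing function, but the function body is not part of this diff. The sketch below is a minimal, hypothetical version of those steps, applied in the listed order. It is not the repository's code. To stay runnable without package installs or corpus downloads, it makes three substitutions:

- It uses the standard library's `html.parser` in place of BeautifulSoup.
- It takes the stopword set and the lemmatizer as parameters.
- It treats "alphanumeric" as ASCII letters and digits.

`strip_html` and `_TextExtractor` are illustrative names.

```python
import re
from collections.abc import Callable, Iterable
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Stand-in for BeautifulSoup(text, "html.parser").get_text(" ")."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join text nodes with a space so "a<br>b" becomes "a b", not "ab".
    return " ".join(parser.parts)


def clean(
    text: str,
    stop_words: Iterable[str],
    lemmatize: Callable[[str], str] = lambda w: w,  # identity: no lemmatization
) -> str:
    """Hypothetical clean(): the README's steps, applied in the listed order."""
    stop = set(stop_words)
    # 1. Remove HTML tags, keeping only the text between them.
    text = strip_html(text)
    # 2. Replace anything that is not an ASCII letter, digit, or whitespace
    #    with a space, then collapse repeated whitespace and trim the ends.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # 3. Lowercase so tokens match a stopword list assumed to be lowercase.
    text = text.lower()
    # 4. Drop stopwords, then 5. lemmatize the remaining tokens.
    tokens = [w for w in text.split() if w not in stop]
    return " ".join(lemmatize(w) for w in tokens)


print(clean("<p>The Cats are RUNNING!</p>", stop_words={"the", "are"}))
# prints: cats running
```

You can plug the NLTK resources from the README back in. First download the `stopwords` and `wordnet` corpora, then call `clean(raw, set(stopwords.words("english")), WordNetLemmatizer().lemmatize)`. To match the README's HTML step, replace `strip_html` with BeautifulSoup's `get_text(" ")`.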

Files changed (1)

README.md (changed)

@@ -1,8 +1,12 @@
 # SVM Model with TF-IDF
 Step by step instruction:
 1. install required packages:
-<br>Before running the code, install some necessary packages.
+<br>Before running the code, install required packages.
+```bash
+pip install nltk beautifulsoup4 scikit-learn pandas
+```
+<br>Download the required NLTK data.
 ```python
 import nltk
 nltk.download('stopwords')
@@ -16,8 +20,20 @@ import pandas as pd
 from sklearn.svm import SVC
 ```
-2. Data Cleaning
-<br> The next step is to do some data cleaning to ensure the input data's format.
+2. File Description
+- config.json: Configuration file for model and dataset parameters.
+- ml.py: Python script containing the machine learning pipeline.
+- model.pkl: Trained SVM model saved as a pickle file.
+- tfidf.pkl: TF-IDF vectorizer saved as a pickle file.
+- README.md: Documentation for the repository.
+3. Data Cleaning
+<br> The clean() function performs data preprocessing to prepare the input data for training. This includes:
+- Removing HTML tags using BeautifulSoup.
+- Removing non-alphanumeric characters and extra spaces.
+- Converting text to lowercase.
+- Removing stopwords using NLTK.
+- Lemmatizing words using WordNetLemmatizer.
 ```python