# SVM Model with TF-IDF

Step-by-step instructions:

## Installation

Before running the code, make sure all the required libraries are installed:

```bash
pip install nltk beautifulsoup4 scikit-learn pandas
```

Download the NLTK resources needed for preprocessing:

```python
import nltk

nltk.download('stopwords')
nltk.download('wordnet')
```

## How to Use

1. Pre-Trained Model and Vectorizer

The repository includes:

- `model.pkl`: The pre-trained SVM model.
- `tfidf.pkl`: The saved TF-IDF vectorizer used to transform the text data.

2. Testing a New Dataset

To test the model with a new dataset, follow these steps:

- Step 1: Prepare the dataset

Ensure the dataset is in CSV format and has three columns: `title`, `outlet`, and `labels`. The `title` column contains the text to be classified.
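
A quick header check before preprocessing can catch a misnamed column early. The sketch below is an illustration only: `find_missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, not part of this repository, and the check assumes the column names match exactly (case-sensitive).

```python
# Hypothetical helper, not part of this repository: checks that the
# three columns named in Step 1 are present.
REQUIRED_COLUMNS = ("title", "outlet", "labels")


def find_missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in their listed order."""
    present = set(columns)
    return [name for name in required if name not in present]
```

With a DataFrame loaded through `pd.read_csv`, calling `find_missing_columns(df.columns)` returns an empty list when the file has the expected layout.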

- Step 2: Preprocess the data

Use the `clean()` function from `data_cleaning.py` to preprocess the text data:

```python
from data_cleaning import clean
import pandas as pd

# Load your data
df = pd.read_csv('test_data_random_subset.csv')

# Clean the data
cleaned_df = clean(df)
```

- Step 3: Load the pre-trained model and TF-IDF vectorizer
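
A minimal sketch of this step, assuming both files were written with Python's `pickle` module (if they were saved with `joblib`, load them with `joblib.load` instead) and that the cleaned text stays in the `title` column of `cleaned_df`. The helper names `load_pickle` and `predict_titles` are illustrative and not part of the repository. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def predict_titles(model, vectorizer, titles):
    """Vectorize titles with the already-fitted vectorizer, then classify them."""
    # Use transform(), not fit_transform(): the vocabulary learned at
    # training time must be reused so the features match the model.
    features = vectorizer.transform(titles)
    return model.predict(features)


# Example usage (assumes model.pkl and tfidf.pkl are in the working
# directory and that cleaned_df comes from Step 2):
#
#   model = load_pickle("model.pkl")
#   vectorizer = load_pickle("tfidf.pkl")
#   predictions = predict_titles(model, vectorizer, cleaned_df["title"])
```

If the `labels` column holds the true classes, the predictions can then be compared against it, for example with `sklearn.metrics.accuracy_score`.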