
SVM Model with TF-IDF

Step-by-step instructions:

Installation


Before running the code, ensure you have all the required libraries installed:

```bash
pip install nltk beautifulsoup4 scikit-learn pandas
```


Download the NLTK resources needed for preprocessing:

```python
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
```

How to Use:

  1. Pre-trained model and vectorizer
    The repository includes:
  • model.pkl: the pre-trained SVM model.
  • tfidf.pkl: the saved TF-IDF vectorizer used to transform the text data.
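The two files are meant to be used together: the vectorizer maps raw titles into the TF-IDF feature space the SVM was trained on, so new text has to pass through tfidf.pkl before it reaches model.pkl. Below is a minimal loading sketch. It assumes both files were written with Python's pickle module (the README does not say whether pickle or joblib was used) and that tfidf.pkl holds a fitted scikit-learn vectorizer; the helper name `predict_titles` is hypothetical.

```python
import pickle
from typing import Iterable


def predict_titles(texts: Iterable[str],
                   model_path: str = "model.pkl",
                   vectorizer_path: str = "tfidf.pkl"):
    """Hypothetical helper: load the saved vectorizer and SVM, then classify texts.

    Assumption: both files were saved with pickle, not joblib.
    Only unpickle files from a source you trust.
    """
    with open(vectorizer_path, "rb") as f:
        tfidf = pickle.load(f)
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # transform() (not fit_transform()) reuses the vocabulary and IDF weights
    # learned at training time, so the features match what the SVM expects.
    features = tfidf.transform(list(texts))
    return model.predict(features)
```

For example, after Step 2 below, `predict_titles(cleaned_df["title"])` would classify the cleaned titles, assuming `clean()` keeps a `title` column. Calling `fit_transform()` on the test data instead would build a new vocabulary and silently misalign the features.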
  2. Testing a new dataset
    To test the model on a new dataset, follow these steps:
  • Step 1: Prepare the dataset
    Make sure the dataset is in CSV format and has three columns: title, outlet, and labels. The title column contains the text to be classified.
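Before preprocessing, it can help to confirm that the CSV actually has the columns listed in Step 1. A minimal check with pandas follows; the helper name `load_test_csv` is hypothetical, and the column names are taken from Step 1.

```python
import pandas as pd

# Column names as listed in Step 1 of this README.
REQUIRED_COLUMNS = ("title", "outlet", "labels")


def load_test_csv(path_or_buffer) -> pd.DataFrame:
    """Hypothetical helper: read a CSV and fail early if it has the wrong shape."""
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {missing}")
    # Missing titles (NaN) cannot be vectorized, so flag them here too.
    if df["title"].isna().any():
        raise ValueError("Some rows have an empty 'title' value")
    return df
```

Failing at load time gives a clearer error than a KeyError or a vectorizer error several steps later.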

  • Step 2: Preprocess the data
    Use the clean() function from data_cleaning.py to preprocess the text data:

```python
from data_cleaning import clean
import pandas as pd

# Load your data
df = pd.read_csv('test_data_random_subset.csv')

# Clean the data
cleaned_df = clean(df)
```
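The README does not show what `clean()` does internally. Given the libraries installed above (BeautifulSoup plus NLTK stopwords and WordNet), a typical preprocessing pass for TF-IDF looks roughly like the sketch below. It is an illustration under those assumptions, not the repository's `data_cleaning.clean()`: it works on a single string rather than a DataFrame, and the name `clean_title` is hypothetical. The stopword list and lemmatizer can be passed in, which keeps the function testable without the NLTK downloads.

```python
import re
from typing import Callable, Iterable, Optional

from bs4 import BeautifulSoup


def clean_title(text: str,
                stop_words: Optional[Iterable[str]] = None,
                lemmatize: Optional[Callable[[str], str]] = None) -> str:
    """Hypothetical cleaner: strip HTML, lowercase, drop stopwords, lemmatize."""
    if stop_words is None:
        from nltk.corpus import stopwords  # requires nltk.download('stopwords')
        stop_words = stopwords.words("english")
    if lemmatize is None:
        from nltk.stem import WordNetLemmatizer  # requires nltk.download('wordnet')
        lemmatize = WordNetLemmatizer().lemmatize
    stop_set = set(stop_words)
    # Remove HTML markup, keeping only the visible text.
    plain = BeautifulSoup(text, "html.parser").get_text(" ")
    # Lowercase, then replace every run of non-letters with a single space.
    letters_only = re.sub(r"[^a-z]+", " ", plain.lower())
    # Drop stopwords and reduce the remaining tokens to their lemmas.
    tokens = [lemmatize(tok) for tok in letters_only.split() if tok not in stop_set]
    return " ".join(tokens)
```

Applied to a DataFrame, this would be something like `df["title"] = df["title"].astype(str).map(clean_title)`. Whatever cleaning is used at test time should match the cleaning used when tfidf.pkl was fitted, otherwise tokens will not line up with the saved vocabulary.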
  • Step 3: Load the pre-trained model and TF-IDF vectorizer