SVM Model with TF-IDF
Step-by-step instructions:
Installation
Before running the code, ensure you have all the required libraries installed:
pip install nltk beautifulsoup4 scikit-learn pandas
Download the necessary NLTK resources for preprocessing:
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
How to Use:
- Pre-Trained Model and Vectorizer
The repository includes:
- model.pkl: The pre-trained SVM model.
- tfidf.pkl: The saved TF-IDF vectorizer used to transform the text data.
- Testing a new dataset
To test the model on a new dataset, follow these steps:
Step 1: Prepare the dataset:
Ensure the dataset is in CSV format and has three columns: title, outlet, and labels. The title column contains the text data to be classified.
Step 2: Preprocess the Data
Use the clean() function from data_cleaning.py to preprocess the text data:
from data_cleaning import clean
import pandas as pd
# Load your data
df = pd.read_csv('test_data_random_subset.csv')
# Clean the data
cleaned_df = clean(df)
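If clean() fails on a new file, the cause may be a column that is missing from the layout described in Step 1. The header can be checked before the file is loaded. The following is a minimal sketch using only the standard library; missing_columns and REQUIRED_COLUMNS are hypothetical names for illustration and are not part of this repository, and the sample row is made up.

```python
import csv
import io

# Column names taken from Step 1 of this README.
REQUIRED_COLUMNS = ("title", "outlet", "labels")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# In-memory CSV standing in for a real file (illustrative data only).
sample = io.StringIO("title,outlet,labels\nExample headline,Example Outlet,0\n")
print(missing_columns(sample))  # prints []
```

For a file on disk, pass an open file object instead, for example open('test_data_random_subset.csv', newline=''). An empty list means all three columns are present.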
Step 3: Load the pre-trained model and TF-IDF vectorizer