	# SVM Model with TF-IDF
	Step-by-step instructions:
	## Installation
	<br>Before running the code, ensure you have all the required libraries installed:

	```bash
	pip install nltk beautifulsoup4 scikit-learn pandas
	```
	<br> Download the necessary NLTK resources for preprocessing:
	```python
	import nltk
	nltk.download('stopwords')
	nltk.download('wordnet')

	```
	## How to Use
	1. Pre-Trained Model and Vectorizer
	<br> The repository includes:
	- model.pkl: The pre-trained SVM model.
	- tfidf.pkl: The saved TF-IDF vectorizer used to transform the text data.

	2. Testing a new dataset
	<br> To test the model on a new dataset, follow these steps:
	- Step 1: Prepare the dataset
	<br> Ensure the dataset is in CSV format and has three columns: title, outlet and labels. The title column contains the text data to be classified.

	- Step 2: Preprocess the Data
	<br>Use the clean() function from data_cleaning.py to preprocess the text data:

	```python
	from data_cleaning import clean
	import pandas as pd

	# Load your data
	df = pd.read_csv('test_data_random_subset.csv')

	# Clean the data
	cleaned_df = clean(df)

	```
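
Before calling clean(), it can help to confirm that the loaded file actually has the three columns described in Step 1. The following is a minimal sketch and not part of the repository: the helper name `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical, and the only assumption is the column names title, outlet and labels given above.

```python
# Hypothetical helper, not part of data_cleaning.py.
# Column names are taken from Step 1 of this README.
REQUIRED_COLUMNS = ("title", "outlet", "labels")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    # Strip surrounding whitespace so a header like " title " still matches.
    present = {str(name).strip() for name in columns}
    return [name for name in required if name not in present]


# Example with a plain list of header names; here "labels" is missing.
print(missing_columns(["title", "outlet"]))  # ['labels']
```

With the DataFrame loaded above, the call would be `missing_columns(df.columns)`; an empty list means the file matches the expected layout.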

	- Step 3: Load the pre-trained model and TF-IDF Vectorizer