# SVM Model with TF-IDF
Step-by-step instructions:
## Installation
<br>Before running the code, ensure you have all the required libraries installed:
```bash
pip install nltk beautifulsoup4 scikit-learn pandas
```
<br> Download the necessary NLTK resources for preprocessing:
```python
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
```
## How to Use
1. Pre-Trained Model and Vectorizer
<br> The repository includes:
- model.pkl: the pre-trained SVM model.
- tfidf.pkl: the saved TF-IDF vectorizer used to transform the text data.
2. Testing a New Dataset
<br> To test the model with a new dataset, follow these steps:
- Step 1: Prepare the dataset:
<br> Ensure the dataset is in CSV format and has three columns: title, outlet, and labels, where the title column contains the text data to be classified.
- Step 2: Preprocess the Data
<br>Use the clean() function from data_cleaning.py to preprocess the text data:
```python
from data_cleaning import clean
import pandas as pd
# Load your data
df = pd.read_csv('test_data_random_subset.csv')
# Clean the data
cleaned_df = clean(df)
```
- Step 3: Load the pre-trained model and TF-IDF Vectorizer
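<br> A minimal sketch of Step 3, assuming model.pkl and tfidf.pkl were saved with Python's pickle module (the serialization format is an assumption; for illustration, this sketch first builds toy stand-ins for the two files, then reloads them and predicts):

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Illustration only: build toy stand-ins for model.pkl / tfidf.pkl
texts = ["stocks rally on strong earnings", "team wins the championship",
         "markets fall after jobs report", "player scores winning goal"]
labels = ["business", "sports", "business", "sports"]
tfidf = TfidfVectorizer()
model = LinearSVC().fit(tfidf.fit_transform(texts), labels)
with open("tfidf.pkl", "wb") as f:
    pickle.dump(tfidf, f)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Step 3: load the saved TF-IDF vectorizer and SVM model
with open("tfidf.pkl", "rb") as f:
    tfidf = pickle.load(f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Transform new titles with the SAME fitted vectorizer, then predict
new_titles = ["quarterly earnings beat expectations"]
features = tfidf.transform(new_titles)
print(model.predict(features))
```

Note that the pickled vectorizer must be used as-is (via transform, not fit_transform), so that new text is mapped into the same feature space the model was trained on.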