# SVM Model with TF-IDF
Step-by-step instructions:
## Installation
<br>Before running the code, ensure you have all the required libraries installed:

```bash
pip install nltk beautifulsoup4 scikit-learn pandas
```
<br> Download the necessary NLTK resources for preprocessing:
```python
import nltk
nltk.download('stopwords')
nltk.download('wordnet')

```
## How to Use
1. Pre-Trained Model and Vectorizer
<br> The repository includes:
- model.pkl : The pre-trained SVM model
- tfidf.pkl: The saved TF-IDF vectorizer used to transform the text data.

2. Testing a New Dataset
<br> To test the model with a new dataset, follow these steps:
- Step 1: Prepare the Dataset
<br> Ensure the dataset is in CSV format with three columns: title, outlet, and labels, where the title column contains the text data to be classified.

- Step 2: Preprocess the Data
<br>Use the clean() function from data_cleaning.py to preprocess the text data:

```python
from data_cleaning import clean
import pandas as pd

# Load your data
df = pd.read_csv('test_data_random_subset.csv')

# Clean the data
cleaned_df = clean(df)

```

- Step 3: Load the pre-trained model and TF-IDF Vectorizer
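<br> A minimal sketch of this step, assuming the `.pkl` files were serialized with Python's `pickle` module (adjust if `joblib` was used instead). The training-and-saving portion below is only illustrative so the example runs end to end; in the repository, `model.pkl` and `tfidf.pkl` already exist, and only the loading and prediction part is needed:

```python
import pickle
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative setup (hypothetical data): fit a tiny TF-IDF vectorizer
# and SVM so the loading step below is runnable in isolation.
titles = ["stocks rally on earnings", "team wins championship game",
          "markets fall after report", "player scores winning goal"]
labels = ["business", "sports", "business", "sports"]
tfidf = TfidfVectorizer()
model = LinearSVC().fit(tfidf.fit_transform(titles), labels)
with open('tfidf.pkl', 'wb') as f:
    pickle.dump(tfidf, f)
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Step 3 proper: load the saved TF-IDF vectorizer and SVM model
with open('tfidf.pkl', 'rb') as f:
    tfidf = pickle.load(f)
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# Transform new titles with the loaded vectorizer, then predict labels
preds = model.predict(tfidf.transform(["quarterly profits rise"]))
print(preds[0])
```

When testing the repository's real dataset, pass the cleaned `title` column from Step 2 (e.g. `tfidf.transform(cleaned_df['title'])`) instead of the hypothetical strings above.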