# SVM Model with TF-IDF
This repository provides a pre-trained Support Vector Machine (SVM) model for text classification using Term Frequency-Inverse Document Frequency (TF-IDF). The repository also includes utilities for data preprocessing and feature extraction.
## Start:
<br>Open your terminal.
<br> Clone the repo using the following command:
```
git clone https://huggingface.co/CIS5190abcd/svm
```
<br> Go to the svm directory using the following command:
```
cd svm
```
<br> Run ```ls``` to check the files inside the svm folder. Make sure ```tfidf.py```, ```svm.py``` and ```data_cleaning.py``` exist in this directory. If not, run the following commands:
```
git checkout origin/main -- tfidf.py
git checkout origin/main -- svm.py
git checkout origin/main -- data_cleaning.py
```
<br> Rerun ```ls``` and double-check that all the required files (```tfidf.py```, ```svm.py``` and ```data_cleaning.py```) exist. It should look like this:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6755cffd784ff7ea9db10bd4/O9K5zYm7TKiIg9cYZpV1x.png)
<br> Stay inside the svm directory until the end of this guide.
## Installation
<br>Before running the code, ensure you have all the required libraries installed:
```
pip install nltk beautifulsoup4 scikit-learn pandas datasets
```
<br> Start a Python interpreter:
```
python
```
<br> Download the necessary NLTK resources for preprocessing:
```python
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
```
<br> After downloading all the required packages, **do not** exit the interpreter.
## How to use:
To apply the pre-trained SVM model to a new dataset, follow the steps below:
- Clean the Dataset
```python
from data_cleaning import clean
import pandas as pd
import nltk
nltk.download('stopwords')
```
<br> You can substitute any dataset you want by changing the file name inside ```pd.read_csv()```.
```python
# Load your data
df = pd.read_csv("hf://datasets/CIS5190abcd/headlines_test/test_cleaned_headlines.csv")
# Clean the data
cleaned_df = clean(df)
```
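The exact behavior of ```clean``` lives in ```data_cleaning.py```; as a rough illustration only, here is a hedged sketch of a typical headline-cleaning step (the function name ```clean_sketch```, the assumed lowercasing, HTML stripping, and punctuation removal are all assumptions, not the repo's actual implementation):

```python
import re
import pandas as pd

def clean_sketch(df: pd.DataFrame, text_col: str = "title") -> pd.DataFrame:
    """Illustrative stand-in for data_cleaning.clean (assumed behavior)."""
    out = df.copy()
    out[text_col] = (
        out[text_col]
        .str.lower()                                 # normalize case
        .str.replace(r"<[^>]+>", " ", regex=True)    # strip HTML tags
        .str.replace(r"[^a-z\s]", " ", regex=True)   # drop punctuation/digits
        .str.replace(r"\s+", " ", regex=True)        # collapse whitespace
        .str.strip()
    )
    return out

demo = pd.DataFrame({"title": ["<b>Breaking:</b> Stocks Rally 5%!"]})
print(clean_sketch(demo)["title"][0])  # → "breaking stocks rally"
```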
- Extract TF-IDF Features
```python
from tfidf import tfidf
# Transform the cleaned dataset
X_new_tfidf = tfidf.transform(cleaned_df['title'])
```
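Here ```tfidf``` is the repo's pre-fitted vectorizer loaded from ```tfidf.py```, so ```transform``` maps new text into the same feature space the model was trained on. For reference, a minimal equivalent using scikit-learn's ```TfidfVectorizer``` (the corpus below is made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit the vectorizer on training text (this is what the repo did offline)
corpus = ["stocks rally on earnings", "team wins championship game"]
vec = TfidfVectorizer()
X = vec.fit_transform(corpus)

# Reuse the fitted vocabulary to transform unseen text
X_new = vec.transform(["stocks fall after earnings"])
print(X_new.shape)  # one row, one column per vocabulary term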
- Make Predictions
```python
from svm import svm_model

# Predict labels for the TF-IDF features
# (assuming svm_model is a fitted scikit-learn estimator)
predictions = svm_model.predict(X_new_tfidf)
print(predictions)
```
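To see how the whole pipeline fits together without the repo's modules, here is a self-contained sketch using plain scikit-learn (the tiny corpus, the labels, and the ```LinearSVC``` choice are illustrative assumptions; the repo's actual model may be configured differently):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training data: headline text with topic labels (made up for illustration)
train_texts = [
    "stocks rally on strong earnings",
    "fed raises interest rates again",
    "team wins championship title",
    "striker scores twice in final",
]
train_labels = ["business", "business", "sports", "sports"]

# Fit TF-IDF features, then a linear SVM on top of them
vec = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(train_texts), train_labels)

# Predict on unseen text using the same fitted vectorizer
preds = clf.predict(vec.transform(["midfielder scores late goal"]))
print(preds)
```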
Run ```exit()``` to leave Python, and ```cd ..``` to leave the svm directory.