yitingliii
committed on
Update README.md
README.md CHANGED
@@ -20,14 +20,7 @@ import pandas as pd
 from sklearn.svm import SVC
 ```
 
-2.
-- config.json: Configuration file for model and dataset parameters.
-- ml.py: Python script containing the machine learning pipeline.
-- model.pkl: Trained SVM model saved as a pickle file.
-- tfidf.pkl: TF-IDF vectorizer saved as a pickle file.
-- README.md: Documentation for the repository.
-
-3. Data Cleaning
+2. Data Cleaning
 <br> The clean() function performs data preprocessing to prepare the input data for training. This includes:
 - Removing HTML tags using BeautifulSoup.
 - Removing non-alphanumeric characters and extra spaces.
@@ -37,10 +30,10 @@ from sklearn.svm import SVC
 
 
 ```python
-from
+from data_cleaning import clean
 
 # Load your data
-df = pd.read_csv('
+df = pd.read_csv('test_data_random_subset.csv')
 
 # Clean the data
 cleaned_df = clean(df)
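
The two cleaning steps visible in this diff (strip HTML tags, then drop non-alphanumeric characters and extra spaces) can be sketched for a single string as follows. This is only an illustration, not the repository's `clean()`: the README says the real function uses BeautifulSoup and is called on a DataFrame, whereas this sketch swaps in the standard library's `html.parser` so that it runs without third-party packages. The names `clean_text` and `_TextExtractor` are hypothetical, and reading "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def clean_text(text: str) -> str:
    """Hypothetical per-string sketch of the two cleaning steps listed in the README."""
    # Step 1: remove HTML tags (stdlib stand-in for BeautifulSoup).
    extractor = _TextExtractor()
    extractor.feed(text)
    extractor.close()  # flush any buffered text
    no_tags = " ".join(extractor.parts)

    # Step 2a: replace anything that is not an ASCII letter, digit, or whitespace.
    alnum_only = re.sub(r"[^A-Za-z0-9\s]", " ", no_tags)

    # Step 2b: collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", alnum_only).strip()
```

For example, `clean_text("<p>Hello,   <b>world</b>!</p>")` returns `"Hello world"`. Applying such a function to one text column of a DataFrame (for instance with `Series.apply`) would be one way to get DataFrame-level behavior, but the column names and return shape of the actual `clean()` are not shown in this diff.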