Delete README.md
Browse files
README.md
DELETED
@@ -1,79 +0,0 @@
|
|
1 |
-
---
|
2 |
-
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
|
3 |
-
# Doc / guide: https://huggingface.co/docs/hub/model-cards
|
4 |
-
{}
|
5 |
-
---
|
6 |
-
|
7 |
-
# Model Card for Model ID
|
8 |
-
|
9 |
-
<!-- Provide a quick summary of what the model is/does. -->
|
10 |
-
|
11 |
-
This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
|
12 |
-
|
13 |
-
## Model Details
|
14 |
-
This model classifies news headlines as either NBC or Fox News.
|
15 |
-
|
16 |
-
### Model Description
|
17 |
-
|
18 |
-
<!-- Provide a longer summary of what this model is. -->
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
- **Developed by:** Jack Bader, Kaiyuan Wang, Pairan Xu
|
23 |
-
- **Taks:** Binary classification (NBC News vs. Fox News)
|
24 |
-
- **Preprocessing:** TF-IDF vectorization applied to the text data
|
25 |
-
- stop_words = "english"
|
26 |
-
- max_features = 1000
|
27 |
-
- **Model type:** Random Forest
|
28 |
-
- **Freamwork:** Scikit-learn
|
29 |
-
-
|
30 |
-
#### Metrics
|
31 |
-
|
32 |
-
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
33 |
-
|
34 |
-
- Accuracy Score
|
35 |
-
|
36 |
-
### Model Description
|
37 |
-
import pandas as pd
|
38 |
-
import joblib
|
39 |
-
from huggingface_hub import hf_hub_download
|
40 |
-
from sklearn.feature_extraction.text import TfidfVectorizer
|
41 |
-
from sklearn.metrics import classification_report
|
42 |
-
|
43 |
-
# Mount to drive
|
44 |
-
from google.colab import drive
|
45 |
-
drive.mount('/content/drive')
|
46 |
-
|
47 |
-
# Load test set
|
48 |
-
test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv")
|
49 |
-
|
50 |
-
# Log in w/ huggingface token
|
51 |
-
# token: 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
|
52 |
-
!huggingface-cli login
|
53 |
-
|
54 |
-
# Download the model
|
55 |
-
model = hf_hub_download(repo_id = "CIS5190FinalProj/GBTrees", filename = "gb_trees_model.pkl")
|
56 |
-
|
57 |
-
# Download the vectorizer
|
58 |
-
tfidf_vectorizer = hf_hub_download(repo_id = "CIS5190FinalProj/GBTrees", filename = "tfidf_vectorizer.pkl")
|
59 |
-
|
60 |
-
# Load the model
|
61 |
-
pipeline = joblib.load(model)
|
62 |
-
|
63 |
-
# Load the vectorizer
|
64 |
-
tfidf_vectorizer = joblib.load(tfidf_vectorizer)
|
65 |
-
|
66 |
-
# Extract the headlines from the test set
|
67 |
-
X_test = test_df['title']
|
68 |
-
|
69 |
-
# Apply transformation to the headlines into numerical features
|
70 |
-
X_test_transformed = tfidf_vectorizer.transform(X_test)
|
71 |
-
|
72 |
-
# Make predictions using the pipeline
|
73 |
-
y_pred = pipeline.predict(X_test_transformed)
|
74 |
-
|
75 |
-
# Extract 'labels' as target
|
76 |
-
y_test = test_df['labels']
|
77 |
-
|
78 |
-
# Print classification report
|
79 |
-
print(classification_report(y_test, y_pred))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|