---
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# Please check Ed for the token

# Model Evaluation Guide

This document provides the necessary instructions to evaluate a pre-trained sequence classification model using a test dataset.

## Prerequisites

Before running the evaluation pipeline, ensure you have the following installed:

- Python 3.7+
- Required Python libraries: `transformers`, `datasets`, `evaluate`, `torch`  
  Install them by running:

```bash
pip install transformers datasets evaluate torch
```

## Dataset Information

The test dataset is hosted on the Hugging Face Hub at `CIS5190ml/Dataset`. It has the following columns:
- `title`: the input text to classify
- `label`: the integer class label

Example entries:
- "Jack Carr's take on the late Tom Clancy..." (label: 0)
- "Feeding America CEO asks community to help..." (label: 0)
- "Trump's campaign rival decides between..." (label: 0)

## Model Information

The model being evaluated is hosted on the Hugging Face Hub at `CIS5190ml/bert4` (a fine-tuned `google-bert/bert-base-uncased`, per the metadata above).
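
As a quick sanity check (not part of the notebook), the model can be loaded through the `transformers` pipeline API; the headline below is illustrative, and the token is only needed if the repository is gated or private:

```python
from transformers import pipeline

# Load the fine-tuned classifier; add token="hf_..." if the model repository requires authentication.
clf = pipeline("text-classification", model="CIS5190ml/bert4")

# Classify a single headline; the output is a list with the predicted label and its score.
print(clf("Feeding America CEO asks community to help"))
```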

## Evaluation Pipeline

The complete evaluation pipeline is provided in the file:
**Evaluation_Pipeline.ipynb**

This Jupyter Notebook walks you through the following steps:
1. Loading the pre-trained model and tokenizer
2. Loading and preprocessing the test dataset
3. Running predictions on the test data
4. Computing the evaluation metric (e.g., accuracy)

## Quick Start

Clone this repository and navigate to the directory:

```bash
git clone <repository-url>
cd <repository-directory>
```

Open the Jupyter Notebook:

```bash
jupyter notebook Evaluation_Pipeline.ipynb
```

Follow the step-by-step instructions in the notebook to evaluate the model.

## Code Example

Here is an overview of the evaluation pipeline used in the notebook:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
import evaluate
import torch
from torch.utils.data import DataLoader

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert4")
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert4")

# Load dataset
ds = load_dataset("CIS5190ml/test_20_rows", split="train")

# Preprocessing
def preprocess_function(examples):
    return tokenizer(examples["title"], truncation=True, padding="max_length")

encoded_ds = ds.map(preprocess_function, batched=True)
encoded_ds = encoded_ds.remove_columns([col for col in encoded_ds.column_names if col not in ["input_ids", "attention_mask", "label"]])
encoded_ds.set_format("torch")

# Create DataLoader
test_loader = DataLoader(encoded_ds, batch_size=8)

# Evaluate
accuracy = evaluate.load("accuracy")
model.eval()

for batch in test_loader:
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
        preds = torch.argmax(outputs.logits, dim=-1)
        accuracy.add_batch(predictions=preds, references=batch["label"])

final_accuracy = accuracy.compute()
print("Accuracy:", final_accuracy["accuracy"])
```
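
If a GPU is available, inference runs faster when the model and batches are moved onto it. This is standard PyTorch usage rather than something the notebook requires; a minimal adjustment to the loop above:

```python
# Select a device and move the model onto it (assumes model, test_loader, and accuracy from above).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for batch in test_loader:
    with torch.no_grad():
        outputs = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
        )
        preds = torch.argmax(outputs.logits, dim=-1)
        # The metric is computed on CPU, so move predictions back before adding the batch.
        accuracy.add_batch(predictions=preds.cpu(), references=batch["label"])
```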

## Output

After running the pipeline, the evaluation metric (e.g., accuracy) will be displayed in the notebook output. Example:

```
Accuracy: 0.85
```

## Notes

* If your dataset or column names differ, update the relevant sections in the notebook.
* To use a different evaluation metric, change the metric name passed to `evaluate.load()` in the notebook (see the sketch after this list).
* For any issues or questions, please feel free to reach out.
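
For example, a sketch of swapping accuracy for F1, reusing `model` and `test_loader` from the code example above (metric names follow the `evaluate` library's built-in metrics):

```python
import torch
import evaluate

# Load F1 instead of accuracy; other built-in metrics ("precision", "recall", ...) work the same way.
f1 = evaluate.load("f1")

for batch in test_loader:
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
        preds = torch.argmax(outputs.logits, dim=-1)
        f1.add_batch(predictions=preds, references=batch["label"])

# Binary labels work with the defaults; for multi-class labels, pass average="macro" (or "micro"/"weighted") to compute().
print(f1.compute())
```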