Model Evaluation Guide

This document explains how to evaluate a pre-trained sequence classification model on a test dataset.

Prerequisites

Before running the evaluation pipeline, ensure you have the following installed:

  • Python 3.7+
  • Required Python libraries
    Install them by running:
pip install transformers datasets evaluate torch
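
To confirm the environment is set up correctly, you can run a quick import check first; a minimal sketch that simply prints each library's version string:

import transformers, datasets, evaluate, torch

# Print versions to confirm each library imports cleanly
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("evaluate:", evaluate.__version__)
print("torch:", torch.__version__)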

Dataset Information

The test dataset is hosted on the Hugging Face Hub under the CIS5190ml namespace (the code example below loads CIS5190ml/test_20_rows). The dataset should have the following structure:

  • Column: title
  • Column: label

Example entries:

  • "Jack Carr's take on the late Tom Clancy..." (label: 0)
  • "Feeding America CEO asks community to help..." (label: 0)
  • "Trump's campaign rival decides between..." (label: 0)

Model Information

The model being evaluated is hosted on the Hugging Face Hub at CIS5190ml/bert5.
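
As a quick check that the model downloads and is configured for classification, you can load it and print its label configuration; a minimal sketch (the id2label mapping is whatever was saved in the model's config):

from transformers import AutoModelForSequenceClassification

# Loading pulls the weights from the Hub on first use
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert5")
print(model.config.num_labels)  # number of output classes
print(model.config.id2label)    # class-index-to-name mapping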

Evaluation Pipeline

The complete evaluation pipeline is provided in the file: Evaluation_Pipeline.ipynb

This Jupyter Notebook walks you through the following steps:

  1. Loading the pre-trained model and tokenizer
  2. Loading and preprocessing the test dataset
  3. Running predictions on the test data
  4. Computing the evaluation metric (e.g., accuracy)

Quick Start

Clone this repository and navigate to the directory:

git clone <repository-url>
cd <repository-directory>

Open the Jupyter Notebook:

jupyter notebook Evaluation_Pipeline.ipynb
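
If you prefer to run the notebook non-interactively (for example, on a remote machine), nbconvert can execute it from the command line; a minimal sketch, with the output filename chosen here only for illustration:

jupyter nbconvert --to notebook --execute Evaluation_Pipeline.ipynb --output Evaluation_Pipeline_executed.ipynb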

Follow the step-by-step instructions in the notebook to evaluate the model.

Code Example

Here is an overview of the evaluation pipeline used in the notebook:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
from torch.utils.data import DataLoader
import evaluate
import torch

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert5")
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert5")

# Use a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load the test dataset
ds = load_dataset("CIS5190ml/test_20_rows", split="train")

# Tokenize the "title" column, padding/truncating to a fixed length
def preprocess_function(examples):
    return tokenizer(examples["title"], truncation=True, padding="max_length")

encoded_ds = ds.map(preprocess_function, batched=True)

# Keep only the columns the model expects, as PyTorch tensors
keep_columns = ["input_ids", "attention_mask", "label"]
encoded_ds = encoded_ds.remove_columns([col for col in encoded_ds.column_names if col not in keep_columns])
encoded_ds.set_format("torch")

# Batch the test set for inference
test_loader = DataLoader(encoded_ds, batch_size=8)

# Accumulate accuracy over all batches
accuracy = evaluate.load("accuracy")
model.eval()

for batch in test_loader:
    with torch.no_grad():
        outputs = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
        )
    preds = torch.argmax(outputs.logits, dim=-1)
    accuracy.add_batch(predictions=preds.cpu(), references=batch["label"])

final_accuracy = accuracy.compute()
print("Accuracy:", final_accuracy["accuracy"])

Output

After running the pipeline, the evaluation metric (e.g., accuracy) will be displayed in the notebook output. Example:

Accuracy: 0.82

Notes

  • If your dataset or column names differ, update the relevant sections in the notebook.
  • To use a different evaluation metric, change the metric name passed to evaluate.load() in the notebook (see the sketch after this list).
  • For any issues or questions, please feel free to reach out.
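
As an example of swapping the metric, the following minimal sketch loads F1 instead of accuracy (the predictions and references here are dummy values for illustration; in the notebook you would call f1.add_batch(...) inside the evaluation loop exactly as with accuracy):

import evaluate

# F1 with the default binary averaging, matching the 0/1 labels above
f1 = evaluate.load("f1")
f1.add_batch(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(f1.compute())  # {'f1': 0.666...}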