Note: Please check Ed for the access token.
Model Evaluation Guide
This document provides the necessary instructions to evaluate a pre-trained sequence classification model using a test dataset.
Prerequisites
Before running the evaluation pipeline, ensure you have the following installed:
- Python 3.7+
- Required Python libraries
Install them by running:
pip install transformers datasets evaluate torch
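To confirm the installation succeeded, you can run a quick import check (a minimal sketch; the versions printed will depend on your environment):
python -c "import transformers, datasets, evaluate, torch; print(transformers.__version__, torch.__version__)"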
Dataset Information
The test dataset is hosted on the Hugging Face Hub under the namespace CIS5190ml/Dataset. The dataset should have the following structure:
- Column: title
- Column: label
Example entries:
- "Jack Carr's take on the late Tom Clancy..." (label: 0)
- "Feeding America CEO asks community to help..." (label: 0)
- "Trump's campaign rival decides between..." (label: 0)
Model Information
The model being evaluated is hosted on the Hugging Face Hub under the namespace CIS5190ml/bert5.
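As a quick sanity check before opening the notebook, you can load the model and confirm how many labels it predicts (a minimal sketch; the repo id is taken from the code example below):
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and classification model, then report the size of the output head
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert5")
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert5")
print(model.config.num_labels)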
Evaluation Pipeline
The complete evaluation pipeline is provided in the file: Evaluation_Pipeline.ipynb
This Jupyter Notebook walks you through the following steps:
- Loading the pre-trained model and tokenizer
- Loading and preprocessing the test dataset
- Running predictions on the test data
- Computing the evaluation metric (e.g., accuracy)
Quick Start
Clone this repository and navigate to the directory:
git clone <repository-url>
cd <repository-directory>
Open the Jupyter Notebook:
jupyter notebook Evaluation_Pipeline.ipynb
Follow the step-by-step instructions in the notebook to evaluate the model.
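If you prefer to run the notebook non-interactively (for example, on a remote machine), the standard nbconvert command below should work; this is a general Jupyter feature rather than anything specific to this repository:
jupyter nbconvert --to notebook --execute --output Evaluation_Pipeline_out.ipynb Evaluation_Pipeline.ipynb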
Code Example
Here is an overview of the evaluation pipeline used in the notebook:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
import evaluate
import torch
from torch.utils.data import DataLoader
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert5")
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert5")
# Load dataset
ds = load_dataset("CIS5190ml/test_20_rows", split="train")
# Preprocessing
def preprocess_function(examples):
    return tokenizer(examples["title"], truncation=True, padding="max_length")
encoded_ds = ds.map(preprocess_function, batched=True)
encoded_ds = encoded_ds.remove_columns([col for col in encoded_ds.column_names if col not in ["input_ids", "attention_mask", "label"]])
encoded_ds.set_format("torch")
# Create DataLoader
test_loader = DataLoader(encoded_ds, batch_size=8)
# Evaluate
accuracy = evaluate.load("accuracy")
model.eval()
for batch in test_loader:
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    preds = torch.argmax(outputs.logits, dim=-1)
    accuracy.add_batch(predictions=preds, references=batch["label"])
final_accuracy = accuracy.compute()
print("Accuracy:", final_accuracy["accuracy"])
Output
After running the pipeline, the evaluation metric (e.g., accuracy) will be displayed in the notebook output. Example:
Accuracy: 0.82
Notes
- If your dataset or column names differ, update the relevant sections in the notebook.
- To use a different evaluation metric, change the metric name passed to evaluate.load() in the notebook (see the sketch below).
- For any issues or questions, please feel free to reach out.