---
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# Model Evaluation Guide

**Note:** Please check Ed for the access token.

This document provides the instructions needed to evaluate a pre-trained sequence classification model on a test dataset.

## Prerequisites

Before running the evaluation pipeline, ensure you have the following installed:

- Python 3.7+
- Required Python libraries

Install them by running:

```bash
pip install transformers datasets evaluate torch
```

## Dataset Information

The test dataset is hosted on the Hugging Face Hub under the namespace `CIS5190ml/Dataset`. The dataset should have the following structure:

- Column: `title`
- Column: `label`

Example entries:

- "Jack Carr's take on the late Tom Clancy..." (label: 0)
- "Feeding America CEO asks community to help..." (label: 0)
- "Trump's campaign rival decides between..." (label: 0)

## Model Information

The model being evaluated is hosted on the Hugging Face Hub under the namespace `CIS5190ml/bert4`.

## Evaluation Pipeline

The complete evaluation pipeline is provided in the file **Evaluation_Pipeline.ipynb**. This Jupyter Notebook walks you through the following steps:

1. Loading the pre-trained model and tokenizer
2. Loading and preprocessing the test dataset
3. Running predictions on the test data
4. Computing the evaluation metric (e.g., accuracy)

## Quick Start

Clone this repository and navigate to its directory:

```bash
git clone <repository-url>
cd <repository-directory>
```

Open the Jupyter Notebook:

```bash
jupyter notebook Evaluation_Pipeline.ipynb
```

Follow the step-by-step instructions in the notebook to evaluate the model.

## Code Example

Here is an overview of the evaluation pipeline used in the notebook. Note that it loads the `CIS5190ml/test_20_rows` dataset; if your test data lives elsewhere (e.g., under `CIS5190ml/Dataset`), update the `load_dataset()` call accordingly.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
import evaluate
import torch
from torch.utils.data import DataLoader

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert4")
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert4")

# Load the test dataset
ds = load_dataset("CIS5190ml/test_20_rows", split="train")

# Tokenize the `title` column, truncating/padding to the model's max length
def preprocess_function(examples):
    return tokenizer(examples["title"], truncation=True, padding="max_length")

encoded_ds = ds.map(preprocess_function, batched=True)

# Keep only the columns the model consumes, plus the labels
encoded_ds = encoded_ds.remove_columns(
    [col for col in encoded_ds.column_names
     if col not in ["input_ids", "attention_mask", "label"]]
)
encoded_ds.set_format("torch")

# Create DataLoader
test_loader = DataLoader(encoded_ds, batch_size=8)

# Evaluate
accuracy = evaluate.load("accuracy")
model.eval()

for batch in test_loader:
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"])
    preds = torch.argmax(outputs.logits, dim=-1)
    accuracy.add_batch(predictions=preds, references=batch["label"])

final_accuracy = accuracy.compute()
print("Accuracy:", final_accuracy["accuracy"])
```

## Output

After running the pipeline, the evaluation metric (e.g., accuracy) is displayed in the notebook output. Example:

```
Accuracy: 0.85
```

## Notes

* If your dataset or column names differ, update the relevant sections in the notebook.
* To use a different evaluation metric, change the metric name passed to `evaluate.load()` in the notebook (see the sketch at the end of this document).
* For any issues or questions, please feel free to reach out.
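## Quick Sanity Check (Optional)

Before running the full notebook, you can spot-check the model on a single headline with the `transformers` pipeline API. This is a minimal sketch, not part of the original notebook; the headline below is a placeholder, and the returned label names (`LABEL_0`, `LABEL_1`, ...) depend on the model's config.

```python
from transformers import pipeline

# Load the same model evaluated above as a ready-made classifier
clf = pipeline("text-classification", model="CIS5190ml/bert4")

# Placeholder headline -- substitute any `title` value from the dataset
print(clf("Jack Carr's take on the late Tom Clancy..."))
# e.g. [{'label': 'LABEL_0', 'score': 0.97}]  (exact output varies)
```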
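## Running on a GPU (Optional)

The evaluation loop in the Code Example runs on CPU by default. If a GPU is available, moving the model and each batch to the device speeds up evaluation considerably. Below is a minimal variant of that loop, assuming the `model`, `test_loader`, and `accuracy` objects defined above.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

for batch in test_loader:
    # Move each input tensor to the same device as the model
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    with torch.no_grad():
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    preds = torch.argmax(logits, dim=-1)
    # Metrics expect CPU values, so move predictions back before accumulating
    accuracy.add_batch(predictions=preds.cpu(), references=batch["label"])

print("Accuracy:", accuracy.compute()["accuracy"])
```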
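## Switching Metrics (Optional)

As mentioned in the Notes, swapping the evaluation metric only requires changing the name passed to `evaluate.load()`. Here is a minimal sketch using F1 instead of accuracy; the predictions and references below are placeholder values for illustration, and in the notebook you would call `add_batch` inside the loop exactly as with accuracy.

```python
import evaluate

# Load F1 instead of accuracy; the default average="binary" suits two-class labels
f1 = evaluate.load("f1")

# Placeholder values for illustration only
result = f1.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'f1': 0.666...}
```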