|
--- |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- google-bert/bert-base-uncased |
|
pipeline_tag: text-classification |
|
library_name: transformers |
|
--- |
|
|
|
> **Note:** Please check Ed for the token.


|
# Model Evaluation Guide |
|
|
|
This document provides instructions for evaluating a pre-trained sequence classification model on a test dataset.
|
|
|
## Prerequisites |
|
|
|
Before running the evaluation pipeline, ensure you have the following installed: |
|
|
|
- Python 3.7+ |
|
- The Python libraries `transformers`, `datasets`, `evaluate`, and `torch`
|
Install them by running: |
|
|
|
```bash |
|
pip install transformers datasets evaluate torch |
|
``` |
|
|
|
## Dataset Information |
|
|
|
The test dataset is hosted on the Hugging Face Hub under the namespace `CIS5190ml/Dataset`. (The code example in this guide loads `CIS5190ml/test_20_rows`; point the `load_dataset` call at whichever dataset you want to evaluate on.) The dataset should have the following structure:

- Column: `title` (the input text to classify)

- Column: `label` (the integer class label)
|
|
|
Example entries: |
|
- "Jack Carr's take on the late Tom Clancy..." (label: 0) |
|
- "Feeding America CEO asks community to help..." (label: 0) |
|
- "Trump's campaign rival decides between..." (label: 0) |
|
|
|
## Model Information |
|
|
|
The model being evaluated is hosted on the Hugging Face Hub as `CIS5190ml/bert4`, a `google-bert/bert-base-uncased` model fine-tuned for text classification.
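
To verify that the checkpoint loads and to see how many classes it predicts, you can inspect its configuration first. This is a small sketch; the `id2label` mapping is only meaningful if one was saved with the checkpoint:

```python
from transformers import AutoConfig

# Inspect the classification head without loading the full model weights
config = AutoConfig.from_pretrained("CIS5190ml/bert4")
print(config.num_labels)  # number of output classes
print(config.id2label)    # id-to-label mapping, if one was saved
```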
|
|
|
## Evaluation Pipeline |
|
|
|
The complete evaluation pipeline is provided in the file: |
|
**Evaluation_Pipeline.ipynb** |
|
|
|
This Jupyter Notebook walks you through the following steps: |
|
1. Loading the pre-trained model and tokenizer |
|
2. Loading and preprocessing the test dataset |
|
3. Running predictions on the test data |
|
4. Computing the evaluation metric (e.g., accuracy) |
|
|
|
## Quick Start |
|
|
|
Clone this repository and navigate to the directory: |
|
|
|
```bash |
|
git clone <repository-url> |
|
cd <repository-directory> |
|
``` |
|
|
|
Open the Jupyter Notebook: |
|
|
|
```bash |
|
jupyter notebook Evaluation_Pipeline.ipynb |
|
``` |
|
|
|
Follow the step-by-step instructions in the notebook to evaluate the model. |
|
|
|
## Code Example |
|
|
|
Here is an overview of the evaluation pipeline used in the notebook: |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
from datasets import load_dataset |
|
import evaluate |
|
import torch |
|
from torch.utils.data import DataLoader |
|
|
|
# Load model and tokenizer |
|
tokenizer = AutoTokenizer.from_pretrained("CIS5190ml/bert4") |
|
model = AutoModelForSequenceClassification.from_pretrained("CIS5190ml/bert4") |
|
|
|
# Load dataset |
|
ds = load_dataset("CIS5190ml/test_20_rows", split="train") |
|
|
|
# Preprocessing |
|
def preprocess_function(examples): |
|
return tokenizer(examples["title"], truncation=True, padding="max_length") |
|
|
|
encoded_ds = ds.map(preprocess_function, batched=True) |
|
encoded_ds = encoded_ds.remove_columns([col for col in encoded_ds.column_names if col not in ["input_ids", "attention_mask", "label"]]) |
|
encoded_ds.set_format("torch") |
|
|
|
# Create DataLoader |
|
test_loader = DataLoader(encoded_ds, batch_size=8) |
|
|
|
# Evaluate |
|
accuracy = evaluate.load("accuracy") |
|
model.eval() |
|
|
|
for batch in test_loader: |
|
with torch.no_grad(): |
|
outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]) |
|
preds = torch.argmax(outputs.logits, dim=-1) |
|
accuracy.add_batch(predictions=preds, references=batch["label"]) |
|
|
|
final_accuracy = accuracy.compute() |
|
print("Accuracy:", final_accuracy["accuracy"]) |
|
``` |
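
The snippet above runs on the CPU. If a GPU is available, a small variation (a sketch, not part of the original notebook) moves the model and each batch to the device before the forward pass:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

for batch in test_loader:
    # Move every tensor in the batch to the same device as the model
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    preds = torch.argmax(outputs.logits, dim=-1)
    accuracy.add_batch(predictions=preds.cpu(), references=batch["label"].cpu())
```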
|
|
|
## Output |
|
|
|
After running the pipeline, the evaluation metric (e.g., accuracy) will be displayed in the notebook output. Example: |
|
|
|
``` |
|
Accuracy: 0.85 |
|
``` |
|
|
|
## Notes |
|
|
|
* If your dataset or column names differ, update the relevant sections in the notebook. |
|
* To use a different evaluation metric, change the metric name passed to `evaluate.load()` in the notebook (a sketch follows after this list).
|
* For any issues or questions, please feel free to reach out. |
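
For example, to report F1 alongside accuracy, you can swap the single metric for `evaluate.combine`, which exposes the same `add_batch`/`compute` interface. This is a hedged sketch assuming a binary classification setup, matching the `label` values shown above; the small prediction/reference lists are dummy data for illustration only:

```python
import evaluate

# Combine several metrics into one object with the same add_batch/compute API
metrics = evaluate.combine(["accuracy", "f1"])

# Inside the evaluation loop, replace accuracy.add_batch(...) with:
metrics.add_batch(predictions=[0, 1, 1], references=[0, 1, 0])  # dummy batch

print(metrics.compute())  # e.g. {'accuracy': 0.667, 'f1': 0.667}
```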