Newswire Classifier (AP, UPI, NEA) - BERT Transformers

📘 Overview

This repository contains three separately trained BERT models for identifying whether a newspaper article was produced by one of three major newswire services:

  • AP (Associated Press)
  • UPI (United Press International)
  • NEA (Newspaper Enterprise Association)

The models are designed for historical news classification from public-domain newswire articles (1960–1975).

🧠 Model Architecture

  • Base Model: bert-base-uncased
  • Task: Binary classification (1 if the article is from the target newswire, 0 otherwise)
  • Optimizer: AdamW
  • Loss Function: Binary Cross-Entropy with Logits
  • Batch Size: 16
  • Epochs: 4
  • Learning Rate: 2e-5
  • Device: TPU (v2-8) in Google Colab
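
A minimal training-setup sketch using the hyperparameters above. This is not the released training script: the single-logit head paired with BCEWithLogitsLoss, the placeholder train_loader, and the omission of TPU/XLA device handling are all assumptions for illustration.

import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical re-creation of the described setup; the exact head
# configuration (one logit + BCEWithLogitsLoss) is an assumption.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

optimizer = AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()

# train_loader is an assumed DataLoader yielding tokenized batches of 16 with 0/1 labels.
model.train()
for epoch in range(4):
    for batch in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        loss = loss_fn(logits.squeeze(-1), batch["labels"].float())
        loss.backward()
        optimizer.step()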

📊 Training Data

  • Source: Historical newspapers (1960–1975, public domain)
  • Articles: 4000 per training round (1000 from the target newswire, 3000 from other sources)
  • Features Used: Headline, author, and the first 100 characters of the article, combined into a single input (see the sketch below).
  • Labeling: 1 for articles from the target newswire, 0 for all others.
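
A short sketch of how such examples could be assembled. The column names and the DataFrame df are hypothetical; only the field order and the 100-character cut are taken from the description above.

# `df` is an assumed DataFrame of articles with hypothetical column names.
def build_example(row, target="AP"):
    # Headline + author + first 100 characters of the article body.
    text = f"{row['headline']} {row['author']} {row['article'][:100]}"
    # 1 if the article came from the target newswire, 0 otherwise.
    label = 1 if row["newswire"] == target else 0
    return text, label

examples = [build_example(row) for _, row in df.iterrows()]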

🚀 Model Performance

Model   Accuracy   Precision   Recall   F1 Score
AP      0.9925     0.9926      0.9925   0.9925
UPI     0.9999     0.9999      0.9999   0.9999
NEA     0.9875     0.9880      0.9875   0.9876
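
Metrics of this kind can be computed from held-out predictions with scikit-learn. A sketch, assuming y_true and y_pred are arrays of 0/1 labels and that the averaging mode is weighted (an assumption, not stated above):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(f"Accuracy={accuracy:.4f} Precision={precision:.4f} Recall={recall:.4f} F1={f1:.4f}")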

πŸ› οΈ Usage

Installation

pip install transformers torch

Example Inference (AP Classifier)

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the AP classifier and its tokenizer.
model = AutoModelForSequenceClassification.from_pretrained("mike-mcrae/newswire_classifier/AP")
tokenizer = AutoTokenizer.from_pretrained("mike-mcrae/newswire_classifier/AP")
model.eval()

text = "(AP) President speaks at conference..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)
prediction = outputs.logits.argmax(dim=-1).item()
print("AP Article" if prediction == 1 else "Not AP Article")
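
If a confidence score is wanted, the logits from the example above can be converted to probabilities. This sketch assumes the checkpoint exposes a two-logit head with index 1 as the positive (AP) class, as the argmax call implies:

probs = torch.softmax(outputs.logits, dim=-1)
print(f"P(AP) = {probs[0, 1].item():.3f}")  # assumed: index 1 is the positive class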

βš™οΈ Recommended Usage Notes

  • The models were trained on a concatenation of the headline, the author, and the first 100 characters of the article body, since the newswire attribution often appears in these fields. Formatting inference inputs the same way may improve accuracy; see the sketch below.
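
A small helper illustrating that input format. The function name, the example author line, and the article snippet are hypothetical; the tokenizer is the one loaded in the usage example above.

def format_for_inference(headline, author, article_text):
    # Mirror the training input: headline + author + first 100 characters of the article.
    return f"{headline} {author} {article_text[:100]}"

text = format_for_inference(
    "President speaks at conference",
    "By JOHN SMITH, Associated Press Writer",
    "(AP) WASHINGTON - The president addressed reporters on Tuesday about ...",
)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)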

📜 Licensing & Data Source

  • Training Data: Historical newspaper articles (1960–1975) from public-domain sources.
  • License: Public domain (data); MIT License (model and code).

💬 Citation

If you use these models, please cite:

@misc{newswire_classifier,
  author = {McRae, Michael},
  title = {Newswire Classifier (AP, UPI, NEA) - BERT Transformers},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/username/newswire_classifier}
}