---
license: mit
datasets:
- mikemcrae25/black_entity_classifier
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

# 🧠 BERT Classifier for Black Article Detection

## 📝 Model Overview

This repository hosts a fine-tuned BERT model (`bert-base-uncased`) that classifies whether a sentence is about a Black entity.

It is common in the social sciences to analyse semantic and topic variation by race. Some racial labels are unambiguous without surrounding context, e.g. "Negro". The word "Black", however, has many meanings besides its reference to Black people, so a keyword match alone is unreliable. This model identifies whether a sentence contains a "Black" entity (i.e. a person, group or organisation). The training dataset is also provided for reproducibility.

## 📖 Description

- **Model:** Fine-tuned `bert-base-uncased`
- **Training Data:** 2,000 manually labeled sentences from historical newspaper articles (1960–1973)
- **Inputs:** `sentence` *(string)*
- **Outputs:** `black_story` *(0 or 1)*

## 📊 Performance Metrics

- **Training Accuracy:** 93.5%
- **Validation Accuracy:** 91.2%
- **Precision:** 90.8%
- **Recall:** 92.1%

## 🚀 Usage Instructions

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="mikemcrae/black-entity-classifier")

# Classify a single sentence; the result is a list of {"label", "score"} dicts
result = classifier("Black activists led a peaceful protest downtown.")
print(result)
```

## 💾 Training Dataset

- Hugging Face Dataset: [mikemcrae/black-article-training-data](https://huggingface.co/datasets/mikemcrae/black-article-training-data)
- CSV Columns: `sentence`, `black_story`

## 📊 Example Data Preview

```csv
sentence,black_story
"The Black Panthers organized a march for civil rights.",1
"The mayor discussed the city's budget for next year.",0
"Black students protested against segregation policies.",1
"Black car for sale.",0
```

## ⚙️ Reproduction Instructions

```python
from datasets import load_dataset
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Load the labeled sentences (columns: sentence, black_story)
dataset = load_dataset("mikemcrae/black-article-training-data")

# Start from the base model with a two-class classification head
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Trainer and TrainingArguments are imported for the fine-tuning step, which is not shown here.
```

## 📜 License

**MIT License**: Free to use with attribution.

```
MIT License © 2025 Mike McRae

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
```

## ❤️ Citation

```
@inproceedings{mcrae2025blackbert,
  title={BERT Classifier for Black Article Detection},
  author={Mike McRae},
  year={2025},
  url={https://huggingface.co/mikemcrae/black-article-classifier}
}
```
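
## 🧪 Scoring Predictions (Sketch)

As a follow-up to the usage and metrics sections above, here is a minimal sketch of how pipeline outputs could be converted to `black_story` values and scored against labeled sentences. It assumes the model keeps the default Transformers label names (`LABEL_0`, `LABEL_1`), which this card does not confirm. The helper names `to_black_story` and `binary_metrics` are hypothetical and not part of this repository, and this is not necessarily how the reported figures were computed.

```python
from typing import Dict, List


def to_black_story(prediction: Dict[str, object]) -> int:
    """Map one pipeline result, e.g. {"label": "LABEL_1", "score": 0.98}, to 0 or 1.

    Assumption: default label names of the form LABEL_<index>. If the model
    config defines custom id2label names, this mapping needs adjusting.
    """
    label = str(prediction["label"])
    # "LABEL_1" -> "1" -> 1
    return int(label.rsplit("_", 1)[-1])


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision and recall for the positive class (1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

With the `classifier` from the usage section, `[to_black_story(r) for r in classifier(sentences)]` would yield predictions that can be passed to `binary_metrics` together with the `black_story` column of a held-out split.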