# 🧠 NER-BERT-AI-Model-using-annotated-corpus-ner

A BERT-based Named Entity Recognition (NER) model fine-tuned on the Entity Annotated Corpus. It classifies tokens in text into predefined entity types such as Person (PER), Organization (ORG), and Location (LOC). The model is well suited for information extraction, resume parsing, and chatbot applications.

---

## ✨ Model Highlights

- 📌 Based on `bert-base-cased` (by Google)
- 🔍 Fine-tuned on the Entity Annotated Corpus (`ner_dataset.csv`)
- ⚡ Supports prediction of three entity types: PER, ORG, LOC
- 💾 Compatible with the Hugging Face `pipeline()` for easy inference

---

## 🧠 Intended Uses

- Resume and document parsing
- Chatbots and virtual assistants
- Named entity tagging in structured documents
- Search and information retrieval systems
- News or content analysis

---

## 🚫 Limitations

- Trained only on formal English text
- May not generalize well to informal text or domain-specific jargon
- Subword tokenization may split entities (e.g., "Cupertino" → "Cup", "##ert", "##ino"); the `aggregation_strategy="simple"` option shown in the Usage example merges these pieces back into whole entities
- Limited to the entity types available in the original dataset (PER, ORG, LOC only)

---

## 🏋️‍♂️ Training Details

| Field      | Value                          |
|------------|--------------------------------|
| Base Model | `bert-base-cased`              |
| Dataset    | Entity Annotated Corpus        |
| Framework  | PyTorch with Transformers      |
| Epochs     | 3                              |
| Batch Size | 16                             |
| Max Length | 128 tokens                     |
| Optimizer  | AdamW                          |
| Loss       | CrossEntropyLoss (token-level) |
| Device     | CUDA-enabled GPU               |

A fine-tuning sketch using these hyperparameters appears in Appendix A at the end of this card.

---

## 📊 Evaluation Metrics

| Metric    | Score (%) |
|-----------|-----------|
| Precision | 83.15     |
| Recall    | 83.85     |
| F1-Score  | 83.50     |

---

## 🔎 Label Mapping

| Label ID | Entity Type |
|----------|-------------|
| 0        | O           |
| 1        | B-PER       |
| 2        | I-PER       |
| 3        | B-ORG       |
| 4        | I-ORG       |
| 5        | B-LOC       |
| 6        | I-LOC       |

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword pieces (e.g., "Cup", "##ert", "##ino")
# back into whole entities; omit it to see the raw per-token predictions.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```

---

## 🧩 Quantization

Post-training quantization can be applied using PyTorch to reduce model size and improve inference performance, especially on edge devices. A sketch appears in Appendix B at the end of this card.

---

## 🗂 Repository Structure

```
.
├── model/               # Trained model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Model weights in safetensors format
└── README.md            # Model card
```

---

## 🤝 Contributing

We welcome feedback, bug reports, and improvements! Feel free to open an issue or submit a pull request.
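
---

## 📎 Appendix A: Fine-Tuning Sketch

As referenced under Training Details, this is a minimal sketch of how a token-classification fine-tune with those hyperparameters could be set up using the Hugging Face `Trainer` API. The original training script is not included in this repository, so the dataset fields (`tokens`, `ner_tags`) and the label-alignment logic are illustrative assumptions.

```python
# Hypothetical fine-tuning sketch; hyperparameters mirror the Training Details table.
# Dataset fields ("tokens", "ner_tags") and label alignment are assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    TrainingArguments,
    Trainer,
    DataCollatorForTokenClassification,
)

labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels), id2label=id2label, label2id=label2id
)

def tokenize_and_align(example):
    # `example` is assumed to hold pre-split words plus one BIO tag per word.
    enc = tokenizer(
        example["tokens"], is_split_into_words=True, truncation=True, max_length=128
    )
    aligned, prev = [], None
    for wid in enc.word_ids():
        if wid is None:
            aligned.append(-100)  # special tokens: ignored by the loss
        elif wid != prev:
            aligned.append(label2id[example["ner_tags"][wid]])
        else:
            aligned.append(-100)  # label only the first piece of each split word
        prev = wid
    enc["labels"] = aligned
    return enc

args = TrainingArguments(
    output_dir="ner-bert",
    num_train_epochs=3,              # Epochs: 3
    per_device_train_batch_size=16,  # Batch Size: 16
)

# With a tokenized dataset in hand, training would look like:
# trainer = Trainer(
#     model=model,
#     args=args,
#     train_dataset=tokenized_train,  # hypothetical, built via tokenize_and_align
#     data_collator=DataCollatorForTokenClassification(tokenizer),
# )
# trainer.train()  # Trainer uses AdamW and token-level cross-entropy by default
```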
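
---

## 📎 Appendix B: Quantization Sketch

As referenced under Quantization, this is one possible approach: PyTorch dynamic quantization of the model's linear layers. It is a sketch rather than a recipe shipped with this repository; dynamically quantized models run on CPU, and the size/accuracy trade-off should be measured on your own data.

```python
# Hypothetical post-training (dynamic) quantization sketch using PyTorch.
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/NER-BERT-AI-Model-using-annotated-corpus-ner"
)

# Store nn.Linear weights as int8; activations remain float and are
# quantized on the fly at inference time (CPU only).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

torch.save(quantized_model.state_dict(), "ner_bert_quantized.pt")
```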