# 🧠 Resume-Parsing-NER-AI-Model
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
---
## ✨ Model Highlights
- 🚀 Base Model: bert-base-cased (fine-tuned for resume NER)
- 📚 Dataset: Custom annotated resume dataset (BIO format)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🧠 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: Transformers model directory (with tokenizer and config)
---
## 🧠 Intended Uses
- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document understanding applications
---
## 🚫 Limitations
- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Might not capture entities in handwritten or image-based resumes
- ❌ May not generalize to other document types without re-training
---
## 🏋️‍♂️ Training Details
| Attribute | Value |
|--------------------|----------------------------------|
| Base Model | bert-base-cased |
| Dataset            | Custom annotated resume dataset  |
| Task Type | Token Classification (NER) |
| Epochs | 3 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
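
The training code itself is not part of this card. The sketch below shows how a run with the settings above would typically look using the Trainer API; `train_ds` is a hypothetical tokenized dataset with aligned BIO label ids, since the custom annotated dataset is not published.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=15,  # 7 entity types as B-/I- pairs in BIO format, plus "O"
)

args = TrainingArguments(
    output_dir="./resume-ner-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# Trainer defaults match the table: AdamW optimizer, and the token-classification
# head computes CrossEntropyLoss internally.
# `train_ds` is assumed to exist and carry aligned BIO labels.
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```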
---
## 📊 Evaluation Metrics
| Metric | Score |
| ----------------------------------------------- | ----- |
| Accuracy | 0.98 |
| F1-Score | 0.98 |
| Precision | 0.97 |
| Recall | 0.98 |
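
The card does not state how these scores were computed; for NER they are typically produced at the entity level with seqeval over gold and predicted BIO sequences, as in this minimal sketch (the example tags are made up):

```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical gold vs. predicted BIO tag sequences for two resume snippets
y_true = [["B-NAME", "I-NAME", "O", "B-COMPANY"], ["B-JOB", "O", "B-EMAIL"]]
y_pred = [["B-NAME", "I-NAME", "O", "B-COMPANY"], ["B-JOB", "O", "O"]]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
```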
---
## 🚀 Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline; "simple" aggregation merges sub-word tokens into whole entities
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "John worked at Infosys as an Analyst. Email: [email protected]"
ner_results = ner_pipe(text)
for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# BIO label set used during fine-tuning
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
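
Continuing from the snippet above, the aggregated entities can be folded into a structured record for downstream use (ATS fields, analytics, etc.); the grouping below is illustrative post-processing, not part of the model:

```python
from collections import defaultdict

# Group the pipeline output by entity type (keys follow the label set above)
record = defaultdict(list)
for entity in ner_results:
    record[entity["entity_group"]].append(entity["word"])

print(dict(record))
# e.g. {'NAME': ['John'], 'COMPANY': ['Infosys'], 'JOB': ['Analyst'], 'EMAIL': ['[email protected]']}
```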
---
## 🧩 Quantization
Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
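
The exact quantization recipe is not included in this card. As a rough illustration, the common PyTorch post-training approach for BERT-style models quantizes the linear layers; note this sketch uses dynamic quantization rather than the full static workflow:

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("./resume-ner-model")

# Store nn.Linear weights as int8; they are dequantized on the fly at inference
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

torch.save(quantized_model.state_dict(), "resume-ner-model-quantized.pt")
```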
---
## 📁 Repository Structure
```
resume-ner-model/
├── config.json               ← Model configuration
├── pytorch_model.bin         ← Fine-tuned model weights
├── tokenizer_config.json     ← Tokenizer configuration
├── vocab.txt                 ← BERT vocabulary
├── training_args.bin         ← Training parameters
├── preprocessor_config.json  ← Optional tokenizer pre-processing info
└── README.md                 ← Model card
```
---
## 🤝 Contributing
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.