# 🧠 Resume-Parsing-NER-AI-Model
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
---
## ✨ Model Highlights
- 📌 Base Model: bert-base-cased
- 📚 Dataset: Custom annotated resume dataset (BIO format)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🔧 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: Transformers model directory (with tokenizer and config)
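
Because the labels follow the BIO scheme, a `B-` tag opens an entity and `I-` tags continue it. The sketch below, assuming whitespace-split words with one tag per word, shows how such tags can be grouped back into entity spans. The helper name `bio_to_spans`, the example sentence, and the strict handling of a stray `I-` tag are illustrative assumptions, not part of this repository.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open entity (if any) and record it as a span
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B-X always starts a new X entity
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # I-X continues an open X entity
            current_tokens.append(token)
        else:
            # "O", or an I-X with no open X entity (dropped in this strict reading)
            flush()
    flush()
    return spans


# Hypothetical example sentence, tagged by hand
tokens = ["John", "Smith", "worked", "at", "Infosys", "as", "an", "Analyst", "."]
tags = ["B-NAME", "I-NAME", "O", "O", "B-COMPANY", "O", "O", "B-JOB", "O"]
print(bio_to_spans(tokens, tags))
# [('NAME', 'John Smith'), ('COMPANY', 'Infosys'), ('JOB', 'Analyst')]
```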
---
## 🧠 Intended Uses
- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document understanding applications
---
## 🚫 Limitations
- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Might not capture entities in handwritten or image-based resumes
- ❌ May not generalize to other document types without re-training
---
## 🏋️‍♂️ Training Details
| Attribute | Value |
|--------------------|----------------------------------|
| Base Model | bert-base-cased |
| Dataset            | Custom annotated resume dataset (BIO format) |
| Task Type | Token Classification (NER) |
| Epochs | 3 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
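
The card does not describe how word-level BIO labels were aligned to BERT's sub-word tokens. One common approach, sketched here as an assumption, labels only the first sub-token of each word and sets every other position to -100, which PyTorch's `CrossEntropyLoss` ignores by default. The `word_ids` list mirrors what a fast tokenizer's `word_ids()` method returns after calling the tokenizer with `is_split_into_words=True`; the example values and the helper `align_labels` are hypothetical.

```python
from typing import List, Optional

# PyTorch's CrossEntropyLoss skips targets equal to ignore_index, which defaults to -100
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level label ids to one label per sub-word token."""
    aligned: List[int] = []
    previous_word: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS], [SEP] and padding get no label
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-token of a word carries the word's label
            aligned.append(word_labels[word_id])
        else:
            # Remaining sub-tokens of the same word are masked out of the loss
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Hypothetical example: words ["John", "Infosys"] with label ids
# B-NAME = 1 and B-COMPANY = 11 (from label_list), where "Infosys" is
# assumed to split into three sub-word tokens.
word_ids = [None, 0, 1, 1, 1, None]
print(align_labels(word_ids, [1, 11]))
# [-100, 1, 11, -100, -100, -100]
```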
---
## 📊 Evaluation Metrics
| Metric | Score |
| ----------------------------------------------- | ----- |
| Accuracy | 0.98 |
| F1-Score | 0.98 |
| Precision | 0.97 |
| Recall | 0.98 |
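
The card does not say whether these scores are token-level or entity-level, nor how they were averaged. For reference, a minimal sketch of micro-averaged, exact-match entity-level scoring (the convention used by conlleval-style evaluators) is shown below. The span tuples and the helper `entity_prf` are hypothetical and are not the evaluation code behind the table.

```python
from typing import Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over entity spans."""
    true_pos = len(gold & pred)  # same type AND same boundaries
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical spans: the model gets NAME and COMPANY right but labels
# "Analyst" as SKILL instead of JOB.
gold = {("NAME", 0, 2), ("COMPANY", 4, 5), ("JOB", 7, 8)}
pred = {("NAME", 0, 2), ("COMPANY", 4, 5), ("SKILL", 7, 8)}
p, r, f = entity_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.67 recall=0.67 f1=0.67
```

Under this convention a span with the right boundaries but the wrong type counts as both a false positive and a false negative.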
---
## 🚀 Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Token-classification pipeline; "simple" aggregation groups sub-word tokens
# and B-/I- tags into whole entities
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "John worked at Infosys as an Analyst. Email: john@example.com"  # placeholder email address
ner_results = ner_pipe(text)

for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# BIO label set used by the model; the list index is the label id
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
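
With `aggregation_strategy="simple"`, the pipeline returns one dict per entity with `entity_group`, `score`, `word`, `start`, and `end` keys. A minimal sketch of turning that list into a structured resume record follows. The helper `to_resume_record`, the 0.5 confidence threshold, and the sample output are assumptions for illustration, not real model outputs.

```python
from collections import defaultdict
from typing import Dict, List


def to_resume_record(ner_results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect pipeline entities into {entity_group: [unique values]}."""
    record: Dict[str, List[str]] = defaultdict(list)
    for entity in ner_results:
        if entity["score"] < min_score:
            continue  # drop low-confidence predictions
        values = record[entity["entity_group"]]
        value = entity["word"].strip()
        if value not in values:
            values.append(value)  # de-duplicate, keeping first-seen order
    return dict(record)


# Hypothetical pipeline output in the aggregated ("simple") format
sample = [
    {"entity_group": "NAME", "score": 0.99, "word": "John", "start": 0, "end": 4},
    {"entity_group": "COMPANY", "score": 0.97, "word": "Infosys", "start": 15, "end": 22},
    {"entity_group": "JOB", "score": 0.95, "word": "Analyst", "start": 29, "end": 36},
    {"entity_group": "SKILL", "score": 0.31, "word": "Email", "start": 38, "end": 43},
]
print(to_resume_record(sample))
# {'NAME': ['John'], 'COMPANY': ['Infosys'], 'JOB': ['Analyst']}
```

Raising `min_score` trades recall for precision; a suitable threshold depends on how the downstream ATS handles missing fields.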
---
## 🧩 Quantization
Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
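
The card does not include the quantization code. As a hedged illustration of what post-training static quantization computes, the sketch below derives a per-tensor affine 8-bit mapping (scale and zero point) from the min/max of calibration data, in the spirit of a min/max observer. It is plain Python for clarity, not PyTorch's quantization API; the calibration values, the unsigned range [0, 255], and the helper names are assumptions rather than the configuration used for this model.

```python
from typing import List, Tuple


def minmax_qparams(values: List[float], qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Per-tensor affine (scale, zero_point) from the observed min/max of calibration data."""
    # Extend the range to include 0.0 so the zero point lands inside [qmin, qmax]
    # and 0.0 is represented exactly
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    """Map a float to the integer grid, clamping to the representable range."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer back to the (approximate) float it represents."""
    return (q - zero_point) * scale


# Hypothetical calibration activations
calibration = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = minmax_qparams(calibration)
for v in calibration:
    q = quantize(v, scale, zero_point)
    print(f"{v:+.2f} -> {q:3d} -> {dequantize(q, scale, zero_point):+.4f}")
```

Within the calibrated range, the round-trip error of each value is at most half a quantization step (`scale / 2`), which is the precision traded for the smaller integer representation.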
---
## 🗂 Repository Structure
```
Resume-Parsing-NER-AI-Model/
├── config.json               ✅ Model configuration
├── pytorch_model.bin         ✅ Fine-tuned model weights
├── tokenizer_config.json     ✅ Tokenizer configuration
├── vocab.txt                 ✅ BERT vocabulary
├── training_args.bin         ✅ Training parameters
├── preprocessor_config.json  ✅ Optional tokenizer pre-processing info
└── README.md                 ✅ Model card
```
---
## 🤝 Contributing
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.