|
# 🧠 Resume-Parsing-NER-AI-Model
|
|
|
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text. |
|
|
|
--- |
|
|
|
|
|
## ✨ Model Highlights
|
|
|
- 📌 Base Model: bert-base-cased

- 📚 Dataset: Custom annotated resume dataset (BIO format)

- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title

- 🔧 Framework: Hugging Face Transformers + PyTorch

- 💾 Format: transformers model directory (with tokenizer and config)
|
|
|
--- |
|
|
|
|
|
## 🧠 Intended Uses
|
|
|
- ✅ Resume parsing and candidate data extraction

- ✅ Applicant Tracking Systems (ATS)

- ✅ Automated HR screening tools

- ✅ Resume data analytics and visualization

- ✅ Chatbots and document understanding applications
|
|
|
--- |
|
|
|
## 🚫 Limitations
|
|
|
- ❌ Performance may degrade on resumes with non-standard formatting

- ❌ Might not capture entities in handwritten or image-based resumes

- ❌ May not generalize to other document types without re-training
|
|
|
--- |
|
|
|
## 🏋️‍♂️ Training Details
|
|
|
| Attribute | Value | |
|
|--------------------|----------------------------------| |
|
| Base Model | bert-base-cased | |
|
| Dataset            | Custom annotated resume dataset  |
|
| Task Type | Token Classification (NER) | |
|
| Epochs | 3 | |
|
| Batch Size | 16 | |
|
| Optimizer | AdamW | |
|
| Loss Function | CrossEntropyLoss | |
|
| Framework | PyTorch + Transformers | |
|
| Hardware | CUDA-enabled GPU | |
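
For reference, here is a minimal sketch of a fine-tuning setup matching the table above, using the Hugging Face `Trainer` (which defaults to AdamW and cross-entropy loss for token classification). The dataset loading and BIO-label alignment steps are assumed and not shown; `train_dataset` and the output path are placeholders.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=15,  # 7 entity types in BIO format + the "O" tag
)

training_args = TrainingArguments(
    output_dir="./resume-ner-model",  # hypothetical output path
    num_train_epochs=3,               # per the table above
    per_device_train_batch_size=16,   # per the table above
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized dataset with aligned BIO labels
    tokenizer=tokenizer,
)
trainer.train()
```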
|
|
|
--- |
|
|
|
## 📊 Evaluation Metrics
|
|
|
|
|
| Metric | Score | |
|
| ----------------------------------------------- | ----- | |
|
| Accuracy | 0.98 | |
|
| F1-Score | 0.98 | |
|
| Precision | 0.97 | |
|
| Recall | 0.98 | |
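
For illustration, a hedged sketch of how such metrics are commonly computed for token-classification models with the `seqeval` library. The `logits`, `labels`, and `label_list` names here are assumptions (e.g. outputs of a `Trainer` evaluation run and the BIO scheme shown in the Usage section below), not artifacts shipped with this model.

```python
import numpy as np
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(logits, labels, label_list):
    predictions = np.argmax(logits, axis=-1)
    # Special tokens are conventionally labelled -100 and must be skipped
    true_labels = [
        [label_list[l] for l in label_seq if l != -100]
        for label_seq in labels
    ]
    true_preds = [
        [label_list[p] for p, l in zip(pred_seq, label_seq) if l != -100]
        for pred_seq, label_seq in zip(predictions, labels)
    ]
    return {
        "accuracy": accuracy_score(true_labels, true_preds),
        "precision": precision_score(true_labels, true_preds),
        "recall": recall_score(true_labels, true_preds),
        "f1": f1_score(true_labels, true_preds),
    }
```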
|
|
|
|
|
--- |
|
|
|
## 🚀 Usage
|
```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer into a token-classification pipeline
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
ner_pipe = pipeline(
    "ner",
    model=model_name,
    tokenizer=model_name,
    aggregation_strategy="simple",  # merge word-piece tokens into entity spans
)

text = "John worked at Infosys as an Analyst. Email: [email protected]"
ner_results = ner_pipe(text)

for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# BIO label scheme used by the model
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
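
With `aggregation_strategy="simple"`, the pipeline merges word-piece tokens back into whole words and collapses the B-/I- prefixes, so results are reported under grouped labels such as `COMPANY` rather than `B-COMPANY`/`I-COMPANY`.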
|
--- |
|
|
|
## 🧩 Quantization

Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
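
As an illustration, here is a minimal sketch of post-training quantization in PyTorch. It uses the *dynamic* quantization API, a common post-training route for transformer models; the static variant mentioned above would additionally require calibration data and observer configuration.

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/Resume-Parsing-NER-AI-Model"
)

# Quantize the Linear layers (the bulk of BERT's parameters) to 8-bit integers
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)
```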
|
|
|
---
|
|
|
## 📁 Repository Structure
|
```
resume-ner-model/
├── config.json               ✅ Model configuration
├── pytorch_model.bin         ✅ Fine-tuned model weights
├── tokenizer_config.json    ✅ Tokenizer configuration
├── vocab.txt                 ✅ BERT vocabulary
├── training_args.bin         ✅ Training parameters
├── preprocessor_config.json  ✅ Optional tokenizer pre-processing info
└── README.md                 ✅ Model card
```
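
Since the directory contains both the model weights and the tokenizer files, it can also be loaded directly from disk (the local path here is illustrative):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./resume-ner-model")
model = AutoModelForTokenClassification.from_pretrained("./resume-ner-model")
```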
|
--- |
|
## 🤝 Contributing
|
|
|
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model. |
|
|