# 🧠 Resume-Parsing-NER-AI-Model
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
---
## ✨ Model Highlights
- 🚀 Base Model: bert-base-cased (fine-tuned for resume NER)
- 📚 Dataset: Custom annotated resume dataset (BIO format)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🧠 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: Transformers model directory (with tokenizer and config)
---
## 🧠 Intended Uses
- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document understanding applications
---
## 🚫 Limitations
- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Might not capture entities in handwritten or image-based resumes
- ❌ May not generalize to other document types without re-training
---
## 🏋️‍♂️ Training Details
| Attribute | Value |
|--------------------|----------------------------------|
| Base Model | bert-base-cased |
| Dataset            | Custom annotated resume dataset  |
| Task Type | Token Classification (NER) |
| Epochs | 3 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
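
The training code itself is not part of this card. The sketch below shows how a run with the settings above would typically look using the Trainer API; `train_ds` is a hypothetical tokenized dataset with aligned BIO label ids, since the custom annotated dataset is not published.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=15,  # 7 entity types as B-/I- pairs in BIO format, plus "O"
)

args = TrainingArguments(
    output_dir="./resume-ner-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# Trainer defaults match the table: AdamW optimizer, and the token-classification
# head computes CrossEntropyLoss internally.
# `train_ds` is assumed to exist and carry aligned BIO labels.
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```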
---
## 📊 Evaluation Metrics
| Metric | Score |
| ----------------------------------------------- | ----- |
| Accuracy | 0.98 |
| F1-Score | 0.98 |
| Precision | 0.97 |
| Recall | 0.98 |
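
The card does not state how these scores were computed; for NER they are typically produced at the entity level with seqeval over gold and predicted BIO sequences, as in this minimal sketch (the example tags are made up):

```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical gold vs. predicted BIO tag sequences for two resume snippets
y_true = [["B-NAME", "I-NAME", "O", "B-COMPANY"], ["B-JOB", "O", "B-EMAIL"]]
y_pred = [["B-NAME", "I-NAME", "O", "B-COMPANY"], ["B-JOB", "O", "O"]]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
```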
---
## 🚀 Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline; "simple" aggregation merges sub-word tokens into whole entities
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "John worked at Infosys as an Analyst. Email: [email protected]"
ner_results = ner_pipe(text)
for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# BIO label set used during fine-tuning
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
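
Continuing from the snippet above, the aggregated entities can be folded into a structured record for downstream use (ATS fields, analytics, etc.); the grouping below is illustrative post-processing, not part of the model:

```python
from collections import defaultdict

# Group the pipeline output by entity type (keys follow the label set above)
record = defaultdict(list)
for entity in ner_results:
    record[entity["entity_group"]].append(entity["word"])

print(dict(record))
# e.g. {'NAME': ['John'], 'COMPANY': ['Infosys'], 'JOB': ['Analyst'], 'EMAIL': ['[email protected]']}
```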
---
## 🧩 Quantization
Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
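
The exact quantization recipe is not included in this card. As a rough illustration, the common PyTorch post-training approach for BERT-style models quantizes the linear layers; note this sketch uses dynamic quantization rather than the full static workflow:

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("./resume-ner-model")

# Store nn.Linear weights as int8; they are dequantized on the fly at inference
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

torch.save(quantized_model.state_dict(), "resume-ner-model-quantized.pt")
```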
---
## 📁 Repository Structure
```
resume-ner-model/
├── config.json               ← Model configuration
├── pytorch_model.bin         ← Fine-tuned model weights
├── tokenizer_config.json     ← Tokenizer configuration
├── vocab.txt                 ← BERT vocabulary
├── training_args.bin         ← Training parameters
├── preprocessor_config.json  ← Optional tokenizer pre-processing info
└── README.md                 ← Model card
```
---
## 🤝 Contributing
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.