Commit 1695f42 by AmanSengar · verified · 1 Parent(s): 1934c23

Create README.md

Files changed (1)
  1. README.md +130 -0
README.md ADDED
@@ -0,0 +1,130 @@
+ # 🧠 Resume-Parsing-NER-AI-Model
+
+ A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
+
+ ---
+
+ ## ✨ Model Highlights
+
+ - 📌 Base Model: bert-base-cased
+ - 📚 Dataset: Custom annotated resume dataset (BIO format)
+ - 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
+ - 🔧 Framework: Hugging Face Transformers + PyTorch
+ - 💾 Format: transformers model directory (with tokenizer and config)
+
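To illustrate the BIO format named in the highlights: each token is tagged `B-<TYPE>` at the start of an entity, `I-<TYPE>` inside it, and `O` outside any entity. The following is a minimal sketch of how such tags can be merged back into entity spans. It is not the model's actual post-processing; the function name `bio_to_spans` and the sample tokens are hypothetical, and the tag names follow the label list shown in the Usage section.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that is still open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) closes any open span.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical, hand-labelled example (not taken from the training data).
tokens = ["John", "Smith", "worked", "at", "Infosys", "as", "Data", "Analyst"]
tags = ["B-NAME", "I-NAME", "O", "O", "B-COMPANY", "O", "B-JOB", "I-JOB"]
print(bio_to_spans(tokens, tags))
# [('NAME', 'John Smith'), ('COMPANY', 'Infosys'), ('JOB', 'Data Analyst')]
```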
+ ---
+
+
+ ## 🧠 Intended Uses
+
+ - ✅ Resume parsing and candidate data extraction
+ - ✅ Applicant Tracking Systems (ATS)
+ - ✅ Automated HR screening tools
+ - ✅ Resume data analytics and visualization
+ - ✅ Chatbots and document understanding applications
+
+ ---
+
+ ## 🚫 Limitations
+
+ - ❌ Performance may degrade on resumes with non-standard formatting
+ - ❌ May miss entities in handwritten or image-based resumes, since the model operates on text and such documents need text extraction (e.g., OCR) first
+ - ❌ May not generalize to other document types without re-training
+
+ ---
+
+ ## 🏋️‍♂️ Training Details
+
+ | Attribute     | Value                                         |
+ |---------------|-----------------------------------------------|
+ | Base Model    | bert-base-cased                               |
+ | Dataset       | Custom annotated resume dataset (BIO format)  |
+ | Task Type     | Token Classification (NER)                    |
+ | Epochs        | 3                                             |
+ | Batch Size    | 16                                            |
+ | Optimizer     | AdamW                                         |
+ | Loss Function | CrossEntropyLoss                              |
+ | Framework     | PyTorch + Transformers                        |
+ | Hardware      | CUDA-enabled GPU                              |
+
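As a rough illustration, the table's values could map onto keyword arguments for `transformers.TrainingArguments` as sketched below. This is an assumption about how training might be configured, not the author's actual script: the output directory is hypothetical, the batch size is assumed to be per device, and settings the table does not give (learning rate, weight decay, scheduler) are left at library defaults.

```python
# Hyperparameters from the Training Details table, written as keyword
# arguments accepted by transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "./resume-ner-model",   # hypothetical output path
    "num_train_epochs": 3,                # "Epochs" row
    "per_device_train_batch_size": 16,    # "Batch Size" row (assumed per device)
    "optim": "adamw_torch",               # "Optimizer" row (PyTorch AdamW)
}

# With transformers installed, these would be passed as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
print(training_kwargs)
```

The cross-entropy loss listed in the table needs no separate setting here: token-classification heads in Transformers compute it internally when labels are supplied.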
+ ---
+
+ ## 📊 Evaluation Metrics
+
+
+ | Metric    | Score |
+ | --------- | ----- |
+ | Accuracy  | 0.98  |
+ | F1-Score  | 0.98  |
+ | Precision | 0.97  |
+ | Recall    | 0.98  |
+
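The table does not state whether these scores are token-level or entity-level, or how they were averaged. For reference, the sketch below shows one common way to compute entity-level precision, recall, and F1 for NER, where a prediction counts as correct only if its type and boundaries both match the gold span exactly. The function name and the sample spans are hypothetical and unrelated to the reported numbers.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start_token, end_token)


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores using exact match on type and boundaries."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted spans for one resume.
gold = {("NAME", 0, 1), ("COMPANY", 4, 4), ("JOB", 6, 7)}
pred = {("NAME", 0, 1), ("COMPANY", 4, 4), ("JOB", 7, 7), ("SKILL", 9, 9)}

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.50 recall=0.67 f1=0.57
```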
+
+ ---
+
+ ## 🚀 Usage
+ ```python
+ from transformers import (
+     AutoModelForTokenClassification,
+     AutoTokenizer,
+     pipeline,
+ )
+
+ # Load the fine-tuned model and tokenizer.
+ # A local directory such as "./resume-ner-model" can be used instead of the Hub ID.
+ model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+ # "simple" aggregation merges word pieces into whole-entity spans
+ ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+ text = "John worked at Infosys as an Analyst. Email: [email protected]"
+ ner_results = ner_pipe(text)
+
+ for entity in ner_results:
+     print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")
+
+ # BIO label set; the comment on each line is the label ID
+ label_list = [
+     "O",            # 0
+     "B-NAME",       # 1
+     "I-NAME",       # 2
+     "B-EMAIL",      # 3
+     "I-EMAIL",      # 4
+     "B-PHONE",      # 5
+     "I-PHONE",      # 6
+     "B-EDUCATION",  # 7
+     "I-EDUCATION",  # 8
+     "B-SKILL",      # 9
+     "I-SKILL",      # 10
+     "B-COMPANY",    # 11
+     "I-COMPANY",    # 12
+     "B-JOB",        # 13
+     "I-JOB",        # 14
+ ]
+ ```
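With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. A downstream parser usually wants these grouped by entity type. The sketch below shows one way to do that; the helper `group_entities`, the `min_score` threshold, and the sample results are hypothetical and hand-written, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(ner_results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity text by type, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output on
# "John worked at Infosys as an Analyst."
sample_results = [
    {"entity_group": "NAME", "score": 0.99, "word": "John", "start": 0, "end": 4},
    {"entity_group": "COMPANY", "score": 0.97, "word": "Infosys", "start": 15, "end": 22},
    {"entity_group": "JOB", "score": 0.42, "word": "Analyst", "start": 29, "end": 36},
]

print(group_entities(sample_results))
# {'NAME': ['John'], 'COMPANY': ['Infosys']}
```

The low-scoring `JOB` prediction is filtered out here only because of the illustrative threshold; a suitable cutoff would need to be chosen on validation data.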
+ ---
+
+ ## 🧩 Quantization
+
+ Post-training static quantization was applied using PyTorch to reduce model size and speed up inference on edge devices.
+
+ ---
+
+ ## 🗂 Repository Structure
+ ```
+ .
+ resume-ner-model/
+ ├── config.json                 ✅ Model configuration
+ ├── pytorch_model.bin           ✅ Fine-tuned model weights
+ ├── tokenizer_config.json       ✅ Tokenizer configuration
+ ├── vocab.txt                   ✅ BERT vocabulary
+ ├── training_args.bin           ✅ Training parameters
+ ├── preprocessor_config.json    ✅ Optional tokenizer pre-processing info
+ └── README.md                   ✅ Model card
+
+ ```
+ ---
+ ## 🤝 Contributing
+
+ Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.