root committed · Commit e232281 · Parent: 72d33a9
Commit message: ss

Browse files:
- README.md (+61 -85)
- app.py (+569 -583)
- explanation_generator.py (+178 -0)
- fix_dependencies.py (+76 -0)
- requirements.txt (+15 -20)
README.md CHANGED
@@ -12,102 +12,78 @@ license: mit
Removed (old README; several lines are truncated in the diff view and left as gaps):

# Resume Screener and Skill Extractor

A … (description truncated in the diff view)

## Features

- … (six feature bullets, truncated in the diff view)

… (heading truncated)

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt')"
```

## Common Issues and Solutions

### ImportError: cannot import name 'cached_download' from 'huggingface_hub'

This occurs due to version incompatibility between huggingface_hub and sentence_transformers. To fix:

1. Run the dependency fixer script: `python fix_dependencies.py`
2. Or manually install compatible versions: `pip install huggingface-hub==0.14.1 sentence-transformers==2.2.2`

### PydanticImportError: `pydantic:ConstrainedStr` has been removed in V2

This error occurs when using spaCy 3.5.0 with pydantic v2. To fix:

1. Run the dependency fixer script: `python fix_dependencies.py`
2. Or manually install a compatible pydantic version: `pip install "pydantic<2.0.0"`

## Running the Application

```bash
streamlit run app.py
```

## … (heading truncated)

1. Upload a resume in PDF format
2. Select a target job position
3. Review the analysis results in the different tabs
4. Click "Generate Personalized Career Advice" to get recommendations

## Dependencies

- streamlit
- pdfplumber
- spacy
- transformers
- sentence-transformers
- torch
- nltk
- plotly
- pandas
- numpy
- matplotlib

## Supported Job Positions

- Software Engineer
- Interaction Designer
- Data Scientist

## How it Works

… (section truncated in the diff view)
- Suggestions for skills you might want to develop

## … (heading truncated)

- Hugging Face Transformers for AI-powered text summarization
- spaCy for natural language processing
- PyPDF2 and python-docx for document parsing

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Added (new README):

# Resume Screener and Skill Extractor

A Hugging Face Space application for efficiently screening resumes against job descriptions using a hybrid ranking approach that combines semantic similarity with keyword-based scoring.

## Features

- **Hybrid Resume Ranking**: Combines semantic similarity (via NV-Embed-v2) with keyword-based BM25 scoring
- **Skill Extraction**: Automatically identifies relevant skills from resumes based on job requirements
- **Fast Search**: Uses FAISS for efficient similarity search with large resume collections
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Explanation Generation**: Provides explanations for why each resume was ranked highly
- **Visualization**: Displays comparative scores and key matches for easy analysis
- **Batch Processing**: Supports uploading multiple resumes simultaneously

## How It Works

1. **Input**: Provide a job description and upload resumes (PDF, DOCX, TXT, or CSV format)
2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
3. **Ranking**: Calculates a hybrid score (see the sketch after this list) based on:
   - Semantic similarity (cosine similarity between embeddings)
   - Keyword relevance (BM25 scoring)
4. **Results**: Returns the top 10 most suitable resumes with:
   - Overall score and individual component scores
   - Matched skills and key phrases
   - Explanations for why each resume was ranked highly
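The weighting step can be sketched as follows; this condenses the `calculate_hybrid_scores` logic from app.py (BM25 scores are scaled by their maximum before blending, and the 0.7 default matches the sidebar slider):

```python
import numpy as np

def hybrid_score(semantic_scores, bm25_scores, semantic_weight=0.7):
    """Blend cosine similarities with max-normalized BM25 scores."""
    bm25 = np.asarray(bm25_scores, dtype=float)
    if bm25.max() > 0:
        bm25 = bm25 / bm25.max()  # scale keyword scores into [0, 1]
    sem = np.asarray(semantic_scores, dtype=float)  # cosine similarities
    return semantic_weight * sem + (1.0 - semantic_weight) * bm25

print(hybrid_score([0.82, 0.65], [3.1, 7.4]))  # one blended score per resume
```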
## Technical Details

### Models Used

- **NV-Embed-v2**: State-of-the-art embedding model for semantic similarity
- **QwQ-32B**: Used for generating explanations (simulated in the current version)
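A condensed sketch of how app.py's `get_embedding` turns text into a vector with a Hugging Face encoder: tokenize, run the model, and mean-pool the last hidden state. Note that some models, including NV-Embed-v2, may additionally require `trust_remote_code=True`; this is a sketch of the approach, not the app's exact loading path:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nvidia/NV-Embed-v2"  # as selected in the sidebar
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("job description text", return_tensors="pt",
                   truncation=True, max_length=512, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling across the token dimension yields one vector per text
embedding = outputs.last_hidden_state.mean(dim=1).squeeze()
print(embedding.shape)
```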
### Libraries

- **FAISS**: Facebook AI Similarity Search for fast vector similarity search
- **rank_bm25**: Implementation of the BM25 algorithm for keyword-based scoring
- **Streamlit**: For the user interface
- **Hugging Face Transformers**: For accessing and using the models
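For the keyword side, rank_bm25 exposes a small API; a minimal usage sketch matching how `calculate_bm25_scores` in app.py tokenizes and scores (the example texts are illustrative, and NLTK's punkt data is assumed to be downloaded):

```python
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize

resumes = [
    "python developer with five years of sql experience",
    "graphic designer proficient in figma and sketch",
]
job = "looking for a python engineer with strong sql skills"

corpus = [word_tokenize(text.lower()) for text in resumes]  # tokenized corpus
bm25 = BM25Okapi(corpus)
scores = bm25.get_scores(word_tokenize(job.lower()))  # one score per resume
print(scores)  # the first resume should score higher on this query
```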
## Configuration Options

The sidebar provides several configuration options:

- **Model Selection**: Choose which embedding model to use
- **Ranking Weights**: Adjust the balance between semantic similarity and keyword matching
- **Results Count**: Set how many top results to display
- **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections (see the sketch below)
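A minimal sketch of the FAISS approach used in app.py: vectors are L2-normalized so that an inner-product index returns cosine similarities (the array contents here are stand-ins):

```python
import faiss
import numpy as np

dim = 8
resume_vecs = np.random.rand(100, dim).astype("float32")  # stand-in resume embeddings
job_vec = np.random.rand(1, dim).astype("float32")        # stand-in job embedding

# Normalize so that inner product equals cosine similarity
faiss.normalize_L2(resume_vecs)
faiss.normalize_L2(job_vec)

index = faiss.IndexFlatIP(dim)  # exact inner-product index
index.add(resume_vecs)

scores, indices = index.search(job_vec, 10)  # top-10 most similar resumes
print(indices[0], scores[0])
```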
## Getting Started

### Online Usage

1. Visit the Hugging Face Space at [URL]
2. Enter a job description
3. Upload resumes (PDF, DOCX, TXT, or CSV)
4. Click "Find Top Candidates"
5. Review the results

### Local Installation

```bash
git clone https://huggingface.co/spaces/[username]/Resume_Screener_and_Skill_Extractor
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

## Future Enhancements

- Integration with Hugging Face datasets for loading resumes directly
- Enhanced skill extraction using more sophisticated NLP techniques
- Real-time explanation generation using QwQ-32B
- Support for additional file formats and languages
- Customizable scoring algorithms and weights

## License

MIT License

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -1,642 +1,628 @@
Removed (old app.py; heavily truncated in the diff view — gaps are marked with "# …"):

import streamlit as st
import pdfplumber
import re
import pandas as pd
import matplotlib.pyplot as plt
import torch
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

# Page configuration
st.set_page_config(
    page_title="Resume Screener & Skill Extractor",
    page_icon="📄",
    layout="wide"
)

# Import dependencies with fallbacks
try:
    import spacy
    spacy_available = True
except ImportError:
    spacy_available = False
    st.warning("spaCy is not available. Some features will be limited.")

try:
    from transformers import pipeline
    transformers_available = True
except ImportError:
    transformers_available = False
    st.warning("Transformers is not available. Summary generation will be limited.")

try:
    import nltk
    from nltk.tokenize import word_tokenize
    nltk_available = True
except ImportError:
    nltk_available = False
    st.warning("NLTK is not available. Some text processing features will be limited.")

# Custom sentence-transformers fallback
try:
    from sentence_transformers import SentenceTransformer
    try:
        from sentence_transformers import util as st_util
        sentence_transformers_available = True
    except ImportError:
        # Define our own utility functions
        class CustomSTUtil:
            @staticmethod
            def pytorch_cos_sim(a, b):
                if not isinstance(a, torch.Tensor):
                    a = torch.tensor(a)
                if not isinstance(b, torch.Tensor):
                    b = torch.tensor(b)

                if len(a.shape) == 1:
                    a = a.unsqueeze(0)
                if len(b.shape) == 1:
                    b = b.unsqueeze(0)

                a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
                b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
                return torch.mm(a_norm, b_norm.transpose(0, 1))

        st_util = CustomSTUtil()
        sentence_transformers_available = True
except ImportError:
    sentence_transformers_available = False
    st.warning("Sentence Transformers is not available. Semantic matching will be disabled.")

# Load models with exception handling
@st.cache_resource
def load_models():
    models = {}

    # … (spaCy model loading, truncated)
    try:
        import subprocess
        import sys
        subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
        models['nlp'] = spacy.load("en_core_web_sm")
    except Exception as e:
        st.warning(f"Could not load spaCy model: {e}")
        models['nlp'] = None
    else:
        models['nlp'] = None

    # … (summarizer loading, truncated)
        st.warning(f"Could not load summarizer model: {e}")
        # Simple fallback summarizer
        models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]
    else:
        # Simple fallback summarizer
        models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]

    # … (sentence transformer loading, truncated)
        st.warning(f"Could not load sentence transformer model: {e}")
        models['sentence_model'] = None
    else:
        models['sentence_model'] = None
    # …

job_descriptions = {
    "Software Engineer": {
        "skills": ["python", "java", "javascript", "sql", "algorithms", "data structures",
                   "git", "cloud", "web development", "software development", "coding"],
        "description": "Looking for software engineers with strong programming skills and experience in software development.",
        "must_have": ["python", "git", "algorithms"],
        "nice_to_have": ["cloud", "java", "javascript"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, familiar with basic programming concepts",
            "Mid-level": "3-5 years of experience, proficient in multiple languages, experience with system design",
            "Senior": "6+ years of experience, expert in software architecture, mentoring, and leading projects"
        }
    },
    "Interaction Designer": {
        "skills": ["ui", "ux", "user research", "wireframing", "prototyping", "figma",
                   "sketch", "adobe", "design thinking", "interaction design"],
        "description": "Seeking interaction designers with expertise in user experience and interface design.",
        "must_have": ["ui", "ux", "prototyping"],
        "nice_to_have": ["figma", "sketch", "user research"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, basic design skills, understanding of UX principles",
            "Mid-level": "3-5 years of experience, strong portfolio, experience with user research",
            "Senior": "6+ years of experience, leadership in design systems, driving design strategy"
        }
    },
    "Data Scientist": {
        "skills": ["python", "r", "statistics", "machine learning", "data analysis",
                   "sql", "tensorflow", "pytorch", "pandas", "numpy"],
        "description": "Looking for data scientists with strong analytical and machine learning skills.",
        "must_have": ["python", "statistics", "machine learning"],
        "nice_to_have": ["tensorflow", "pytorch", "r"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, basic knowledge of statistics and ML algorithms",
            "Mid-level": "3-5 years of experience, model development, feature engineering",
            "Senior": "6+ years of experience, advanced ML techniques, research experience"
        }
    }
}

# … (extract_skills definition, truncated)
    required_skills = job_descriptions[job_title]["skills"]

    # Simple keyword matching (no NLP needed)
    for skill in required_skills:
        if skill.lower() in text.lower():
            found_skills.append(skill)

    return found_skills

# … (extract_experience definition, truncated)
        role = match.group(2).strip()
        duration = match.group(3).strip()
        # … (date parsing, truncated)
            })
        except:
            experiences.append({
                'company': company,
                'role': role,
                'duration': duration
            })

    return experiences

def analyze_resume(text, job_title, models):
    """Analyze resume text."""
    # Extract skills
    found_skills = extract_skills(text, job_title, models.get('nlp'))

    # … (summary generation, truncated)
    try:
        pass  # …
    except Exception as e:
        pass  # st.… (call truncated)
    else:
        summary = text[:500] + "..."

    # Extract work experience
    experiences = extract_experience(text)

    # … (years-of-experience estimation, truncated)
    if years_exp < 3:
        seniority = "Junior"
    elif years_exp < 6:
        seniority = "Mid-level"
    else:
        seniority = "Senior"

    # Detect skill levels
    skill_levels = {}
    for skill in found_skills:
        # Default level
        skill_levels[skill] = "intermediate"

        # Look for advanced indicators
        advanced_patterns = [
            f"expert in {skill}",
            f"advanced {skill}",
            f"extensive experience with {skill}"
        ]
        if any(pattern in text.lower() for pattern in advanced_patterns):
            skill_levels[skill] = "advanced"

    # … (truncated)
    sorted_exps = sorted(
        [exp for exp in experiences if 'start_date' in exp],
        key=lambda x: x['start_date']
    )

    # … (role-overlap detection, truncated)
            'description': f"Overlapping roles at {current['company']} and {next_exp['company']}"
        })
    # …

# … (career trajectory prediction, truncated)
        return f"Next potential role: Senior {job_title}"
    elif seniority == "Mid-level":
        roles = {
            "Software Engineer": "Team Lead, Technical Lead, or Engineering Manager",
            "Data Scientist": "Senior Data Scientist or Data Science Lead",
            "Interaction Designer": "Senior Designer or UX Lead"
        }
        return f"Next potential roles: {roles.get(job_title, f'Senior {job_title}')}"
    else:  # Senior
        roles = {
            "Software Engineer": "Engineering Manager, Software Architect, or CTO",
            "Data Scientist": "Head of Data Science, ML Engineering Manager, or Chief Data Officer",
            "Interaction Designer": "Design Director, Head of UX, or VP of Design"
        }
        return f"Next potential roles: {roles.get(job_title, f'Director of {job_title}')}"

def generate_career_advice(resume_text, job_title, found_skills, missing_skills):
    """Generate career advice based on resume analysis."""
    advice = f"""## Career Development Plan for {job_title}

### Skills to Develop

The following skills would strengthen your profile for this position:

"""
    # … (advice assembly and project recommendations, truncated)
    return  # … (truncated)

# … (app header markdown and sidebar, truncated)
# st.… (call truncated; closed with """) )

# … (job selection and PDF uploader, truncated)

if uploaded_file and job_title:
    try:
        # Show spinner while processing
        with st.spinner("Analyzing resume..."):
            # Extract text from PDF
            text = extract_text_from_pdf(uploaded_file)

            # Analyze resume
            analysis_results = analyze_resume(text, job_title, models)

            # Calculate missing skills
            missing_skills = [skill for skill in job_descriptions[job_title]["skills"]
                              if skill not in analysis_results['found_skills']]

            # Display results in tabs
            tab1, tab2, tab3, tab4 = st.tabs([
                "📊 Skills Match",
                "📝 Resume Summary",
                "🎯 Skills Gap",
                "🚀 Career Advice"
            ])

            with tab1:
                # Create two columns
                col1, col2 = st.columns(2)

                with col1:
                    # Display matched skills
                    st.subheader("🎯 Matched Skills")
                    if analysis_results['found_skills']:
                        for skill in analysis_results['found_skills']:
                            # Show skill with proficiency level
                            level = analysis_results['skill_levels'].get(skill, 'intermediate')
                            level_emoji = "🟢" if level == 'advanced' else "🟡" if level == 'intermediate' else "🟠"
                            st.success(f"{level_emoji} {skill.title()} ({level.title()})")

                        # Calculate match percentage
                        match_percentage = len(analysis_results['found_skills']) / len(job_descriptions[job_title]["skills"]) * 100
                        st.metric("Skills Match", f"{match_percentage:.1f}%")
                    else:
                        st.warning("No direct skill matches found.")

                with col2:
                    # Display semantic match score
                    st.subheader("💡 Semantic Match")
                    st.metric("Overall Match Score", f"{analysis_results['match_score']:.1f}%")

                    # … (must-have coverage, truncated)
                    must_have_percentage = (must_have_count / len(must_have_skills)) * 100
                    # … (truncated)
                    st.info(f"**{analysis_results['seniority']}** ({analysis_results['years_experience']:.1f} years equivalent experience)")
                    st.write(job_descriptions[job_title]["seniority_levels"][analysis_results['seniority']])

            with tab2:
                # Display resume summary
                st.subheader("📝 Resume Summary")
                st.write(analysis_results['summary'])

                # Display experience timeline
                st.subheader("⏳ Experience Timeline")
                if analysis_results['experiences']:
                    # Convert experiences to dataframe for display
                    exp_data = []
                    for exp in analysis_results['experiences']:
                        if 'start_date' in exp and 'end_date' in exp:
                            exp_data.append({
                                'Company': exp['company'],
                                'Role': exp['role'],
                                'Start Date': exp['start_date'].strftime('%b %Y') if exp['start_date'] else 'Unknown',
                                'End Date': exp['end_date'].strftime('%b %Y') if exp['end_date'] != datetime.now() else 'Present',
                                'Duration (months)': exp.get('duration_months', 'Unknown')
                            })
                        else:
                            exp_data.append({
                                'Company': exp['company'],
                                'Role': exp['role'],
                                'Duration': exp.get('duration', 'Unknown')
                            })

                    # … (Plotly timeline figure, truncated)
                            margin=dict(l=0, r=0, b=0, t=30)
                        )
                        st.plotly_chart(fig, use_container_width=True)
                    except Exception as e:
                        st.warning(f"Could not create timeline visualization: {e}")
                else:
                    st.warning("No work experience data could be extracted.")

            # … (tab3: skills gap — missing must-have skills, truncated)
                    if missing_nice_to_have:
                        st.warning("**Nice-to-Have Skills Missing:**")
                        for skill in missing_nice_to_have:
                            st.write(f"- {skill.title()}")
                    else:
                        st.success("Candidate has all the nice-to-have skills!")

                    # Display career trajectory
                    st.subheader("👨‍💼 Career Trajectory")
                    st.info(analysis_results['career_prediction'])

            with tab4:
                # Display career advice
                st.subheader("🚀 Career Advice and Project Recommendations")

                if st.button("Generate Career Advice"):
                    with st.spinner("Generating personalized career advice..."):
                        advice = generate_career_advice(text, job_title, analysis_results['found_skills'], missing_skills)
                        st.markdown(advice)

    except Exception as e:
        # … (truncated)
        st.exception(e)

# Footer
st.markdown("---")
st.markdown("…")  # (footer text truncated)
Added (new app.py):

import streamlit as st
import pdfplumber
import pandas as pd
import numpy as np
import torch
import nltk
import faiss
import os
import tempfile
import base64
from rank_bm25 import BM25Okapi
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
from nltk.tokenize import word_tokenize, sent_tokenize
from tqdm import tqdm
import re
import io
import PyPDF2
from docx import Document
import csv
from explanation_generator import ExplanationGenerator

# Download NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

# Set page configuration
st.set_page_config(
    page_title="Resume Screener & Skill Extractor",
    page_icon="📄",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Sidebar for model selection and weights
with st.sidebar:
    st.title("Configuration")

    # Model selection
    embedding_model_name = st.selectbox(
        "Embedding Model",
        ["nvidia/NV-Embed-v2"],
        index=0
    )

    explanation_model_name = st.selectbox(
        "Explanation Model",
        ["Qwen/QwQ-32B"],
        index=0
    )

    # Ranking weights
    st.subheader("Ranking Weights")
    semantic_weight = st.slider("Semantic Similarity Weight", 0.0, 1.0, 0.7, 0.1)
    keyword_weight = 1.0 - semantic_weight
    st.write(f"Keyword Weight: {keyword_weight:.1f}")

    # Advanced options
    st.subheader("Advanced Options")
    top_k = st.number_input("Number of results to display", min_value=1, max_value=20, value=10, step=1)
    use_explanation = st.checkbox("Generate Explanations", value=True)
    use_faiss = st.checkbox("Use FAISS for fast search", value=True)

    st.markdown("---")
    st.markdown("### About")
    st.markdown("This app uses a hybrid ranking system combining semantic similarity with keyword matching to find the most suitable resumes for a job position.")

# Initialize session state variables
if 'resumes_uploaded' not in st.session_state:
    st.session_state.resumes_uploaded = False
if 'job_description' not in st.session_state:
    st.session_state.job_description = ""
if 'results' not in st.session_state:
    st.session_state.results = []
if 'embedding_model' not in st.session_state:
    st.session_state.embedding_model = None
if 'tokenizer' not in st.session_state:
    st.session_state.tokenizer = None
if 'faiss_index' not in st.session_state:
    st.session_state.faiss_index = None
if 'explanation_generator' not in st.session_state:
    st.session_state.explanation_generator = None

class ResumeScreener:
    def __init__(self, embedding_model_name="nvidia/NV-Embed-v2", explanation_model_name="Qwen/QwQ-32B"):
        """Initialize the ResumeScreener with the specified embedding model"""
        self.embedding_model_name = embedding_model_name
        self.explanation_model_name = explanation_model_name
        self.model = None
        self.tokenizer = None
        self.faiss_index = None
        self.embedding_size = None
        self.explanation_generator = None

    def load_model(self):
        """Load the embedding model from Hugging Face"""
        if st.session_state.embedding_model is None:
            with st.spinner(f"Loading model {self.embedding_model_name}..."):
                try:
                    if "sentence-transformers" in self.embedding_model_name:
                        self.model = SentenceTransformer(self.embedding_model_name)
                    else:
                        self.tokenizer = AutoTokenizer.from_pretrained(self.embedding_model_name)
                        self.model = AutoModel.from_pretrained(self.embedding_model_name)

                    st.session_state.embedding_model = self.model
                    st.session_state.tokenizer = self.tokenizer

                    # Get embedding size
                    if "sentence-transformers" in self.embedding_model_name:
                        self.embedding_size = self.model.get_sentence_embedding_dimension()
                    else:
                        # For non-sentence-transformers models, determined after the first embedding
                        pass

                except Exception as e:
                    st.error(f"Error loading model: {str(e)}")
                    st.stop()
        else:
            self.model = st.session_state.embedding_model
            self.tokenizer = st.session_state.tokenizer

        # Initialize explanation generator if needed
        # (`use_explanation` is the module-level sidebar checkbox defined above)
        if use_explanation and st.session_state.explanation_generator is None:
            st.session_state.explanation_generator = ExplanationGenerator(self.explanation_model_name)
            self.explanation_generator = st.session_state.explanation_generator
        elif use_explanation:
            self.explanation_generator = st.session_state.explanation_generator

    def extract_text_from_file(self, file, file_type):
        """Extract text from various file types; `file` should be a binary file object"""
        try:
            if file_type == "pdf":
                # Use pdfplumber for better text extraction
                with pdfplumber.open(file) as pdf:
                    text = ""
                    for page in pdf.pages:
                        text += page.extract_text() or ""

                # If pdfplumber fails, try PyPDF2 as fallback
                if not text.strip():
                    reader = PyPDF2.PdfReader(file)
                    text = ""
                    for page_num in range(len(reader.pages)):
                        page = reader.pages[page_num]
                        text += page.extract_text() or ""

                return text

            elif file_type == "docx":
                doc = Document(file)
                return " ".join([paragraph.text for paragraph in doc.paragraphs])

            elif file_type == "txt":
                return file.read().decode("utf-8")

            elif file_type == "csv":
                csv_text = ""
                csv_reader = csv.reader(io.StringIO(file.read().decode("utf-8")))
                for row in csv_reader:
                    csv_text += " ".join(row) + " "
                return csv_text

            else:
                st.error(f"Unsupported file type: {file_type}")
                return ""

        except Exception as e:
            st.error(f"Error extracting text from file: {str(e)}")
            return ""

    def get_embedding(self, text):
        """Generate text embedding for a given text"""
        if "sentence-transformers" in self.embedding_model_name:
            # For sentence-transformers models
            embedding = self.model.encode([text], convert_to_tensor=True, show_progress_bar=False)[0]
            embedding_np = embedding.cpu().detach().numpy()
        else:
            # For HuggingFace models
            inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
            with torch.no_grad():
                outputs = self.model(**inputs)

            # Use mean pooling over the last hidden state when available
            if hasattr(outputs, "last_hidden_state"):
                # Mean pooling across the token dimension
                embeddings = outputs.last_hidden_state.mean(dim=1).squeeze()
                embedding_np = embeddings.cpu().detach().numpy()
            else:
                # For models that return a specific embedding
                embedding_np = outputs.cpu().detach().numpy()

        # Set embedding size if not set
        if self.embedding_size is None:
            self.embedding_size = embedding_np.shape[0]

        return embedding_np

    def create_faiss_index(self, embeddings):
        """Create a FAISS index for fast similarity search"""
        # Get the dimension of the embeddings
        dimension = embeddings[0].shape[0]

        # Inner product on normalized vectors equals cosine similarity
        index = faiss.IndexFlatIP(dimension)

        # Add normalized vectors to the index
        embeddings_normalized = np.vstack([emb / np.linalg.norm(emb) for emb in embeddings])
        index.add(embeddings_normalized)

        return index

    def query_faiss_index(self, index, query_embedding, k=10):
        """Query the FAISS index with a query embedding"""
        # Normalize query embedding
        query_embedding = query_embedding / np.linalg.norm(query_embedding)

        # Reshape to a row vector if needed
        if len(query_embedding.shape) == 1:
            query_embedding = query_embedding.reshape(1, -1)

        # Query the index
        scores, indices = index.search(query_embedding, k)

        return scores[0], indices[0]  # Return the scores and indices as flat arrays

    def calculate_bm25_scores(self, resume_texts, job_description):
        """Calculate BM25 scores for keyword matching"""
        # Tokenize job description
        job_tokens = word_tokenize(job_description.lower())

        # Prepare corpus from resumes
        corpus = [word_tokenize(resume.lower()) for resume in resume_texts]

        # Initialize BM25
        bm25 = BM25Okapi(corpus)

        # Calculate scores
        scores = bm25.get_scores(job_tokens)

        return scores

    def calculate_hybrid_scores(self, resume_texts, resume_embeddings, job_embedding, semantic_weight=0.7, use_faiss=True):
        """Calculate hybrid scores combining semantic similarity and BM25"""
        # Calculate semantic similarity scores (cosine similarity)
        if use_faiss and len(resume_embeddings) > 10:
            # Create FAISS index if not already created
            if st.session_state.faiss_index is None:
                index = self.create_faiss_index(resume_embeddings)
                st.session_state.faiss_index = index
            else:
                index = st.session_state.faiss_index

            # Query index with job embedding
            faiss_scores, faiss_indices = self.query_faiss_index(index, job_embedding, k=len(resume_embeddings))

            # Create full semantic scores array
            semantic_scores = np.zeros(len(resume_embeddings))
            for i, idx in enumerate(faiss_indices):
                if idx < len(resume_embeddings):
                    semantic_scores[idx] = faiss_scores[i]
        else:
            # Direct cosine similarity calculation for smaller datasets
            semantic_scores = []
            for emb in resume_embeddings:
                # Normalize the embeddings for cosine similarity
                emb_norm = emb / np.linalg.norm(emb)
                job_emb_norm = job_embedding / np.linalg.norm(job_embedding)

                # Calculate cosine similarity
                similarity = np.dot(emb_norm, job_emb_norm)
                semantic_scores.append(similarity)

        # Calculate BM25 scores
        # (note: `job_description` here is the module-level text-area value, not a parameter)
        bm25_scores = self.calculate_bm25_scores(resume_texts, job_description)

        # Normalize BM25 scores
        if max(bm25_scores) > 0:
            bm25_scores = [score / max(bm25_scores) for score in bm25_scores]

        # Calculate hybrid scores
        keyword_weight = 1.0 - semantic_weight
        hybrid_scores = [
            (semantic_weight * sem_score) + (keyword_weight * bm25_score)
            for sem_score, bm25_score in zip(semantic_scores, bm25_scores)
        ]

        return hybrid_scores, semantic_scores, bm25_scores

    def extract_skills(self, text, job_description):
        """Extract skills from text based on job description"""
        # Simple skill extraction using regex and job description keywords
        # In a real implementation, this could be enhanced with ML-based skill extraction

        # Extract potential skills from job description (words 3 letters or longer)
        potential_skills = set()

        # Common skill-related phrases that might appear in job descriptions
        skill_indicators = ["experience with", "knowledge of", "familiar with", "proficient in",
                            "skills in", "expertise in", "background in", "capabilities in",
                            "years of experience in", "understanding of", "trained in"]

        # Extract skills from sentences containing skill indicators
        sentences = sent_tokenize(job_description)
        for sentence in sentences:
            sentence_lower = sentence.lower()
            for indicator in skill_indicators:
                if indicator in sentence_lower:
                    # Extract words after the indicator, possibly until end of sentence or punctuation
                    skills_part = sentence_lower.split(indicator, 1)[1]

                    # Extract words, cleaning up symbols
                    words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skills_part)
                    for word in words:
                        if len(word) >= 3:  # Only consider words 3 letters or longer
                            potential_skills.add(word.lower())

        # Add explicit skills - look for comma-separated lists or bullet points
        skill_lists = re.findall(r'(?:skills|requirements|qualifications)[^\n.]*?:(.+?)(?:\n|$)', job_description.lower())
        for skill_list in skill_lists:
            words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skill_list)
            for word in words:
                if len(word) >= 3:
                    potential_skills.add(word.lower())

        # Add common tech skills if they appear in the job description
        common_tech_skills = ["python", "java", "c++", "javascript", "sql", "react", "node.js", "typescript",
                              "html", "css", "aws", "azure", "gcp", "docker", "kubernetes", "terraform",
                              "git", "ci/cd", "agile", "scrum", "rest", "graphql", "ml", "ai", "data science"]

        for skill in common_tech_skills:
            if skill in job_description.lower():
                potential_skills.add(skill)

        # Find skills in the resume
        matched_skills = []
        for skill in potential_skills:
            # Make it a word-boundary search with regex
            pattern = r'\b' + re.escape(skill) + r'\b'
            matches = re.findall(pattern, text.lower())
            if matches:
                matched_skills.append(skill)

        return list(set(matched_skills))

    def extract_key_phrases(self, text, job_description):
        """Extract key phrases from text that match job description keywords"""
        # Identify job skills first
        skills = self.extract_skills(job_description, job_description)

        # Extract sentences that contain skills
        sentences = sent_tokenize(text)
        skill_sentences = []

        for sentence in sentences:
            sentence_lower = sentence.lower()
            for skill in skills:
                if skill in sentence_lower:
                    # Append the sentence with the skill highlighted
                    highlighted = sentence.replace(skill, f"**{skill}**")
                    skill_sentences.append(highlighted)
                    break

        # Get additional generic matches if we don't have enough skill sentences
        if len(skill_sentences) < 5:
            # Simple extraction based on job description keywords
            job_tokens = set(word.lower() for word in word_tokenize(job_description) if len(word) > 3)
            text_tokens = word_tokenize(text)

            matches = []
            for i, token in enumerate(text_tokens):
                if token.lower() in job_tokens:
                    # Get a phrase context (5 words before and after)
                    start = max(0, i - 5)
                    end = min(len(text_tokens), i + 6)
                    phrase = " ".join(text_tokens[start:end])
                    matches.append(phrase)

            # Add unique phrases to complement skill sentences
            unique_matches = list(set(matches))
            skill_sentences.extend(unique_matches[:5 - len(skill_sentences)])

        # Return unique phrases, up to 5
        return skill_sentences[:5]

    def generate_explanation(self, resume_text, job_description, score, semantic_score, bm25_score, skills):
        """Generate explanation for why a resume was ranked highly using the QwQ-32B model"""
        # Use the explanation generator if available
        if use_explanation and self.explanation_generator:
            return self.explanation_generator.generate_explanation(
                resume_text,
                job_description,
                score,
                semantic_score,
                bm25_score,
                skills
            )
        else:
            # Fallback to simple explanation
            matching_phrases = self.extract_key_phrases(resume_text, job_description)

            explanation = f"This resume received a score of {score:.2f}, with semantic relevance of {semantic_score:.2f} and keyword match of {bm25_score:.2f}. "

            if skills:
                explanation += f"The resume shows experience with key skills: {', '.join(skills[:5])}. "

            if matching_phrases:
                explanation += f"Key matching elements include: {matching_phrases[0]}"

            return explanation

# Function to create a download link for dataframe as CSV
def get_csv_download_link(df, filename="results.csv"):
    csv_string = df.to_csv(index=False)  # named to avoid shadowing the csv module
    b64 = base64.b64encode(csv_string.encode()).decode()
    href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">Download CSV</a>'
    return href

# Main app UI
st.title("Resume Screener & Skill Extractor")
st.markdown("---")

# Initialize the resume screener
screener = ResumeScreener(embedding_model_name, explanation_model_name)

# Job description input
st.header("1. Enter Job Description")
job_description = st.text_area(
    "Paste the job description or requirements here:",
    height=200,
    help="Enter the complete job description or a list of required skills and qualifications."
)

# Resume upload
st.header("2. Upload Resumes")
upload_option = st.radio(
    "Choose upload method:",
    ["Upload Files", "Upload from Dataset"]
)

uploaded_files = []
resume_texts = []
file_names = []

if upload_option == "Upload Files":
    uploaded_files = st.file_uploader(
        "Upload resume files",
        type=["pdf", "docx", "txt", "csv"],
        accept_multiple_files=True,
        help="Upload multiple resume files in PDF, DOCX, TXT, or CSV format."
    )

    if uploaded_files:
        with st.spinner("Processing resumes..."):
            for file in uploaded_files:
                file_type = file.name.split('.')[-1].lower()

                with tempfile.NamedTemporaryFile(delete=False, suffix=f'.{file_type}') as tmp_file:
                    tmp_file.write(file.getvalue())
                    tmp_path = tmp_file.name

                # Open the temp file so every branch of extract_text_from_file
                # receives a binary file object (the txt/csv branches call .read())
                with open(tmp_path, "rb") as f:
                    text = screener.extract_text_from_file(f, file_type)
                if text:
                    resume_texts.append(text)
                    file_names.append(file.name)

                # Clean up temp file
                os.unlink(tmp_path)

        st.session_state.resumes_uploaded = True
        st.success(f"Successfully processed {len(resume_texts)} resumes.")
else:
    st.write("Upload from dataset feature will be implemented soon.")
    # Here you would implement the connection to Hugging Face datasets
    # Example pseudocode:
    # dataset_name = st.text_input("Enter Hugging Face dataset name:")
    # if st.button("Load Dataset"):
    #     with st.spinner("Loading dataset..."):
    #         dataset = load_dataset(dataset_name)
    #         resume_texts = [item["text"] for item in dataset]
    #         file_names = [f"resume_{i}.txt" for i in range(len(resume_texts))]

# Process button
if st.button("Find Top Candidates", disabled=not (job_description and resume_texts)):
    with st.spinner("Loading embedding model..."):
        screener.load_model()

    with st.spinner("Processing job description and resumes..."):
        # Get job description embedding
        job_embedding = screener.get_embedding(job_description)

        # Get resume embeddings
        resume_embeddings = []
        progress_bar = st.progress(0)
        for i, text in enumerate(resume_texts):
            embedding = screener.get_embedding(text)
            resume_embeddings.append(embedding)
            progress_bar.progress((i + 1) / len(resume_texts))

        # Calculate hybrid scores
        hybrid_scores, semantic_scores, bm25_scores = screener.calculate_hybrid_scores(
            resume_texts,
            resume_embeddings,
            job_embedding,
            semantic_weight,
            use_faiss
        )

        # Get top candidates
        combined_data = list(zip(file_names, resume_texts, hybrid_scores, semantic_scores, bm25_scores))
        sorted_data = sorted(combined_data, key=lambda x: x[2], reverse=True)
        top_candidates = sorted_data[:int(top_k)]

        # Create results with explanations if enabled
        results = []
        for name, text, score, semantic_score, bm25_score in top_candidates:
            # Extract skills for this resume
            skills = screener.extract_skills(text, job_description)

            result = {
                "filename": name,
                "score": score,
                "semantic_score": semantic_score,
                "keyword_score": bm25_score,
                "text_preview": text[:500] + "...",
                "matched_phrases": screener.extract_key_phrases(text, job_description),
                "skills": skills
            }

            if use_explanation:
                explanation = screener.generate_explanation(
                    text,
                    job_description,
                    score,
                    semantic_score,
                    bm25_score,
                    skills
                )
                result["explanation"] = explanation
            else:
                result["explanation"] = ""

            results.append(result)

        st.session_state.results = results
        st.success(f"Found top {len(results)} candidates!")

# Display results
if st.session_state.results:
    st.header("3. Results")

    # Create a DataFrame for download
    df_data = []
    for result in st.session_state.results:
        df_data.append({
            "Filename": result["filename"],
            "Score": result["score"],
            "Semantic Score": result["semantic_score"],
            "Keyword Score": result["keyword_score"],
            "Skills": ", ".join(result["skills"]),
            "Explanation": result["explanation"]
        })

    results_df = pd.DataFrame(df_data)

    # Display download link
    st.markdown(get_csv_download_link(results_df), unsafe_allow_html=True)

    # Display individual results
    for i, result in enumerate(st.session_state.results):
        with st.expander(f"#{i+1}: {result['filename']} (Score: {result['score']:.4f})"):
            col1, col2 = st.columns([1, 1])

            with col1:
                st.subheader("Scores")
                st.write(f"Total Score: {result['score']:.4f}")
                st.write(f"Semantic Score: {result['semantic_score']:.4f}")
                st.write(f"Keyword Score: {result['keyword_score']:.4f}")

                st.subheader("Matched Skills")
                if result["skills"]:
                    for skill in result["skills"]:
                        st.write(f"• {skill}")
                else:
                    st.write("No specific skills matched.")

            with col2:
                st.subheader("Explanation")
                st.write(result["explanation"])

                st.subheader("Key Matches")
                for phrase in result["matched_phrases"]:
                    st.markdown(f"• {phrase}")

            st.subheader("Resume Preview")
            st.text_area("", result["text_preview"], height=150, disabled=True)

    # Visualization of scores
    st.subheader("Score Comparison")

    # Prepare data for visualization
    chart_data = pd.DataFrame({
        "Resume": [result["filename"] for result in st.session_state.results],
        "Semantic Score": [result["semantic_score"] for result in st.session_state.results],
        "Keyword Score": [result["keyword_score"] for result in st.session_state.results],
        "Total Score": [result["score"] for result in st.session_state.results]
    })

    # Display as a bar chart
    st.bar_chart(chart_data.set_index("Resume")[["Total Score", "Semantic Score", "Keyword Score"]])

# Footer
st.markdown("---")
st.markdown("Built with Streamlit and Hugging Face models (NV-Embed-v2 and QwQ-32B)")
explanation_generator.py ADDED
@@ -0,0 +1,178 @@
1 |
+
"""
|
2 |
+
Explanation Generator Module
|
3 |
+
|
4 |
+
This module handles the generation of explanations for resume rankings
|
5 |
+
using the QwQ-32B model from Hugging Face.
|
6 |
+
"""
|
7 |
+
|
8 |
+
import torch
|
9 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
10 |
+
import os
|
11 |
+
import re
|
12 |
+
|
13 |
+
class ExplanationGenerator:
|
14 |
+
def __init__(self, model_name="Qwen/QwQ-32B"):
|
15 |
+
"""Initialize the explanation generator with the specified model"""
|
16 |
+
self.model_name = model_name
|
17 |
+
self.model = None
|
18 |
+
self.tokenizer = None
|
19 |
+
self.initialized = False
|
20 |
+
|
21 |
+
def load_model(self):
|
22 |
+
"""Load the model and tokenizer if not already loaded"""
|
23 |
+
if not self.initialized:
|
24 |
+
try:
|
25 |
+
            # Check whether we have enough VRAM to load the model
            if torch.cuda.is_available():
                gpu_memory = torch.cuda.get_device_properties(0).total_memory
                # QwQ-32B requires at least 32 GB of VRAM
                if gpu_memory >= 32 * (1024**3):  # 32 GB
                    device = "cuda"
                else:
                    device = "cpu"
            else:
                device = "cpu"

            # Load tokenizer
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

            # Load the model based on available resources
            if device == "cuda":
                self.model = AutoModelForCausalLM.from_pretrained(
                    self.model_name,
                    torch_dtype=torch.bfloat16,
                    device_map="auto"
                )
            else:
                # Fall back to a simpler template-based solution if we can't load the model
                self.model = None
                print("Warning: Loading QwQ-32B on CPU is not recommended. Using template-based explanations instead.")

            self.initialized = True
        except Exception as e:
            print(f"Error loading QwQ-32B model: {str(e)}")
            print("Falling back to template-based explanations.")
            self.model = None
            self.initialized = True

    def generate_explanation(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
        """Generate an explanation for why a resume was ranked highly."""
        # Lazily load the model on first use
        if not self.initialized:
            self.load_model()

        # If the model loaded successfully, use it to generate the explanation
        if self.model is not None:
            try:
                # Prepare the prompt for QwQ-32B
                prompt = self._create_prompt(resume_text, job_description, score, semantic_score, keyword_score, skills)

                # Wrap the prompt in the chat message format
                messages = [
                    {"role": "user", "content": prompt}
                ]

                # Apply the model's chat template
                text = self.tokenizer.apply_chat_template(
                    messages,
                    tokenize=False,
                    add_generation_prompt=True
                )

                # Tokenize and move tensors to the model's device
                inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)

                # Generate the response
                output_ids = self.model.generate(
                    **inputs,
                    max_new_tokens=300,
                    temperature=0.6,
                    top_p=0.95,
                    top_k=30
                )

                # Decode only the newly generated tokens (skip the prompt)
                response = self.tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

                # Clean up the response
                return self._clean_response(response)

            except Exception as e:
                print(f"Error generating explanation with QwQ-32B: {str(e)}")
                # Fall back to a template-based explanation
                return self._generate_template_explanation(score, semantic_score, keyword_score, skills)
        else:
            # Use a template-based explanation if the model is not available
            return self._generate_template_explanation(score, semantic_score, keyword_score, skills)

    def _create_prompt(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
        """Create the prompt used for explanation generation."""
        # Use only the first 1000 characters of the resume to keep the prompt size manageable
        resume_excerpt = resume_text[:1000] + "..." if len(resume_text) > 1000 else resume_text

        prompt = f"""You are an AI assistant helping a recruiter understand why a candidate's resume was matched with a job posting.

The resume has been assigned the following scores:
- Overall Match Score: {score:.2f} out of 1.0
- Semantic Relevance Score: {semantic_score:.2f} out of 1.0
- Keyword Match Score: {keyword_score:.2f} out of 1.0

The job description is:
```
{job_description}
```

Based on analysis, the resume contains these skills relevant to the job: {', '.join(skills)}

Resume excerpt:
```
{resume_excerpt}
```

Please provide a short explanation (3-5 sentences) of why this resume received these scores and how well it matches the job requirements. Focus on the relationship between the candidate's experience and the job requirements."""

        return prompt

    def _clean_response(self, response):
        """Clean the raw response from the model."""
        # Remove any thinking or internal processing tokens
        response = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL)

        # Limit the explanation to a reasonable length (roughly five sentences)
        if len(response) > 500:
            sentences = response.split('.')
            return '.'.join(sentences[:5]) + '.'

        return response

    def _generate_template_explanation(self, score, semantic_score, keyword_score, skills):
        """Generate a template-based explanation when the model is not available."""
        # Map the overall score onto a qualitative label
        if score > 0.8:
            quality = "excellent"
        elif score > 0.6:
            quality = "good"
        elif score > 0.4:
            quality = "moderate"
        else:
            quality = "limited"

        explanation = f"This resume shows {quality} alignment with the job requirements, with an overall score of {score:.2f}. "

        if semantic_score > keyword_score:
            explanation += f"The candidate's experience demonstrates strong semantic relevance ({semantic_score:.2f}) to the position, though specific keyword matches ({keyword_score:.2f}) could be improved. "
        else:
            explanation += f"The resume contains many relevant keywords ({keyword_score:.2f}), but could benefit from better contextual alignment ({semantic_score:.2f}) with the job requirements. "

        if skills:
            if len(skills) > 3:
                explanation += f"Key skills identified include {', '.join(skills[:3])}, and {len(skills)-3} others that match the job requirements."
            else:
                explanation += f"Key skills identified include {', '.join(skills)}."
        else:
            explanation += "No specific skills were identified that directly match the requirements."

        return explanation
fix_dependencies.py
ADDED
@@ -0,0 +1,76 @@
#!/usr/bin/env python
"""
Dependency fixer for Resume Screener and Skill Extractor

This script ensures all dependencies are properly installed with compatible versions.
"""

import sys
import subprocess
import os

def install(package):
    """Install a package using pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

def install_with_message(package, message=None):
    """Install a package, printing an optional message first."""
    if message:
        print(f"\n{message}")
    print(f"Installing {package}...")
    install(package)

def main():
    print("Running dependency fixer for Resume Screener and Skill Extractor...")

    # Install core tooling first
    install_with_message("pip==23.1.2", "Upgrading pip to ensure compatibility")
    install_with_message("setuptools==68.0.0", "Installing compatible setuptools")

    # Check if we're running inside a Hugging Face Space
    in_hf_space = os.environ.get("SPACE_ID") is not None

    # Install key libraries with pinned versions to ensure compatibility
    dependencies = [
        ("streamlit==1.31.0", "Installing Streamlit for the web interface"),
        ("pdfplumber==0.10.1", "Installing PDF processing libraries"),
        ("PyPDF2==3.0.1", None),
        ("python-docx==1.0.1", None),
        ("rank-bm25==0.2.2", "Installing BM25 ranking library"),
        ("tqdm==4.66.1", "Installing progress bar utility"),
        ("faiss-cpu==1.7.4", "Installing FAISS for vector similarity search"),
        ("huggingface-hub==0.20.3", "Installing Hugging Face Hub"),
        ("transformers==4.36.2", "Installing Transformers"),
        ("sentence-transformers==2.2.2", "Installing Sentence Transformers"),
        ("torch==2.1.2", "Installing PyTorch"),
        ("nltk==3.8.1", "Installing NLTK for text processing"),
        ("pandas==2.1.3", "Installing data processing libraries"),
        ("numpy==1.24.3", None),
        ("plotly==5.18.0", "Installing visualization libraries"),
        ("spacy==3.7.2", "Installing spaCy for NLP"),
    ]

    # Install all dependencies
    for package, message in dependencies:
        install_with_message(package, message)

    # Download required NLTK data
    print("\nDownloading NLTK data...")
    install("nltk")  # no-op if the pinned version above is already installed
    import nltk
    nltk.download('punkt')

    # Download the spaCy model if not in a Hugging Face Space
    # (Spaces should include it in requirements.txt)
    if not in_hf_space:
        print("\nDownloading spaCy model...")
        try:
            subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
        except subprocess.CalledProcessError:
            # Fall back to installing the model wheel directly
            install("https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz")

    print("\nDependency installation complete!")
    print("You can now run the Resume Screener with: streamlit run app.py")

if __name__ == "__main__":
    main()
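
The script is meant to be run directly; per its closing messages, a typical session (assuming `python` points at a Python 3 interpreter) looks like:

```bash
# Repin all dependencies, then launch the app
python fix_dependencies.py
streamlit run app.py
```

Running installs sequentially via subprocess, rather than through one requirements file, lets the script print progress messages and apply the spaCy-model fallback only when the normal download fails.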
requirements.txt
CHANGED
@@ -1,22 +1,17 @@
-
-
-
+streamlit==1.31.0
+pdfplumber==0.10.1
+PyPDF2==3.0.1
+python-docx==1.0.1
+spacy==3.7.2
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz
+transformers==4.36.2
+torch==2.1.2
+nltk==3.8.1
+faiss-cpu==1.7.4
+rank-bm25==0.2.2
 sentence-transformers==2.2.2
-
-
-
-# PDF processing
-pdfplumber==0.9.0
-
-# Web UI
-streamlit==1.22.0
-
-# Data processing
-pandas==1.5.3
+plotly==5.18.0
+pandas==2.1.3
 numpy==1.24.3
-
-
-
-# Utilities
-nltk==3.8.1
-scikit-learn==1.0.2
+tqdm==4.66.1
+huggingface-hub==0.20.3
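
Because both fix_dependencies.py and requirements.txt pin exact versions, the two lists can silently drift apart. A minimal sketch for checking a few of the pins above against the installed environment; the `verify_pins.py` name and the `PINS` subset are illustrative, not part of the repo:

```python
# verify_pins.py -- hypothetical helper, not part of this commit.
# Compares installed distribution versions against a subset of the pins.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "streamlit": "1.31.0",
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "sentence-transformers": "2.2.2",
    "huggingface-hub": "0.20.3",
}

for name, expected in PINS.items():
    try:
        installed = version(name)
        status = "OK" if installed == expected else f"MISMATCH (found {installed})"
    except PackageNotFoundError:
        status = "MISSING"
    print(f"{name}=={expected}: {status}")
```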