root committed on
Commit e232281 · 1 Parent(s): 72d33a9
Files changed (5)
  1. README.md +61 -85
  2. app.py +569 -583
  3. explanation_generator.py +178 -0
  4. fix_dependencies.py +76 -0
  5. requirements.txt +15 -20
README.md CHANGED
@@ -12,102 +12,78 @@ license: mit
12
 
13
  # Resume Screener and Skill Extractor
14
 
15
- A comprehensive application for analyzing resumes, matching them to job positions, and providing personalized career advice.
16
 
17
  ## Features
18
 
19
- - **Skill Extraction**: Identifies relevant skills for specific job positions
20
- - **Resume Summarization**: Generates concise summaries of candidate backgrounds
21
- - **Skill Gap Analysis**: Identifies missing skills for target roles
22
- - **Career Advice**: Provides personalized recommendations for skill development and projects
23
- - **Experience Analysis**: Analyzes work history and career progression
24
- - **Fraud Detection**: Flags potential inconsistencies for verification
25
-
26
- ## Installation
27
-
28
- ### Fix Dependencies (Recommended)
29
-
30
- If you encounter any dependency issues, run the dependency fixer script:
31
-
32
- ```bash
33
- python fix_dependencies.py
34
- ```
35
-
36
- This will install compatible versions of all required packages.
37
-
38
- ### Manual Installation
39
-
40
- Alternatively, you can install the dependencies manually:
41
 
42
  ```bash
 
 
43
  pip install -r requirements.txt
44
- python -m spacy download en_core_web_sm
45
- python -c "import nltk; nltk.download('punkt')"
46
- ```
47
-
48
- ## Common Issues and Solutions
49
-
50
- ### ImportError: cannot import name 'cached_download' from 'huggingface_hub'
51
-
52
- This occurs due to version incompatibility between huggingface_hub and sentence_transformers. To fix:
53
-
54
- 1. Run the dependency fixer script: `python fix_dependencies.py`
55
- 2. Or manually install compatible versions: `pip install huggingface-hub==0.14.1 sentence-transformers==2.2.2`
56
-
57
- ### PydanticImportError: `pydantic:ConstrainedStr` has been removed in V2
58
-
59
- This error occurs when using spaCy 3.5.0 with pydantic v2. To fix:
60
-
61
- 1. Run the dependency fixer script: `python fix_dependencies.py`
62
- 2. Or manually install a compatible pydantic version: `pip install "pydantic<2.0.0"`
63
-
64
- ## Running the Application
65
-
66
- ```bash
67
  streamlit run app.py
68
  ```
69
 
70
- ## Usage
71
-
72
- 1. Upload a resume in PDF format
73
- 2. Select a target job position
74
- 3. Review the analysis results in the different tabs
75
- 4. Click "Generate Personalized Career Advice" to get recommendations
76
-
77
- ## Dependencies
78
-
79
- - streamlit
80
- - pdfplumber
81
- - spacy
82
- - transformers
83
- - sentence-transformers
84
- - torch
85
- - nltk
86
- - plotly
87
- - pandas
88
- - numpy
89
- - matplotlib
90
-
91
- ## Supported Job Positions
92
-
93
- - Software Engineer
94
- - Interaction Designer
95
- - Data Scientist
96
-
97
- ## How it Works
98
 
99
- 1. Upload your resume (PDF or DOCX format)
100
- 2. Select the target job position
101
- 3. The app will analyze your resume and provide:
102
- - A list of matched skills with a match percentage
103
- - An AI-generated summary of your resume
104
- - Suggestions for skills you might want to develop
105
 
106
- ## Technologies Used
107
 
108
- - Streamlit for the web interface
109
- - Hugging Face Transformers for AI-powered text summarization
110
- - spaCy for natural language processing
111
- - PyPDF2 and python-docx for document parsing
112
 
113
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
12
 
13
  # Resume Screener and Skill Extractor
14
 
15
+ A Hugging Face Space application for efficiently screening resumes against job descriptions using a hybrid ranking approach that combines semantic similarity with keyword-based scoring.
16
 
17
  ## Features
18
 
19
+ - **Hybrid Resume Ranking**: Combines semantic similarity (via NV-Embed-v2) with keyword-based BM25 scoring
20
+ - **Skill Extraction**: Automatically identifies relevant skills from resumes based on job requirements
21
+ - **Fast Search**: Uses FAISS for efficient similarity search with large resume collections
22
+ - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
23
+ - **Explanation Generation**: Provides explanations for why each resume was ranked highly
24
+ - **Visualization**: Displays comparative scores and key matches for easy analysis
25
+ - **Batch Processing**: Supports uploading multiple resumes simultaneously
26
+
27
+ ## How It Works
28
+
29
+ 1. **Input**: Provide a job description and upload resumes (PDF, DOCX, TXT, or CSV format)
30
+ 2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
31
+ 3. **Ranking**: Calculates a hybrid score based on:
32
+ - Semantic similarity (cosine similarity between embeddings)
33
+ - Keyword relevance (BM25 scoring)
34
+ 4. **Results**: Returns the top 10 most suitable resumes with:
35
+ - Overall score and individual component scores
36
+ - Matched skills and key phrases
37
+ - Explanations for why each resume was ranked highly
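A minimal sketch of the hybrid score from step 3, assuming the raw BM25 scores are min-max normalized before they are mixed with the cosine similarities (the README does not say how the two scales are reconciled). The function names are hypothetical and are not taken from app.py; the 0.7 default mirrors the sidebar's semantic weight slider.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def hybrid_scores(semantic, keyword, semantic_weight=0.7):
    """Weighted mix of semantic scores and normalized keyword scores."""
    keyword_weight = 1.0 - semantic_weight
    keyword_norm = min_max(keyword)
    return [semantic_weight * s + keyword_weight * k
            for s, k in zip(semantic, keyword_norm)]

# Toy data: 2-d "embeddings" and made-up raw BM25 scores (assumptions).
job = [1.0, 0.0]
resumes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic = [cosine_similarity(job, r) for r in resumes]
bm25 = [4.0, 6.0, 2.0]

scores = hybrid_scores(semantic, bm25)
# Indices of resumes, best first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Without the normalization step, unbounded BM25 scores could swamp cosine similarities, which are capped at 1, regardless of the chosen weights.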
38
+
39
+ ## Technical Details
40
+
41
+ ### Models Used
42
+ - **NV-Embed-v2**: State-of-the-art embedding model for semantic similarity
43
+ - **QwQ-32B**: Used for generating explanations (simulated in the current version)
44
+
45
+ ### Libraries
46
+ - **FAISS**: Facebook AI Similarity Search for fast vector similarity search
47
+ - **rank_bm25**: Implementation of the BM25 algorithm for keyword-based scoring
48
+ - **Streamlit**: For the user interface
49
+ - **Hugging Face Transformers**: For accessing and using the models
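For readers unfamiliar with BM25, the scoring idea can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the code path of `rank_bm25`: the function name is hypothetical, tokenization is a plain whitespace split, and the IDF uses the "+1" smoothed form, which may differ from what the library computes.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against the query."""
    if not corpus_tokens:
        return []
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            denom = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores

corpus = [
    "python developer with sql and git experience".split(),
    "graphic designer skilled in figma and sketch".split(),
    "data scientist using python pandas and sql".split(),
]
scores = bm25_scores("python sql".split(), corpus)
```

The second document shares no term with the query and scores zero; the other two contain both query terms once and have the same length, so they tie.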
50
+
51
+ ## Configuration Options
52
+
53
+ The sidebar provides several configuration options:
54
+ - **Model Selection**: Choose which embedding model to use
55
+ - **Ranking Weights**: Adjust the balance between semantic similarity and keyword matching
56
+ - **Results Count**: Set how many top results to display
57
+ - **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections
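What the FAISS toggle trades off can be illustrated with the brute-force search it replaces. The sketch below assumes cosine similarity over L2-normalized vectors; a flat inner-product FAISS index (`IndexFlatIP`) over the same normalized vectors would return the same neighbours, but which index type app.py actually builds is not visible in this excerpt. The helper names are hypothetical.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def top_k_inner_product(query, vectors, k):
    """Exact top-k search: score every vector, keep the k best."""
    q = normalize(query)
    scored = []
    for idx, vec in enumerate(vectors):
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    # nlargest returns (score, index) pairs sorted best first.
    return heapq.nlargest(k, scored)

# Toy 2-d "resume embeddings" (assumption, real embeddings are much larger).
resume_vectors = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
hits = top_k_inner_product([1.0, 0.0], resume_vectors, k=2)
top_ids = [idx for _, idx in hits]
```

The loop touches every resume for every query, which is fine for a handful of uploads and is the cost an index is meant to reduce for large collections.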
58
+
59
+ ## Getting Started
60
+
61
+ ### Online Usage
62
+ 1. Visit the Hugging Face Space at [URL]
63
+ 2. Enter a job description
64
+ 3. Upload resumes (PDF, DOCX, TXT, or CSV)
65
+ 4. Click "Find Top Candidates"
66
+ 5. Review the results
67
+
68
+ ### Local Installation
69
 
70
  ```bash
71
+ git clone https://huggingface.co/spaces/[username]/Resume_Screener_and_Skill_Extractor
72
+ cd Resume_Screener_and_Skill_Extractor
73
  pip install -r requirements.txt
74
  streamlit run app.py
75
  ```
76
 
77
+ ## Future Enhancements
78
 
79
+ - Integration with Hugging Face datasets for loading resumes directly
80
+ - Enhanced skill extraction using more sophisticated NLP techniques
81
+ - Real-time explanation generation using QwQ-32B
82
+ - Support for additional file formats and languages
83
+ - Customizable scoring algorithms and weights
 
84
 
85
+ ## License
86
 
87
+ MIT License
 
 
 
88
 
89
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -1,642 +1,628 @@
1
  import streamlit as st
2
  import pdfplumber
3
- import re
4
  import pandas as pd
5
- import matplotlib.pyplot as plt
6
- import torch
7
- from datetime import datetime
8
- import plotly.express as px
9
- import plotly.graph_objects as go
10
  import numpy as np
11
 
12
- # Display startup message
13
  st.set_page_config(
14
  page_title="Resume Screener & Skill Extractor",
15
  page_icon="📄",
16
- layout="wide"
 
17
  )
18
 
19
- st.title("📄 Resume Screener & Skill Extractor")
20
- startup_message = st.empty()
21
- startup_message.info("Loading dependencies and models... This may take a minute on first run.")
22
-
23
- # Import dependencies with fallbacks
24
- try:
25
- import spacy
26
- spacy_available = True
27
- except ImportError:
28
- spacy_available = False
29
- st.warning("spaCy is not available. Some features will be limited.")
30
-
31
- try:
32
- from transformers import pipeline
33
- transformers_available = True
34
- except ImportError:
35
- transformers_available = False
36
- st.warning("Transformers is not available. Summary generation will be limited.")
37
-
38
- try:
39
- import nltk
40
- from nltk.tokenize import word_tokenize
41
- nltk_available = True
42
 
43
- # Download required NLTK resources
44
- try:
45
- nltk.data.find('tokenizers/punkt')
46
- except LookupError:
47
- nltk.download('punkt')
48
- except ImportError:
49
- nltk_available = False
50
- st.warning("NLTK is not available. Some text processing features will be limited.")
51
-
52
- # Custom sentence-transformers fallback
53
- try:
54
- from sentence_transformers import SentenceTransformer
55
- try:
56
- from sentence_transformers import util as st_util
57
- sentence_transformers_available = True
58
- except ImportError:
59
- # Define our own utility functions
60
- class CustomSTUtil:
61
- @staticmethod
62
- def pytorch_cos_sim(a, b):
63
- if not isinstance(a, torch.Tensor):
64
- a = torch.tensor(a)
65
- if not isinstance(b, torch.Tensor):
66
- b = torch.tensor(b)
67
-
68
- if len(a.shape) == 1:
69
- a = a.unsqueeze(0)
70
- if len(b.shape) == 1:
71
- b = b.unsqueeze(0)
72
-
73
- a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
74
- b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
75
- return torch.mm(a_norm, b_norm.transpose(0, 1))
76
-
77
- st_util = CustomSTUtil()
78
- sentence_transformers_available = True
79
- except ImportError:
80
- sentence_transformers_available = False
81
- st.warning("Sentence Transformers is not available. Semantic matching will be disabled.")
82
-
83
- # Load models with exception handling
84
- @st.cache_resource
85
- def load_models():
86
- models = {}
87
 
88
- # Load spaCy if available
89
- if spacy_available:
90
- try:
91
- models['nlp'] = spacy.load("en_core_web_sm")
92
- except OSError:
93
- try:
94
- import subprocess
95
- import sys
96
- subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
97
- models['nlp'] = spacy.load("en_core_web_sm")
98
- except Exception as e:
99
- st.warning(f"Could not load spaCy model: {e}")
100
- models['nlp'] = None
101
- else:
102
- models['nlp'] = None
103
 
104
- # Load summarizer if transformers available
105
- if transformers_available:
106
- try:
107
- models['summarizer'] = pipeline("summarization", model="facebook/bart-large-cnn")
108
- except Exception as e:
109
- st.warning(f"Could not load summarizer model: {e}")
110
- # Simple fallback summarizer
111
- models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]
112
- else:
113
- # Simple fallback summarizer
114
- models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]
115
 
116
- # Load sentence transformer if available
117
- if sentence_transformers_available:
118
- try:
119
- models['sentence_model'] = SentenceTransformer('paraphrase-MiniLM-L6-v2')
120
- except Exception as e:
121
- st.warning(f"Could not load sentence transformer model: {e}")
122
- models['sentence_model'] = None
123
- else:
124
- models['sentence_model'] = None
125
 
126
- return models
127
-
128
- # Job descriptions dictionary
129
- job_descriptions = {
130
- "Software Engineer": {
131
- "skills": ["python", "java", "javascript", "sql", "algorithms", "data structures",
132
- "git", "cloud", "web development", "software development", "coding"],
133
- "description": "Looking for software engineers with strong programming skills and experience in software development.",
134
- "must_have": ["python", "git", "algorithms"],
135
- "nice_to_have": ["cloud", "java", "javascript"],
136
- "seniority_levels": {
137
- "Junior": "0-2 years of experience, familiar with basic programming concepts",
138
- "Mid-level": "3-5 years of experience, proficient in multiple languages, experience with system design",
139
- "Senior": "6+ years of experience, expert in software architecture, mentoring, and leading projects"
140
- }
141
- },
142
- "Interaction Designer": {
143
- "skills": ["ui", "ux", "user research", "wireframing", "prototyping", "figma",
144
- "sketch", "adobe", "design thinking", "interaction design"],
145
- "description": "Seeking interaction designers with expertise in user experience and interface design.",
146
- "must_have": ["ui", "ux", "prototyping"],
147
- "nice_to_have": ["figma", "sketch", "user research"],
148
- "seniority_levels": {
149
- "Junior": "0-2 years of experience, basic design skills, understanding of UX principles",
150
- "Mid-level": "3-5 years of experience, strong portfolio, experience with user research",
151
- "Senior": "6+ years of experience, leadership in design systems, driving design strategy"
152
- }
153
- },
154
- "Data Scientist": {
155
- "skills": ["python", "r", "statistics", "machine learning", "data analysis",
156
- "sql", "tensorflow", "pytorch", "pandas", "numpy"],
157
- "description": "Looking for data scientists with strong analytical and machine learning skills.",
158
- "must_have": ["python", "statistics", "machine learning"],
159
- "nice_to_have": ["tensorflow", "pytorch", "r"],
160
- "seniority_levels": {
161
- "Junior": "0-2 years of experience, basic knowledge of statistics and ML algorithms",
162
- "Mid-level": "3-5 years of experience, model development, feature engineering",
163
- "Senior": "6+ years of experience, advanced ML techniques, research experience"
164
- }
165
- }
166
- }
167
 
168
- # Core functionality
169
- def extract_text_from_pdf(pdf_file):
170
- """Extract text from PDF file."""
171
- text = ""
172
- try:
173
- with pdfplumber.open(pdf_file) as pdf:
174
- for page in pdf.pages:
175
- text += page.extract_text() or ""
176
- except Exception as e:
177
- st.error(f"Error extracting text from PDF: {e}")
178
- return text
179
-
180
- def extract_skills(text, job_title, nlp=None):
181
- """Extract skills from resume text."""
182
- found_skills = []
183
- required_skills = job_descriptions[job_title]["skills"]
184
-
185
- # Simple keyword matching (no NLP needed)
186
- for skill in required_skills:
187
- if skill.lower() in text.lower():
188
- found_skills.append(skill)
189
-
190
- return found_skills
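The substring check in the removed `extract_skills` (`skill.lower() in text.lower()`) also fires when a short skill such as "r" or "ui" occurs inside a longer word. A minimal sketch of a stricter alternative, assuming a match should be delimited by non-word characters; `match_skills` is a hypothetical helper and is not part of this commit.

```python
import re

def match_skills(text, skills):
    """Return the skills that appear in text as whole tokens (case-insensitive)."""
    found = []
    for skill in skills:
        # Lookarounds instead of \b so skills ending in symbols (e.g. "c++") still work.
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(skill)
    return found

text = "Built REST services in JavaScript; strong Git workflow."
skills = ["java", "javascript", "r", "git", "ui"]

# Plain substring test, as in the removed function: every skill matches.
substring_hits = [s for s in skills if s.lower() in text.lower()]
# Token-delimited test: only the skills that are actually mentioned.
token_hits = match_skills(text, skills)
```

Here the substring test reports all five skills ("ui" is found inside "Built", "java" inside "JavaScript"), while the delimited test keeps only "javascript" and "git".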
191
 
192
- def extract_experience(text):
193
- """Extract work experience from resume text."""
194
- experiences = []
195
-
196
- # Define regex pattern for experiences
197
- experience_pattern = r"(?i)(\w+[\w\s&,.']+)\s*(?:[-|•]|\bat\b)\s*([A-Za-z][\w\s&,.']+)\s*(?:[-|•]|\bfrom\b)\s*(\d{4}(?:\s*[-–]\s*(?:\d{4}|present|current)))"
198
-
199
- matches = re.finditer(experience_pattern, text)
200
- for match in matches:
201
- company = match.group(1).strip()
202
- role = match.group(2).strip()
203
- duration = match.group(3).strip()
204
 
205
- # Process dates
206
- try:
207
- date_parts = re.split(r'[-–]', duration)
208
- start_year = int(date_parts[0].strip())
209
-
210
- if len(date_parts) > 1 and 'present' not in date_parts[1].lower() and 'current' not in date_parts[1].lower():
211
- end_year = int(date_parts[1].strip())
212
- end_date = datetime(end_year, 12, 31)
213
- else:
214
- end_year = datetime.now().year
215
- end_date = datetime.now()
216
-
217
- start_date = datetime(start_year, 1, 1)
218
- duration_months = (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month)
219
 
220
- experiences.append({
221
- 'company': company,
222
- 'role': role,
223
- 'start_date': start_date,
224
- 'end_date': end_date,
225
- 'duration_months': duration_months
226
- })
227
- except:
228
- experiences.append({
229
- 'company': company,
230
- 'role': role,
231
- 'duration': duration
232
- })
233
-
234
- return experiences
235
-
236
- def analyze_resume(text, job_title, models):
237
- """Analyze resume text."""
238
- # Extract skills
239
- found_skills = extract_skills(text, job_title, models.get('nlp'))
240
 
241
- # Generate summary
242
- if models.get('summarizer'):
243
  try:
244
- summary = models['summarizer'](text[:3000], max_length=150, min_length=50, do_sample=False)[0]["summary_text"]
245
  except Exception as e:
246
- st.warning(f"Error generating summary: {e}")
247
- summary = text[:500] + "..."
248
- else:
249
- summary = text[:500] + "..."
250
-
251
- # Extract work experience
252
- experiences = extract_experience(text)
253
 
254
- # Calculate semantic match score
255
- match_score = 0
256
- if models.get('sentence_model') and sentence_transformers_available:
257
- try:
258
- resume_embedding = models['sentence_model'].encode(text[:5000], convert_to_tensor=True)
259
- job_embedding = models['sentence_model'].encode(job_descriptions[job_title]["description"], convert_to_tensor=True)
260
 
261
- match_score = float(st_util.pytorch_cos_sim(resume_embedding, job_embedding)[0][0]) * 100
262
- except Exception as e:
263
- st.warning(f"Error calculating semantic match: {e}")
264
- else:
265
- # Fallback to keyword-based score
266
- match_score = (len(found_skills) / len(job_descriptions[job_title]["skills"])) * 100
267
-
268
- # Calculate seniority level
269
- years_exp = sum(exp.get('duration_months', 0) for exp in experiences if 'duration_months' in exp) / 12
270
-
271
- if years_exp < 3:
272
- seniority = "Junior"
273
- elif years_exp < 6:
274
- seniority = "Mid-level"
275
- else:
276
- seniority = "Senior"
277
-
278
- # Detect skill levels
279
- skill_levels = {}
280
- for skill in found_skills:
281
- # Default level
282
- skill_levels[skill] = "intermediate"
283
-
284
- # Look for advanced indicators
285
- advanced_patterns = [
286
- f"expert in {skill}",
287
- f"advanced {skill}",
288
- f"extensive experience with {skill}"
289
- ]
290
- if any(pattern in text.lower() for pattern in advanced_patterns):
291
- skill_levels[skill] = "advanced"
292
 
293
- # Look for basic indicators
294
- basic_patterns = [
295
- f"familiar with {skill}",
296
- f"basic knowledge of {skill}",
297
- f"introduced to {skill}"
298
- ]
299
- if any(pattern in text.lower() for pattern in basic_patterns):
300
- skill_levels[skill] = "basic"
301
 
302
- # Check for inconsistencies in timeline
303
- inconsistencies = []
304
- if len(experiences) >= 2:
305
- # Sort experiences by start date
306
- sorted_exps = sorted(
307
- [exp for exp in experiences if 'start_date' in exp],
308
- key=lambda x: x['start_date']
309
- )
310
 
311
- # Check for overlaps
312
- for i in range(len(sorted_exps) - 1):
313
- current = sorted_exps[i]
314
- next_exp = sorted_exps[i+1]
315
-
316
- if current['end_date'] > next_exp['start_date']:
317
- inconsistencies.append({
318
- 'type': 'overlap',
319
- 'description': f"Overlapping roles at {current['company']} and {next_exp['company']}"
320
- })
321
 
322
- # Generate a simple career prediction
323
- career_prediction = predict_career_path(seniority, job_title)
324
 
325
- return {
326
- 'found_skills': found_skills,
327
- 'skill_levels': skill_levels,
328
- 'summary': summary,
329
- 'experiences': experiences,
330
- 'match_score': match_score,
331
- 'seniority': seniority,
332
- 'years_experience': years_exp,
333
- 'inconsistencies': inconsistencies,
334
- 'career_prediction': career_prediction
335
- }
336
-
337
- def predict_career_path(seniority, job_title):
338
- """Generate a simple career prediction."""
339
- if seniority == "Junior":
340
- return f"Next potential role: Senior {job_title}"
341
- elif seniority == "Mid-level":
342
- roles = {
343
- "Software Engineer": "Team Lead, Technical Lead, or Engineering Manager",
344
- "Data Scientist": "Senior Data Scientist or Data Science Lead",
345
- "Interaction Designer": "Senior Designer or UX Lead"
346
- }
347
- return f"Next potential roles: {roles.get(job_title, f'Senior {job_title}')}"
348
- else: # Senior
349
- roles = {
350
- "Software Engineer": "Engineering Manager, Software Architect, or CTO",
351
- "Data Scientist": "Head of Data Science, ML Engineering Manager, or Chief Data Officer",
352
- "Interaction Designer": "Design Director, Head of UX, or VP of Design"
353
- }
354
- return f"Next potential roles: {roles.get(job_title, f'Director of {job_title}')}"
355
-
356
- def generate_career_advice(resume_text, job_title, found_skills, missing_skills):
357
- """Generate career advice based on resume analysis."""
358
- advice = f"""## Career Development Plan for {job_title}
359
-
360
- ### Skills to Develop
361
-
362
- The following skills would strengthen your profile for this position:
363
-
364
- """
365
 
366
- for skill in missing_skills:
367
- advice += f"- **{skill.title()}**: "
368
-
369
- if skill == "python":
370
- advice += "Take online courses like Coursera's Python for Everybody or follow tutorials on Real Python."
371
- elif skill == "java":
372
- advice += "Complete the Oracle Java Certification or contribute to open-source Java projects."
373
- elif skill == "javascript":
374
- advice += "Build interactive web applications using modern frameworks like React or Vue."
375
- elif skill == "cloud":
376
- advice += "Get hands-on experience with AWS, Azure, or GCP through their free tier offerings."
377
- elif "algorithm" in skill or "data structure" in skill:
378
- advice += "Practice on platforms like LeetCode or HackerRank and study algorithm design principles."
379
- elif "ui" in skill or "ux" in skill:
380
- advice += "Create a portfolio of design work and study interaction design principles."
381
- elif "machine learning" in skill:
382
- advice += "Take Andrew Ng's Machine Learning course on Coursera and work on ML projects with real datasets."
 
 
383
  else:
384
- advice += f"Research and practice this skill through online courses, tutorials, and hands-on projects."
385
 
386
- advice += "\n\n"
387
 
388
- advice += f"""
389
- ### Project Ideas
390
-
391
- Consider these projects to showcase your skills for a {job_title} position:
392
-
393
- """
394
 
395
- if job_title == "Software Engineer":
396
- advice += """
397
- 1. **Full-Stack Web Application**: Build a complete web app with frontend, backend, and database
398
- 2. **API Service**: Create a RESTful or GraphQL API with proper authentication and documentation
399
- 3. **Open Source Contribution**: Contribute to relevant open-source projects in your area of interest
400
- """
401
- elif job_title == "Data Scientist":
402
- advice += """
403
- 1. **Predictive Model**: Build and deploy a machine learning model that solves a real-world problem
404
- 2. **Data Dashboard**: Create an interactive visualization dashboard for complex datasets
405
- 3. **Natural Language Processing**: Develop a text classification or sentiment analysis project
406
- """
407
- elif job_title == "Interaction Designer":
408
- advice += """
409
- 1. **Design System**: Create a comprehensive design system with components and usage guidelines
410
- 2. **UX Case Study**: Document your design process for a real or fictional product improvement
411
- 3. **Interactive Prototype**: Design a fully functional prototype that demonstrates your interaction design skills
412
- """
413
 
414
- advice += """
415
- ### Learning Resources
416
 
417
- - **Online Platforms**: Coursera, Udemy, Pluralsight, LinkedIn Learning
418
- - **Practice Sites**: GitHub, HackerRank, LeetCode, Kaggle
419
- - **Communities**: Stack Overflow, Reddit programming communities, relevant Discord servers
420
- """
421
-
422
- return advice
423
 
424
- # Load models
425
- models = load_models()
 
426
 
427
- # Clear startup message
428
- startup_message.empty()
429
 
430
- # App description
431
- st.markdown("""
432
- This app helps recruiters analyze resumes by:
433
- - Extracting relevant skills for specific job positions
434
- - Generating a concise summary of the candidate's background
435
- - Identifying skill gaps for the selected role
436
- - Providing personalized career advice and project recommendations
437
- """)
438
 
439
- # Create two columns
440
- col1, col2 = st.columns([2, 1])
 
 
 
 
441
 
442
- with col1:
443
- # File upload
444
- uploaded_file = st.file_uploader("Upload Resume (PDF)", type=["pdf"])
445
 
446
- with col2:
447
- # Job selection
448
- job_title = st.selectbox("Select Job Position", list(job_descriptions.keys()))
 
 
 
 
449
 
450
- # Show job description
451
- if job_title:
452
- st.info(f"**Required Skills:**\n" +
453
- "\n".join([f"- {skill.title()}" for skill in job_descriptions[job_title]["skills"]]))
454
-
455
- if uploaded_file and job_title:
456
- try:
457
- # Show spinner while processing
458
- with st.spinner("Analyzing resume..."):
459
- # Extract text from PDF
460
- text = extract_text_from_pdf(uploaded_file)
461
-
462
- # Analyze resume
463
- analysis_results = analyze_resume(text, job_title, models)
464
-
465
- # Calculate missing skills
466
- missing_skills = [skill for skill in job_descriptions[job_title]["skills"]
467
- if skill not in analysis_results['found_skills']]
468
-
469
- # Display results in tabs
470
- tab1, tab2, tab3, tab4 = st.tabs([
471
- "📊 Skills Match",
472
- "📝 Resume Summary",
473
- "🎯 Skills Gap",
474
- "🚀 Career Advice"
475
- ])
476
-
477
- with tab1:
478
- # Create two columns
479
- col1, col2 = st.columns(2)
480
-
481
- with col1:
482
- # Display matched skills
483
- st.subheader("🎯 Matched Skills")
484
- if analysis_results['found_skills']:
485
- for skill in analysis_results['found_skills']:
486
- # Show skill with proficiency level
487
- level = analysis_results['skill_levels'].get(skill, 'intermediate')
488
- level_emoji = "🟢" if level == 'advanced' else "🟡" if level == 'intermediate' else "🟠"
489
- st.success(f"{level_emoji} {skill.title()} ({level.title()})")
490
-
491
- # Calculate match percentage
492
- match_percentage = len(analysis_results['found_skills']) / len(job_descriptions[job_title]["skills"]) * 100
493
- st.metric("Skills Match", f"{match_percentage:.1f}%")
494
- else:
495
- st.warning("No direct skill matches found.")
496
-
497
- with col2:
498
- # Display semantic match score
499
- st.subheader("💡 Semantic Match")
500
- st.metric("Overall Match Score", f"{analysis_results['match_score']:.1f}%")
501
 
502
- # Display must-have skills match
503
- must_have_skills = job_descriptions[job_title]["must_have"]
504
- must_have_count = sum(1 for skill in must_have_skills if skill in analysis_results['found_skills'])
505
- must_have_percentage = (must_have_count / len(must_have_skills)) * 100
506
 
507
- st.write("Must-have skills:")
508
- st.progress(must_have_percentage / 100)
509
- st.write(f"{must_have_count} out of {len(must_have_skills)} ({must_have_percentage:.1f}%)")
 
510
 
511
- # Professional level assessment
512
- st.subheader("🧠 Seniority Assessment")
513
- st.info(f"**{analysis_results['seniority']}** ({analysis_results['years_experience']:.1f} years equivalent experience)")
514
- st.write(job_descriptions[job_title]["seniority_levels"][analysis_results['seniority']])
515
-
516
- with tab2:
517
- # Display resume summary
518
- st.subheader("📝 Resume Summary")
519
- st.write(analysis_results['summary'])
520
-
521
- # Display experience timeline
522
- st.subheader("⏳ Experience Timeline")
523
- if analysis_results['experiences']:
524
- # Convert experiences to dataframe for display
525
- exp_data = []
526
- for exp in analysis_results['experiences']:
527
- if 'start_date' in exp and 'end_date' in exp:
528
- exp_data.append({
529
- 'Company': exp['company'],
530
- 'Role': exp['role'],
531
- 'Start Date': exp['start_date'].strftime('%b %Y') if exp['start_date'] else 'Unknown',
532
- 'End Date': exp['end_date'].strftime('%b %Y') if exp['end_date'] != datetime.now() else 'Present',
533
- 'Duration (months)': exp.get('duration_months', 'Unknown')
534
- })
535
- else:
536
- exp_data.append({
537
- 'Company': exp['company'],
538
- 'Role': exp['role'],
539
- 'Duration': exp.get('duration', 'Unknown')
540
- })
541
 
542
- if exp_data:
543
- exp_df = pd.DataFrame(exp_data)
544
- st.dataframe(exp_df)
545
-
546
- # Create a timeline visualization if dates are available
547
- timeline_data = [exp for exp in analysis_results['experiences'] if 'start_date' in exp and 'end_date' in exp]
548
- if timeline_data and len(timeline_data) > 0:
549
- try:
550
- # Sort by start date
551
- timeline_data = sorted(timeline_data, key=lambda x: x['start_date'])
552
-
553
- # Create figure
554
- fig = go.Figure()
555
-
556
- for i, exp in enumerate(timeline_data):
557
- fig.add_trace(go.Bar(
558
- x=[(exp['end_date'] - exp['start_date']).days / 30], # Duration in months
559
- y=[exp['company']],
560
- orientation='h',
561
- name=exp['role'],
562
- hovertext=f"{exp['role']} at {exp['company']}",
563
- marker=dict(color=px.colors.qualitative.Plotly[i % len(px.colors.qualitative.Plotly)])
564
- ))
565
-
566
- fig.update_layout(
567
- title="Career Timeline",
568
- xaxis_title="Duration (months)",
569
- yaxis_title="Company",
570
- height=400,
571
- margin=dict(l=0, r=0, b=0, t=30)
572
- )
573
-
574
- st.plotly_chart(fig, use_container_width=True)
575
- except Exception as e:
576
- st.warning(f"Could not create timeline visualization: {e}")
577
- else:
578
- st.warning("No work experience data could be extracted.")
579
 
580
- with tab3:
581
- # Display missing skills
582
- st.subheader("📌 Skills to Develop")
583
 
584
- # Create two columns
585
- col1, col2 = st.columns(2)
586
 
587
  with col1:
588
- # Missing skills
589
- if missing_skills:
590
- for skill in missing_skills:
591
- st.warning(f" {skill.title()}")
 
 
 
 
 
592
  else:
593
- st.success("Great! The candidate has all the required skills!")
594
 
595
  with col2:
596
- # Skills gap analysis
597
- st.subheader("🔍 Gap Analysis")
598
 
599
- # Show must-have skills that are missing
600
- missing_must_have = [skill for skill in job_descriptions[job_title]["must_have"]
601
- if skill not in analysis_results['found_skills']]
602
 
603
- if missing_must_have:
604
- st.error("**Critical Skills Missing:**")
605
- for skill in missing_must_have:
606
- st.write(f"- {skill.title()}")
607
-
608
- st.markdown("These are must-have skills for this position.")
609
- else:
610
- st.success("Candidate has all the must-have skills for this position!")
611
-
612
- # Show nice-to-have skills gap
613
- missing_nice_to_have = [skill for skill in job_descriptions[job_title]["nice_to_have"]
614
- if skill not in analysis_results['found_skills']]
615
-
616
- if missing_nice_to_have:
617
- st.warning("**Nice-to-Have Skills Missing:**")
618
- for skill in missing_nice_to_have:
619
- st.write(f"- {skill.title()}")
620
- else:
621
- st.success("Candidate has all the nice-to-have skills!")
622
-
623
- # Display career trajectory
624
- st.subheader("👨‍💼 Career Trajectory")
625
- st.info(analysis_results['career_prediction'])
626
-
627
- with tab4:
628
- # Display career advice
629
- st.subheader("🚀 Career Advice and Project Recommendations")
630
-
631
- if st.button("Generate Career Advice"):
632
- with st.spinner("Generating personalized career advice..."):
633
- advice = generate_career_advice(text, job_title, analysis_results['found_skills'], missing_skills)
634
- st.markdown(advice)
635
 
636
- except Exception as e:
637
- st.error(f"An error occurred while processing the resume: {str(e)}")
638
- st.exception(e)
639
 
640
- # Add footer
641
  st.markdown("---")
642
- st.markdown("Made with ❤️ using Streamlit and Hugging Face")
 
  import streamlit as st
  import pdfplumber
  import pandas as pd
  import numpy as np
+ import torch
+ import nltk
+ import faiss
+ import os
+ import tempfile
+ import base64
+ from rank_bm25 import BM25Okapi
+ from transformers import AutoModel, AutoTokenizer
+ from sentence_transformers import SentenceTransformer
+ from nltk.tokenize import word_tokenize, sent_tokenize
+ from tqdm import tqdm
+ import re
+ import io
+ import PyPDF2
+ from docx import Document
+ import csv
+ from explanation_generator import ExplanationGenerator
+
+ # Download NLTK resources
+ try:
+     nltk.data.find('tokenizers/punkt')
+ except LookupError:
+     nltk.download('punkt')

+ # Set page configuration
  st.set_page_config(
      page_title="Resume Screener & Skill Extractor",
      page_icon="📄",
+     layout="wide",
+     initial_sidebar_state="expanded"
  )

+ # Sidebar for model selection and weights
+ with st.sidebar:
+     st.title("Configuration")

+     # Model selection
+     embedding_model_name = st.selectbox(
+         "Embedding Model",
+         ["nvidia/NV-Embed-v2"],
+         index=0
+     )

+     explanation_model_name = st.selectbox(
+         "Explanation Model",
+         ["Qwen/QwQ-32B"],
+         index=0
+     )

+     # Ranking weights
+     st.subheader("Ranking Weights")
+     semantic_weight = st.slider("Semantic Similarity Weight", 0.0, 1.0, 0.7, 0.1)
+     keyword_weight = 1.0 - semantic_weight
+     st.write(f"Keyword Weight: {keyword_weight:.1f}")

+     # Advanced options
+     st.subheader("Advanced Options")
+     top_k = st.number_input("Number of results to display", min_value=1, max_value=20, value=10, step=1)
+     use_explanation = st.checkbox("Generate Explanations", value=True)
+     use_faiss = st.checkbox("Use FAISS for fast search", value=True)

+     st.markdown("---")
+     st.markdown("### About")
+     st.markdown("This app uses a hybrid ranking system combining semantic similarity with keyword matching to find the most suitable resumes for a job position.")

+ # Initialize session state variables
+ if 'resumes_uploaded' not in st.session_state:
+     st.session_state.resumes_uploaded = False
+ if 'job_description' not in st.session_state:
+     st.session_state.job_description = ""
+ if 'results' not in st.session_state:
+     st.session_state.results = []
+ if 'embedding_model' not in st.session_state:
+     st.session_state.embedding_model = None
+ if 'tokenizer' not in st.session_state:
+     st.session_state.tokenizer = None
+ if 'faiss_index' not in st.session_state:
+     st.session_state.faiss_index = None
+ if 'explanation_generator' not in st.session_state:
+     st.session_state.explanation_generator = None

+ class ResumeScreener:
+     def __init__(self, embedding_model_name="nvidia/NV-Embed-v2", explanation_model_name="Qwen/QwQ-32B"):
+         """Initialize the ResumeScreener with the specified embedding model"""
+         self.embedding_model_name = embedding_model_name
+         self.explanation_model_name = explanation_model_name
+         self.model = None
+         self.tokenizer = None
+         self.faiss_index = None
+         self.embedding_size = None
+         self.explanation_generator = None

+     def load_model(self):
+         """Load the embedding model from Hugging Face"""
+         if st.session_state.embedding_model is None:
+             with st.spinner(f"Loading model {self.embedding_model_name}..."):
+                 try:
+                     if "sentence-transformers" in self.embedding_model_name:
+                         self.model = SentenceTransformer(self.embedding_model_name)
+                     else:
+                         self.tokenizer = AutoTokenizer.from_pretrained(self.embedding_model_name)
+                         self.model = AutoModel.from_pretrained(self.embedding_model_name)
+
+                     st.session_state.embedding_model = self.model
+                     st.session_state.tokenizer = self.tokenizer
+
+                     # Get embedding size
+                     if "sentence-transformers" in self.embedding_model_name:
+                         self.embedding_size = self.model.get_sentence_embedding_dimension()
+                     else:
+                         # For non-sentence-transformers, we'll determine this after first embedding
+                         pass
+
+                 except Exception as e:
+                     st.error(f"Error loading model: {str(e)}")
+                     st.stop()
+         else:
+             self.model = st.session_state.embedding_model
+             self.tokenizer = st.session_state.tokenizer

+         # Initialize explanation generator if needed
+         if use_explanation and st.session_state.explanation_generator is None:
+             st.session_state.explanation_generator = ExplanationGenerator(self.explanation_model_name)
+             self.explanation_generator = st.session_state.explanation_generator
+         elif use_explanation:
+             self.explanation_generator = st.session_state.explanation_generator

+     def extract_text_from_file(self, file, file_type):
+         """Extract text from various file types"""
          try:
+             if file_type == "pdf":
+                 # Use pdfplumber for better text extraction
+                 with pdfplumber.open(file) as pdf:
+                     text = ""
+                     for page in pdf.pages:
+                         text += page.extract_text() or ""
+
+                 # If pdfplumber returns no text, try PyPDF2 as fallback
+                 if not text.strip():
+                     reader = PyPDF2.PdfReader(file)
+                     text = ""
+                     for page_num in range(len(reader.pages)):
+                         page = reader.pages[page_num]
+                         text += page.extract_text() or ""
+
+                 return text
+
+             elif file_type == "docx":
+                 doc = Document(file)
+                 return " ".join([paragraph.text for paragraph in doc.paragraphs])
+
+             elif file_type == "txt":
+                 # The caller passes a temp-file path (a str), so open it instead of calling .read() on it
+                 with open(file, "r", encoding="utf-8") as fh:
+                     return fh.read()
+
+             elif file_type == "csv":
+                 csv_text = ""
+                 # Same reason as above: read the rows from the path, not from a file object
+                 with open(file, "r", encoding="utf-8", newline="") as fh:
+                     for row in csv.reader(fh):
+                         csv_text += " ".join(row) + " "
+                 return csv_text
+
+             else:
+                 st.error(f"Unsupported file type: {file_type}")
+                 return ""
+
          except Exception as e:
+             st.error(f"Error extracting text from file: {str(e)}")
+             return ""

+     def get_embedding(self, text):
+         """Generate text embedding for a given text"""
+         if "sentence-transformers" in self.embedding_model_name:
+             # For sentence-transformers models
+             embedding = self.model.encode([text], convert_to_tensor=True, show_progress_bar=False)[0]
+             embedding_np = embedding.cpu().detach().numpy()

+             # Set embedding size if not set
+             if self.embedding_size is None:
+                 self.embedding_size = embedding_np.shape[0]
+
+             return embedding_np
+         else:
+             # For HuggingFace models
+             inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
+             with torch.no_grad():
+                 outputs = self.model(**inputs)

+             # Use [CLS] token embedding or mean pooling based on model architecture
+             if hasattr(outputs, "last_hidden_state"):
+                 # Mean pooling across token dimension
+                 embeddings = outputs.last_hidden_state.mean(dim=1).squeeze()
+                 embedding_np = embeddings.cpu().detach().numpy()
+
+                 # Set embedding size if not set
+                 if self.embedding_size is None:
+                     self.embedding_size = embedding_np.shape[0]
+
+                 return embedding_np
+             else:
+                 # For models that return a specific embedding
+                 embedding_np = outputs.cpu().detach().numpy()
+
+                 # Set embedding size if not set
+                 if self.embedding_size is None:
+                     self.embedding_size = embedding_np.shape[0]
+
+                 return embedding_np

+     def create_faiss_index(self, embeddings):
+         """Create a FAISS index for fast similarity search"""
+         # Get the dimension of the embeddings
+         dimension = embeddings[0].shape[0]

+         # Create a FAISS index
+         index = faiss.IndexFlatIP(dimension)  # Inner product for cosine similarity with normalized vectors
+
+         # Add normalized vectors to the index
+         embeddings_normalized = np.vstack([emb / np.linalg.norm(emb) for emb in embeddings])
+         index.add(embeddings_normalized)
+
+         return index

+     def query_faiss_index(self, index, query_embedding, k=10):
+         """Query the FAISS index with a query embedding"""
+         # Normalize query embedding
+         query_embedding = query_embedding / np.linalg.norm(query_embedding)
+
+         # Reshape to a row vector if needed
+         if len(query_embedding.shape) == 1:
+             query_embedding = query_embedding.reshape(1, -1)
+
+         # Query the index
+         scores, indices = index.search(query_embedding, k)
+
+         return scores[0], indices[0]  # Return the scores and indices as flat arrays

+     def calculate_bm25_scores(self, resume_texts, job_description):
+         """Calculate BM25 scores for keyword matching"""
+         # Tokenize job description
+         job_tokens = word_tokenize(job_description.lower())
+
+         # Prepare corpus from resumes
+         corpus = [word_tokenize(resume.lower()) for resume in resume_texts]
+
+         # Initialize BM25
+         bm25 = BM25Okapi(corpus)
+
+         # Calculate scores
+         scores = bm25.get_scores(job_tokens)
+
+         return scores

+     def calculate_hybrid_scores(self, resume_texts, resume_embeddings, job_embedding, semantic_weight=0.7, use_faiss=True):
+         """Calculate hybrid scores combining semantic similarity and BM25"""
+         # Calculate semantic similarity scores (cosine similarity)
+         if use_faiss and len(resume_embeddings) > 10:
+             # Create FAISS index if not already created
+             if st.session_state.faiss_index is None:
+                 index = self.create_faiss_index(resume_embeddings)
+                 st.session_state.faiss_index = index
+             else:
+                 index = st.session_state.faiss_index
+
+             # Query index with job embedding
+             faiss_scores, faiss_indices = self.query_faiss_index(index, job_embedding, k=len(resume_embeddings))
+
+             # Create full semantic scores array
+             semantic_scores = np.zeros(len(resume_embeddings))
+             for i, idx in enumerate(faiss_indices):
+                 # FAISS pads missing results with -1, so check the lower bound too
+                 if 0 <= idx < len(resume_embeddings):
+                     semantic_scores[idx] = faiss_scores[i]
          else:
+             # Direct cosine similarity calculation for smaller datasets
+             semantic_scores = []
+             for emb in resume_embeddings:
+                 # Normalize the embeddings for cosine similarity
+                 emb_norm = emb / np.linalg.norm(emb)
+                 job_emb_norm = job_embedding / np.linalg.norm(job_embedding)
+
+                 # Calculate cosine similarity
+                 similarity = np.dot(emb_norm, job_emb_norm)
+                 semantic_scores.append(similarity)
+
+         # Calculate BM25 scores (job_description here is the module-level text-area value, not a parameter)
+         bm25_scores = self.calculate_bm25_scores(resume_texts, job_description)

+         # Normalize BM25 scores by the largest score, computed once
+         max_bm25 = max(bm25_scores)
+         if max_bm25 > 0:
+             bm25_scores = [score / max_bm25 for score in bm25_scores]
+
+         # Calculate hybrid scores
+         keyword_weight = 1.0 - semantic_weight
+         hybrid_scores = [
+             (semantic_weight * sem_score) + (keyword_weight * bm25_score)
+             for sem_score, bm25_score in zip(semantic_scores, bm25_scores)
+         ]
+
+         return hybrid_scores, semantic_scores, bm25_scores

+     def extract_skills(self, text, job_description):
+         """Extract skills from text based on job description"""
+         # Simple skill extraction using regex and job description keywords
+         # In a real implementation, this could be enhanced with ML-based skill extraction
+
+         # Extract potential skills from job description (words 3 letters or longer)
+         potential_skills = set()
+
+         # Common skill-related phrases that might appear in job descriptions
+         skill_indicators = ["experience with", "knowledge of", "familiar with", "proficient in",
+                             "skills in", "expertise in", "background in", "capabilities in",
+                             "years of experience in", "understanding of", "trained in"]
+
+         # Extract skills from sentences containing skill indicators
+         sentences = sent_tokenize(job_description)
+         for sentence in sentences:
+             sentence_lower = sentence.lower()
+             for indicator in skill_indicators:
+                 if indicator in sentence_lower:
+                     # Extract words after the indicator, possibly until end of sentence or punctuation
+                     skills_part = sentence_lower.split(indicator, 1)[1]
+
+                     # Extract words, cleaning up symbols
+                     words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skills_part)
+                     for word in words:
+                         if len(word) >= 3:  # Only consider words 3 letters or longer
+                             potential_skills.add(word.lower())
+
+         # Add explicit skills - look for comma-separated lists or bullet points
+         skill_lists = re.findall(r'(?:skills|requirements|qualifications)[^\n.]*?:(.+?)(?:\n|$)', job_description.lower())
+         for skill_list in skill_lists:
+             words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skill_list)
+             for word in words:
+                 if len(word) >= 3:
+                     potential_skills.add(word.lower())
+
+         # Add common tech skills if they appear in the job description
+         common_tech_skills = ["python", "java", "c++", "javascript", "sql", "react", "node.js", "typescript",
+                               "html", "css", "aws", "azure", "gcp", "docker", "kubernetes", "terraform",
+                               "git", "ci/cd", "agile", "scrum", "rest", "graphql", "ml", "ai", "data science"]
+
+         for skill in common_tech_skills:
+             if skill in job_description.lower():
+                 potential_skills.add(skill)
+
+         # Find skills in the resume
+         matched_skills = []
+         for skill in potential_skills:
+             # Whole-token search; lookarounds instead of \b so skills ending in symbols (e.g. "c++") still match
+             pattern = r'(?<!\w)' + re.escape(skill) + r'(?!\w)'
+             matches = re.findall(pattern, text.lower())
+             if matches:
+                 matched_skills.append(skill)
+
+         return list(set(matched_skills))

+     def extract_key_phrases(self, text, job_description):
+         """Extract key phrases from text that match job description keywords"""
+         # Identify job skills first
+         skills = self.extract_skills(job_description, job_description)
+
+         # Extract sentences that contain skills
+         sentences = sent_tokenize(text)
+         skill_sentences = []
+
+         for sentence in sentences:
+             sentence_lower = sentence.lower()
+             for skill in skills:
+                 if skill in sentence_lower:
+                     # Append the sentence with the skill highlighted (case-insensitive, keeping the original casing)
+                     highlighted = re.sub(re.escape(skill), lambda m: f"**{m.group(0)}**", sentence, flags=re.IGNORECASE)
+                     skill_sentences.append(highlighted)
+                     break
+
+         # Get additional generic matches if we don't have enough skill sentences
+         if len(skill_sentences) < 5:
+             # Simple extraction based on job description keywords
+             job_tokens = set(word.lower() for word in word_tokenize(job_description) if len(word) > 3)
+             text_tokens = word_tokenize(text)
+
+             matches = []
+             for i, token in enumerate(text_tokens):
+                 if token.lower() in job_tokens:
+                     # Get a phrase context (5 words before and after)
+                     start = max(0, i - 5)
+                     end = min(len(text_tokens), i + 6)
+                     phrase = " ".join(text_tokens[start:end])
+                     matches.append(phrase)
+
+             # Add unique phrases to complement skill sentences
+             unique_matches = list(set(matches))
+             skill_sentences.extend(unique_matches[:5 - len(skill_sentences)])
+
+         # Return unique phrases, up to 5
+         return skill_sentences[:5]

+     def generate_explanation(self, resume_text, job_description, score, semantic_score, bm25_score, skills):
+         """Generate explanation for why a resume was ranked highly using QwQ-32B model"""
+         # Use the explanation generator if available
+         if use_explanation and self.explanation_generator:
+             return self.explanation_generator.generate_explanation(
+                 resume_text,
+                 job_description,
+                 score,
+                 semantic_score,
+                 bm25_score,
+                 skills
+             )
+         else:
+             # Fallback to simple explanation
+             matching_phrases = self.extract_key_phrases(resume_text, job_description)
+
+             explanation = f"This resume received a score of {score:.2f}, with semantic relevance of {semantic_score:.2f} and keyword match of {bm25_score:.2f}. "
+
+             if skills:
+                 explanation += f"The resume shows experience with key skills: {', '.join(skills[:5])}. "
+
+             if matching_phrases:
+                 explanation += f"Key matching elements include: {matching_phrases[0]}"
+
+             return explanation

+ # Function to create a download link for dataframe as CSV
+ def get_csv_download_link(df, filename="results.csv"):
+     csv = df.to_csv(index=False)
+     b64 = base64.b64encode(csv.encode()).decode()
+     # text/csv is the registered media type for CSV data
+     href = f'<a href="data:text/csv;base64,{b64}" download="{filename}">Download CSV</a>'
+     return href

+ # Main app UI
+ st.title("Resume Screener & Skill Extractor")
+ st.markdown("---")

+ # Initialize the resume screener
+ screener = ResumeScreener(embedding_model_name, explanation_model_name)

+ # Job description input
+ st.header("1. Enter Job Description")
+ job_description = st.text_area(
+     "Paste the job description or requirements here:",
+     height=200,
+     help="Enter the complete job description or a list of required skills and qualifications."
+ )

+ # Resume upload
+ st.header("2. Upload Resumes")
+ upload_option = st.radio(
+     "Choose upload method:",
+     ["Upload Files", "Upload from Dataset"]
+ )

+ uploaded_files = []
+ resume_texts = []
+ file_names = []

+ if upload_option == "Upload Files":
+     uploaded_files = st.file_uploader(
+         "Upload resume files",
+         type=["pdf", "docx", "txt", "csv"],
+         accept_multiple_files=True,
+         help="Upload multiple resume files in PDF, DOCX, TXT, or CSV format."
+     )

+     if uploaded_files:
+         with st.spinner("Processing resumes..."):
+             for file in uploaded_files:
+                 file_type = file.name.split('.')[-1].lower()

+                 with tempfile.NamedTemporaryFile(delete=False, suffix=f'.{file_type}') as tmp_file:
+                     tmp_file.write(file.getvalue())
+                     tmp_path = tmp_file.name

+                 text = screener.extract_text_from_file(tmp_path, file_type)
+                 if text:
+                     resume_texts.append(text)
+                     file_names.append(file.name)

+                 # Clean up temp file
+                 os.unlink(tmp_path)

+         st.session_state.resumes_uploaded = True
+         st.success(f"Successfully processed {len(resume_texts)} resumes.")
+ else:
+     st.write("Upload from dataset feature will be implemented soon.")
+     # Here you would implement the connection to Hugging Face datasets
+     # Example pseudocode:
+     # dataset_name = st.text_input("Enter Hugging Face dataset name:")
+     # if st.button("Load Dataset"):
+     #     with st.spinner("Loading dataset..."):
+     #         dataset = load_dataset(dataset_name)
+     #         resume_texts = [item["text"] for item in dataset]
+     #         file_names = [f"resume_{i}.txt" for i in range(len(resume_texts))]
+
+ # Process button
+ if st.button("Find Top Candidates", disabled=not (job_description and resume_texts)):
+     with st.spinner("Loading embedding model..."):
+         screener.load_model()
+
+     with st.spinner("Processing job description and resumes..."):
+         # Get job description embedding
+         job_embedding = screener.get_embedding(job_description)
+
+         # Get resume embeddings
+         resume_embeddings = []
+         progress_bar = st.progress(0)
+         for i, text in enumerate(resume_texts):
+             embedding = screener.get_embedding(text)
+             resume_embeddings.append(embedding)
+             progress_bar.progress((i + 1) / len(resume_texts))

+         # Calculate hybrid scores
+         hybrid_scores, semantic_scores, bm25_scores = screener.calculate_hybrid_scores(
+             resume_texts,
+             resume_embeddings,
+             job_embedding,
+             semantic_weight,
+             use_faiss
+         )
+
+         # Get top candidates
+         combined_data = list(zip(file_names, resume_texts, hybrid_scores, semantic_scores, bm25_scores))
+         sorted_data = sorted(combined_data, key=lambda x: x[2], reverse=True)
+         top_candidates = sorted_data[:int(top_k)]
+
+         # Create results with explanations if enabled
+         results = []
+         for name, text, score, semantic_score, bm25_score in top_candidates:
+             # Extract skills for this resume
+             skills = screener.extract_skills(text, job_description)
+
+             result = {
+                 "filename": name,
+                 "score": score,
+                 "semantic_score": semantic_score,
+                 "keyword_score": bm25_score,
+                 "text_preview": text[:500] + "...",
+                 "matched_phrases": screener.extract_key_phrases(text, job_description),
+                 "skills": skills
+             }

+             if use_explanation:
+                 explanation = screener.generate_explanation(
+                     text,
+                     job_description,
+                     score,
+                     semantic_score,
+                     bm25_score,
+                     skills
+                 )
+                 result["explanation"] = explanation
+             else:
+                 result["explanation"] = ""
+
+             results.append(result)
+
+         st.session_state.results = results
+         st.success(f"Found top {len(results)} candidates!")
+
+ # Display results
+ if st.session_state.results:
+     st.header("3. Results")
+
+     # Create a DataFrame for download
+     df_data = []
+     for result in st.session_state.results:
+         df_data.append({
+             "Filename": result["filename"],
+             "Score": result["score"],
+             "Semantic Score": result["semantic_score"],
+             "Keyword Score": result["keyword_score"],
+             "Skills": ", ".join(result["skills"]),
+             "Explanation": result["explanation"]
+         })
+
+     results_df = pd.DataFrame(df_data)
+
+     # Display download link
+     st.markdown(get_csv_download_link(results_df), unsafe_allow_html=True)
+
+     # Display individual results
+     for i, result in enumerate(st.session_state.results):
+         with st.expander(f"#{i+1}: {result['filename']} (Score: {result['score']:.4f})"):
+             col1, col2 = st.columns([1, 1])

              with col1:
+                 st.subheader("Scores")
+                 st.write(f"Total Score: {result['score']:.4f}")
+                 st.write(f"Semantic Score: {result['semantic_score']:.4f}")
+                 st.write(f"Keyword Score: {result['keyword_score']:.4f}")
+
+                 st.subheader("Matched Skills")
+                 if result["skills"]:
+                     for skill in result["skills"]:
+                         st.write(f"• {skill}")
                  else:
+                     st.write("No specific skills matched.")

              with col2:
+                 st.subheader("Explanation")
+                 st.write(result["explanation"])

+                 st.subheader("Key Matches")
+                 for phrase in result["matched_phrases"]:
+                     st.markdown(f"• {phrase}")

+                 st.subheader("Resume Preview")
+                 # A non-empty label plus a per-result key avoids duplicate-widget errors inside the loop
+                 st.text_area("Resume preview", result["text_preview"], height=150, disabled=True, key=f"preview_{i}", label_visibility="collapsed")
+
+     # Visualization of scores
+     st.subheader("Score Comparison")
+
+     # Prepare data for visualization
+     chart_data = pd.DataFrame({
+         "Resume": [result["filename"] for result in st.session_state.results],
+         "Semantic Score": [result["semantic_score"] for result in st.session_state.results],
+         "Keyword Score": [result["keyword_score"] for result in st.session_state.results],
+         "Total Score": [result["score"] for result in st.session_state.results]
+     })

+     # Display as a bar chart
+     st.bar_chart(chart_data.set_index("Resume")[["Total Score", "Semantic Score", "Keyword Score"]])

+ # Footer
  st.markdown("---")
+ st.markdown("Built with Streamlit and Hugging Face models (NV-Embed-v2 and QwQ-32B)")
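Review note on the ranking logic in `calculate_hybrid_scores`: the blend of cosine similarity and max-normalized keyword scores can be checked in isolation. What follows is a minimal sketch in plain Python, not the app's implementation. The helper names `cosine` and `hybrid_scores` are hypothetical, the keyword scores are stand-ins for BM25 output, and the zero-norm guard is an addition that the app code does not have.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_scores(resume_vecs, job_vec, keyword_scores, semantic_weight=0.7):
    """Blend semantic and keyword scores the way the diff describes.

    keyword_scores stands in for raw BM25 scores; they are divided by their
    maximum (computed once) so both components sit on a comparable scale.
    """
    semantic = [cosine(vec, job_vec) for vec in resume_vecs]
    top = max(keyword_scores, default=0.0)
    keyword = [s / top for s in keyword_scores] if top > 0 else list(keyword_scores)
    keyword_weight = 1.0 - semantic_weight
    return [
        semantic_weight * sem + keyword_weight * kw
        for sem, kw in zip(semantic, keyword)
    ]


if __name__ == "__main__":
    # Two toy "resumes": the first matches the job vector, the second has more keyword hits.
    vectors = [[1.0, 0.0], [0.0, 1.0]]
    print(hybrid_scores(vectors, [1.0, 0.0], [2.0, 4.0]))
```

With the default weight of 0.7, the first toy resume outranks the second even though the second has the higher keyword score, which is the behavior the sidebar slider is meant to control.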
explanation_generator.py ADDED
@@ -0,0 +1,178 @@
+ """
+ Explanation Generator Module
+
+ This module handles the generation of explanations for resume rankings
+ using the QwQ-32B model from Hugging Face.
+ """
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import os
+ import re
+
+ class ExplanationGenerator:
+     def __init__(self, model_name="Qwen/QwQ-32B"):
+         """Initialize the explanation generator with the specified model"""
+         self.model_name = model_name
+         self.model = None
+         self.tokenizer = None
+         self.initialized = False
+
+     def load_model(self):
+         """Load the model and tokenizer if not already loaded"""
+         if not self.initialized:
+             try:
+                 # Check if we have enough VRAM for loading the model
+                 if torch.cuda.is_available():
+                     gpu_memory = torch.cuda.get_device_properties(0).total_memory
+                     # Rough threshold only: 32B parameters in bfloat16 take about 64 GB for the weights alone
+                     if gpu_memory >= 32 * (1024**3):  # 32 GB
+                         device = "cuda"
+                     else:
+                         device = "cpu"
+                 else:
+                     device = "cpu"
+
+                 # Load tokenizer
+                 self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+
+                 # Load model based on available resources
+                 if device == "cuda":
+                     self.model = AutoModelForCausalLM.from_pretrained(
+                         self.model_name,
+                         torch_dtype=torch.bfloat16,
+                         device_map="auto"
+                     )
+                 else:
+                     # Fall back to a simpler template-based solution if we can't load the model
+                     self.model = None
+                     print("Warning: Loading QwQ-32B on CPU is not recommended. Using template-based explanations instead.")
+
+                 self.initialized = True
+             except Exception as e:
+                 print(f"Error loading QwQ-32B model: {str(e)}")
+                 print("Falling back to template-based explanations.")
+                 self.model = None
+                 self.initialized = True
+
+     def generate_explanation(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
+         """Generate explanation for why a resume was ranked highly"""
+         # Check if we need to load the model
+         if not self.initialized:
+             self.load_model()
+
+         # If the model is loaded and available, use it for generating explanations
+         if self.model is not None:
+             try:
+                 # Prepare prompt for QwQ-32B
+                 prompt = self._create_prompt(resume_text, job_description, score, semantic_score, keyword_score, skills)
+
+                 # Create messages for chat format
+                 messages = [
+                     {"role": "user", "content": prompt}
+                 ]
+
+                 # Apply chat template
+                 text = self.tokenizer.apply_chat_template(
+                     messages,
+                     tokenize=False,
+                     add_generation_prompt=True
+                 )
+
+                 # Tokenize
+                 inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
+
+                 # Generate response
+                 output_ids = self.model.generate(
+                     **inputs,
+                     max_new_tokens=300,
+                     temperature=0.6,
+                     top_p=0.95,
+                     top_k=30
+                 )
+
+                 # Decode the response
+                 response = self.tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+
+                 # Clean up the response
+                 cleaned_response = self._clean_response(response)
+
+                 return cleaned_response
+
+             except Exception as e:
+                 print(f"Error generating explanation with QwQ-32B: {str(e)}")
+                 # Fall back to template-based explanation
+                 return self._generate_template_explanation(score, semantic_score, keyword_score, skills)
+         else:
+             # Use template-based explanation if model is not available
+             return self._generate_template_explanation(score, semantic_score, keyword_score, skills)
+
+     def _create_prompt(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
+         """Create a prompt for the explanation generation"""
+         # Use only the first 1000 characters of the resume to keep prompt size manageable
+         resume_excerpt = resume_text[:1000] + "..." if len(resume_text) > 1000 else resume_text
+
+         prompt = f"""You are an AI assistant helping a recruiter understand why a candidate's resume was matched with a job posting.
+
+ The resume has been assigned the following scores:
+ - Overall Match Score: {score:.2f} out of 1.0
+ - Semantic Relevance Score: {semantic_score:.2f} out of 1.0
+ - Keyword Match Score: {keyword_score:.2f} out of 1.0
+
+ The job description is:
+ ```
+ {job_description}
+ ```
+
+ Based on analysis, the resume contains these skills relevant to the job: {', '.join(skills)}
+
+ Resume excerpt:
+ ```
+ {resume_excerpt}
+ ```
+
+ Please provide a short explanation (3-5 sentences) of why this resume received these scores and how well it matches the job requirements. Focus on the relationship between the candidate's experience and the job requirements."""
+
+         return prompt
+
+     def _clean_response(self, response):
+         """Clean the response from the model"""
+         # Remove reasoning blocks; the chat template can open <think> in the prompt, so the output may hold only </think>
+         response = re.sub(r'(?:\A|<think>).*?</think>', '', response, flags=re.DOTALL)
+
+         # Limit to a reasonable length
+         if len(response) > 500:
+             sentences = response.split('.')
+             shortened = '.'.join(sentences[:5]) + '.'
+             return shortened
+
+         return response
+
+     def _generate_template_explanation(self, score, semantic_score, keyword_score, skills):
+         """Generate a template-based explanation when the model is not available"""
+         # Simple template-based explanation
+         if score > 0.8:
+             quality = "excellent"
+         elif score > 0.6:
+             quality = "good"
+         elif score > 0.4:
+             quality = "moderate"
+         else:
+             quality = "limited"
+
+         explanation = f"This resume shows {quality} alignment with the job requirements, with an overall score of {score:.2f}. "
+
+         if semantic_score > keyword_score:
+             explanation += f"The candidate's experience demonstrates strong semantic relevance ({semantic_score:.2f}) to the position, though specific keyword matches ({keyword_score:.2f}) could be improved. "
+         else:
+             explanation += f"The resume contains many relevant keywords ({keyword_score:.2f}), but could benefit from better contextual alignment ({semantic_score:.2f}) with the job requirements. "
+
+         if skills:
+             if len(skills) > 3:
+                 explanation += f"Key skills identified include {', '.join(skills[:3])}, and {len(skills)-3} others that match the job requirements."
+             else:
+                 explanation += f"Key skills identified include {', '.join(skills)}."
+         else:
+             explanation += "No specific skills were identified that directly match the requirements."
+
+         return explanation
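Review note on `_clean_response`: the cleanup step is easy to exercise without loading any model. This is a small sketch of one way to strip reasoning text and cap the length. The names `strip_reasoning` and `shorten` are hypothetical. It also assumes the decoded text may begin inside a reasoning block with only a closing `</think>` tag, which depends on the chat template in use.

```python
import re

# Matches a reasoning block that either starts at the very beginning of the
# text (opening tag already consumed by the prompt) or has an explicit <think>.
THINK_BLOCK = re.compile(r"(?:\A|<think>).*?</think>", flags=re.DOTALL)


def strip_reasoning(text):
    """Drop reasoning blocks and surrounding whitespace; leave untagged text alone."""
    return THINK_BLOCK.sub("", text).strip()


def shorten(text, max_chars=500, max_sentences=5):
    """Keep at most max_sentences sentences once the text exceeds max_chars."""
    if len(text) <= max_chars:
        return text
    sentences = [part.strip() for part in text.split(".") if part.strip()]
    return ". ".join(sentences[:max_sentences]) + "."


if __name__ == "__main__":
    raw = "compare skills to the posting\ncheck scores</think>\nStrong match on Python and SQL."
    print(shorten(strip_reasoning(raw)))
```

Text without any `</think>` tag passes through unchanged, so a reply cut off mid-reasoning by a small `max_new_tokens` budget would still need separate handling.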
fix_dependencies.py ADDED
@@ -0,0 +1,76 @@
+ #!/usr/bin/env python
+ """
+ Dependency fixer for Resume Screener and Skill Extractor
+ This script ensures all dependencies are properly installed with compatible versions.
+ """
+
+ import sys
+ import subprocess
+ import os
+
+ def install(package):
+     """Install a package using pip"""
+     subprocess.check_call([sys.executable, "-m", "pip", "install", package])
+
+ def install_with_message(package, message=None):
+     """Install a package with an optional message"""
+     if message:
+         print(f"\n{message}")
+     print(f"Installing {package}...")
+     install(package)
+
+ def main():
+     print("Running dependency fixer for Resume Screener and Skill Extractor...")
+
+     # Install core dependencies first
+     install_with_message("pip==23.1.2", "Pinning pip to a known-compatible version")
+     install_with_message("setuptools==68.0.0", "Installing compatible setuptools")
+
+     # Check if we're in a Hugging Face Space
+     in_hf_space = os.environ.get("SPACE_ID") is not None
+
+     # Install key libraries with specific versions to ensure compatibility
+     dependencies = [
+         ("streamlit==1.31.0", "Installing Streamlit for the web interface"),
+         ("pdfplumber==0.10.1", "Installing PDF processing libraries"),
+         ("PyPDF2==3.0.1", None),
+         ("python-docx==1.0.1", None),
+         ("rank-bm25==0.2.2", "Installing BM25 ranking library"),
+         ("tqdm==4.66.1", "Installing progress bar utility"),
+         ("faiss-cpu==1.7.4", "Installing FAISS for vector similarity search"),
+         ("huggingface-hub==0.20.3", "Installing Hugging Face Hub"),
+         ("transformers==4.36.2", "Installing Transformers"),
+         ("sentence-transformers==2.2.2", "Installing Sentence Transformers"),
+         ("torch==2.1.2", "Installing PyTorch"),
+         ("nltk==3.8.1", "Installing NLTK for text processing"),
+         ("pandas==2.1.3", "Installing data processing libraries"),
+         ("numpy==1.24.3", None),
+         ("plotly==5.18.0", "Installing visualization libraries"),
+         ("spacy==3.7.2", "Installing spaCy for NLP"),
+     ]
+
+     # Install all dependencies
+     for package, message in dependencies:
+         install_with_message(package, message)
+
+     # Download required NLTK data
+     print("\nDownloading NLTK data...")
+     # nltk itself was already installed (pinned) by the loop above
+     import nltk
+     nltk.download('punkt')
+
+     # Download spaCy model if not in a Hugging Face Space
+     # (Spaces should include this in the requirements.txt)
+     if not in_hf_space:
+         print("\nDownloading spaCy model...")
+         try:
+             subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
+         except subprocess.CalledProcessError:
+             # Fall back to installing the model archive directly
+             install("https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz")
+
+     print("\nDependency installation complete!")
+     print("You can now run the Resume Screener with: streamlit run app.py")
+
+ if __name__ == "__main__":
+     main()
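The script above reinstalls every pin unconditionally. As a possible complement, the installed versions could be compared against the pins first. The following is a minimal sketch under stated assumptions: `find_mismatches` is a hypothetical helper that is not part of this commit, and it uses the standard library's `importlib.metadata` rather than `pkg_resources`.

```python
from importlib.metadata import PackageNotFoundError, version


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for every pin that is not satisfied.

    `pins` maps a distribution name to the exact version string expected.
    `installed` is None when the distribution is not installed at all.
    Hypothetical helper: the comparison is a plain string match, so a local
    build tag such as "2.1.2+cpu" would be reported as a mismatch.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_mismatches({"streamlit": "1.31.0", "spacy": "3.7.2"})` before the install loop would let the script skip packages that already match and report only the ones that need attention.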
requirements.txt CHANGED
@@ -1,22 +1,17 @@
- # Core dependencies - order matters!
- pydantic==1.10.8
- spacy==3.5.0
+ streamlit==1.31.0
+ pdfplumber==0.10.1
+ PyPDF2==3.0.1
+ python-docx==1.0.1
+ spacy==3.7.2
+ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz
+ transformers==4.36.2
+ torch==2.1.2
+ nltk==3.8.1
+ faiss-cpu==1.7.4
+ rank-bm25==0.2.2
  sentence-transformers==2.2.2
- torch==1.13.1
- transformers==4.28.1
-
- # PDF processing
- pdfplumber==0.9.0
-
- # Web UI
- streamlit==1.22.0
-
- # Data processing
- pandas==1.5.3
+ plotly==5.18.0
+ pandas==2.1.3
  numpy==1.24.3
- matplotlib==3.7.1
- plotly==5.14.1
-
- # Utilities
- nltk==3.8.1
- scikit-learn==1.0.2
+ tqdm==4.66.1
+ huggingface-hub==0.20.3