root committed on
Commit 6cea573 · 1 Parent(s): 309041b
ss
README.md
CHANGED
````diff
@@ -1,6 +1,6 @@
 ---
-title:
-emoji:
 colorFrom: blue
 colorTo: green
 sdk: streamlit
@@ -10,80 +10,139 @@ pinned: false
 license: mit
 ---

-#

-

-## Features

-- **
-- **
-- **
 - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
-- **
-- **
-- **

-## How It Works

-
-2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
-3. **Ranking**: Calculates a hybrid score based on:
-   - Semantic similarity (cosine similarity between embeddings)
-   - Keyword relevance (BM25 scoring)
-4. **Results**: Returns the top 10 most suitable resumes with:
-   - Overall score and individual component scores
-   - Matched skills and key phrases
-   - Explanations for why each resume was ranked highly

-

 ### Models Used
-- **
-- **

-### Libraries
-- **FAISS**: Facebook AI Similarity Search for
-- **
-- **
-- **

-## Configuration Options

-The sidebar provides several
-- **
-- **
-- **
-- **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections

-## Getting Started

 ### Online Usage
-1. Visit the
-2. Enter a job description
-3. Upload
-4. Click "
-5. Review

 ### Local Installation

 ```bash
-git clone
 cd Resume_Screener_and_Skill_Extractor
 pip install -r requirements.txt
 streamlit run app.py
 ```

-

-
--
--
--
-- Customizable scoring algorithms and weights

-

-

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
````
````diff
@@ -1,6 +1,6 @@
 ---
+title: AI-driven Candidate Matcher
+emoji: ๐ฏ
 colorFrom: blue
 colorTo: green
 sdk: streamlit
@@ -10,80 +10,139 @@ pinned: false
 license: mit
 ---

+# AI-driven Candidate Matcher

+An advanced AI-powered resume screening application that uses a sophisticated 5-stage pipeline to efficiently match resumes against job descriptions. Built with state-of-the-art machine learning models for accurate candidate ranking.

+## ๐ Features

+- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
+- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
+- **FAISS Integration**: Lightning-fast similarity search for large resume collections
+- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
 - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
+- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
+- **Batch Processing**: Upload and analyze multiple resumes simultaneously
+- **Export Results**: Download detailed analysis as CSV

+## ๐ง How It Works

+### 5-Stage Advanced Pipeline

+1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
+2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
+3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
+4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
+5. **Combined Scoring**: Weighted combination of all scores for final ranking
````
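Stages 1 and 2 above form a recall-then-re-rank funnel: a cheap vector search narrows the pool to 50, and only those 50 go through the expensive pairwise model before the top 20 are kept. The sketch below shows that shape using only the standard library, under stated assumptions: embeddings are already computed (the README names BAAI/bge-large-en-v1.5), exact cosine search stands in for FAISS, and a toy word-overlap scorer stands in for the cross-encoder. `recall_top_k` and `rerank` are hypothetical names, not taken from app.py. With the real libraries, stage 1 would typically be a `faiss.IndexFlatIP` over L2-normalized vectors and stage 2 a `sentence_transformers.CrossEncoder(...).predict(pairs)` call.

```python
from __future__ import annotations

import heapq
import math
from typing import Callable, Sequence


def normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def recall_top_k(query_vec: Sequence[float],
                 doc_vecs: Sequence[Sequence[float]],
                 k: int = 50) -> list[int]:
    """Stage 1 stand-in: exact cosine search (the ranking IndexFlatIP gives on unit vectors)."""
    q = normalize(query_vec)
    scored = ((sum(a * b for a, b in zip(q, normalize(d))), i) for i, d in enumerate(doc_vecs))
    return [i for _, i in heapq.nlargest(k, scored)]  # document indices, most similar first


def rerank(query: str,
           docs: Sequence[str],
           candidate_ids: Sequence[int],
           score_pair: Callable[[str, str], float],
           k: int = 20) -> list[tuple[int, float]]:
    """Stage 2 stand-in: score each (query, resume) pair jointly and keep the best k."""
    scored = sorted(((score_pair(query, docs[i]), i) for i in candidate_ids), reverse=True)
    return [(i, s) for s, i in scored[:k]]


# Toy usage: 2-D "embeddings" and a word-overlap scorer in place of the cross-encoder.
docs = ["python developer with faiss", "chef and baker", "ml engineer python"]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
query, query_vec = "python ml engineer", [1.0, 0.2]


def overlap(q: str, d: str) -> float:
    return float(len(set(q.split()) & set(d.split())))


shortlist = recall_top_k(query_vec, doc_vecs, k=2)   # [0, 2]
print(rerank(query, docs, shortlist, overlap, k=1))  # [(2, 3.0)]
```

Note how the re-ranker reverses the recall order here: resume 0 is closest in embedding space, but the pairwise scorer prefers resume 2. That is the point of the second stage.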
````diff
+
+### Scoring Formula
+**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
````
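The formula gives only the range each component contributes, not how raw scores are mapped into those ranges. A minimal sketch, assuming each component is first normalized to [0, 1] (min-max across the candidate set for BM25; cross-encoder and intent scores assumed to already lie in [0, 1]) and then scaled linearly into its stated range. `min_max` and `final_score` are hypothetical helpers; this diff does not show the app's actual normalization.

```python
from __future__ import annotations


def min_max(values: list[float]) -> list[float]:
    """Rescale raw scores (e.g. BM25) to [0, 1] across the current candidate set."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all candidates tie: no keyword signal to spread out
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def final_score(ce: float, bm25: float, intent: float) -> float:
    """Map normalized components (each in [0, 1]) onto the README's ranges and sum them.

    Assumed linear mapping: Cross-Encoder -> 0 to 0.7, BM25 -> 0.1 to 0.2, Intent -> 0 to 0.1.
    """
    for name, value in (("ce", ce), ("bm25", bm25), ("intent", intent)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.7 * ce + (0.1 + 0.1 * bm25) + 0.1 * intent


# Two hypothetical candidates.
ce_scores = [0.9, 0.4]
bm25_norm = min_max([12.0, 3.0])  # [1.0, 0.0]
intent_scores = [1.0, 0.0]
finals = [round(final_score(c, b, i), 2) for c, b, i in zip(ce_scores, bm25_norm, intent_scores)]
print(finals)  # [0.93, 0.38]
```

Under this mapping the final score lies between 0.1 and 1.0, because BM25 always contributes at least 0.1.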
````diff
+
+### Input & Output
+- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
+- **Output**: Ranked candidates with detailed score breakdowns and AI explanations
+
+## ๐ค Technical Details

 ### Models Used
+- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
+- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
+- **Qwen3-1.7B**: Large language model for intent analysis and explanations

+### Key Libraries
+- **FAISS**: Facebook AI Similarity Search for efficient vector operations
+- **Sentence Transformers**: For embedding generation and cross-encoding
+- **rank_bm25**: BM25 algorithm implementation for keyword matching
+- **Streamlit**: Interactive web interface
+- **PyTorch**: Deep learning framework
````
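The README lists rank_bm25 for the keyword-matching stage. To show what that stage computes, here is a from-scratch Okapi BM25 sketch with k1 = 1.5 and b = 0.75 (also rank_bm25's defaults). It uses the always-positive IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), whereas `rank_bm25.BM25Okapi` uses log((N - n + 0.5) / (n + 0.5)) with an epsilon floor for very common terms, so absolute values will differ from the library's `get_scores` output. `bm25_scores` is a hypothetical name, and whitespace tokenization is an assumption.

```python
from __future__ import annotations

import math
from collections import Counter


def bm25_scores(query_tokens: list[str], corpus_tokens: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of the query against every tokenized resume."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many resumes each term appears at least once.
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        # Length normalization: longer-than-average resumes are damped (strength set by b).
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query_tokens:
            freq = tf.get(term, 0)
            if freq == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: each extra repetition adds less (controlled by k1).
            score += idf * freq * (k1 + 1) / (freq + length_norm)
        scores.append(score)
    return scores


def tokenize(text: str) -> list[str]:
    return text.lower().split()  # naive whitespace tokenizer (assumption)


corpus = ["Python SQL Docker", "Java Spring", "Python Python pandas"]
scores = bm25_scores(tokenize("python docker"), [tokenize(doc) for doc in corpus])
# Resume 0 matches both terms and ranks first; resume 1 shares no terms and scores 0.0.
print([round(s, 3) for s in scores])
```

The rarer term ("docker", in one resume) carries a larger IDF than "python" (in two), which is why matching it outweighs repeating "python" in resume 2.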
````diff

+## ๐ Configuration Options

+The sidebar provides several customization options:
+- **Results Count**: Choose how many top candidates to display (1-5)
+- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
+- **Score Breakdown**: Detailed view of individual scoring components

+## ๐ Getting Started

 ### Online Usage
+1. Visit the application
+2. Enter a comprehensive job description
+3. Upload resume files or CSV dataset
+4. Click "Advanced Pipeline Analysis"
+5. Review ranked candidates with detailed insights

 ### Local Installation

 ```bash
+git clone <repository-url>
 cd Resume_Screener_and_Skill_Extractor
 pip install -r requirements.txt
 streamlit run app.py
 ```

+### Requirements
+- Python 3.8+
+- CUDA-compatible GPU (optional, for faster processing)
+- Minimum 8GB RAM recommended
+
+## ๐ Supported File Formats
+
+- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
+- **DOCX**: Microsoft Word documents
+- **TXT**: Plain text files
+- **CSV**: Structured datasets with resume text columns
````
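A minimal sketch of per-format text extraction with the PDF fallback listed above. It assumes the `pdfplumber`, `PyPDF2` and `python-docx` packages (imported lazily, so TXT and CSV work without them) and a hypothetical CSV column named `Resume`. The function names are illustrative, not taken from app.py; every extractor returns a list of resume texts so CSV rows and single documents can be handled uniformly.

```python
from __future__ import annotations

import csv
import os
import tempfile


def extract_pdf(path: str) -> list[str]:
    """PDF: try pdfplumber first; fall back to PyPDF2 if it is missing or fails."""
    try:
        import pdfplumber
        with pdfplumber.open(path) as pdf:
            return ["\n".join(page.extract_text() or "" for page in pdf.pages)]
    except Exception:  # broad on purpose: ImportError or a parsing error both trigger the fallback
        from PyPDF2 import PdfReader
        reader = PdfReader(path)
        return ["\n".join(page.extract_text() or "" for page in reader.pages)]


def extract_docx(path: str) -> list[str]:
    import docx  # installed as the python-docx package
    return ["\n".join(p.text for p in docx.Document(path).paragraphs)]


def extract_txt(path: str) -> list[str]:
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [fh.read()]


def extract_csv(path: str, text_column: str = "Resume") -> list[str]:
    """CSV: one resume per row; 'Resume' is a hypothetical column name. Empty cells are skipped."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [row[text_column] for row in csv.DictReader(fh) if row.get(text_column)]


EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx, ".txt": extract_txt, ".csv": extract_csv}


def extract(path: str) -> list[str]:
    """Dispatch on the file extension and return a list of resume texts."""
    extractor = EXTRACTORS.get(os.path.splitext(path)[1].lower())
    if extractor is None:
        raise ValueError(f"Unsupported file type: {path}")
    return extractor(path)


# Usage with throwaway files (TXT and CSV need no third-party packages).
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "cv.txt")
    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write("Python developer, 5 years")
    csv_path = os.path.join(tmp, "resumes.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        fh.write("Name,Resume\nA,Data engineer\nB,\n")
    txt_texts = extract(txt_path)  # ['Python developer, 5 years']
    csv_texts = extract(csv_path)  # ['Data engineer']
```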
````diff
+
+## ๐ Privacy & Security
+
+### Data Privacy Statement
+
+**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**
+
+#### Data Handling
+- **No Data Storage**: Resume content is processed in memory only and never stored permanently
+- **Session-Based**: All data is cleared when you close the browser or reset the application
+- **Local Processing**: All AI analysis happens locally within the application environment
+- **No External Transmission**: Resume data is never sent to external services or third parties
+
+#### Security Measures
+- **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
+- **Memory Management**: Automatic cleanup of resume data from system memory
+- **No Logging**: Resume content is never logged or cached
+- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments

+#### User Control
+- **Clear Data Options**: Multiple options to clear resume data and free memory
+- **Session Management**: Complete control over when and how data is processed
+- **Transparent Processing**: Full visibility into what data is being analyzed

+**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
````
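This diff does not show how app.py implements the "Temporary Files" measure. One common pattern, sketched here as an assumption, writes the upload to a file created by `tempfile.mkstemp` (readable and writable only by the current user) and removes it in a `finally` block, so the file is deleted even when parsing fails. `process_upload` is a hypothetical helper; in Streamlit the bytes would come from an `UploadedFile`'s `getvalue()`.

```python
from __future__ import annotations

import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload(data: bytes, suffix: str, handler: Callable[[str], T]) -> T:
    """Write uploaded bytes to a private temp file, run handler(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)  # created with owner-only permissions
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return handler(path)
    finally:
        os.remove(path)  # runs on success and on error alike


seen_paths = []  # records paths only so the demo can check they are gone afterwards


def read_text(path: str) -> str:
    seen_paths.append(path)
    with open(path, encoding="utf-8") as fh:
        return fh.read()


text = process_upload(b"Senior data engineer", ".txt", read_text)
print(text, os.path.exists(seen_paths[0]))  # Senior data engineer False
```

The suffix is kept so that extension-based parsers still recognize the file type.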
````diff
+
+## ๐ Performance Metrics
+
+- **Accuracy**: Advanced multi-stage pipeline ensures high-quality candidate ranking
+- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
+- **Scalability**: Efficient memory management for large resume datasets
+- **Reliability**: Fallback models ensure consistent operation
+
+## ๐ฎ Future Enhancements
+
+- **Multi-language Support**: Extend to non-English resumes and job descriptions
+- **Custom Scoring Weights**: User-configurable importance of different scoring components
+- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
+- **Integration APIs**: Connect with ATS and HR management systems
+- **Batch Job Processing**: Queue-based processing for large-scale screening
+
+## ๐ License
+
+MIT License - See LICENSE file for details
+
+## ๐ค Contributing
+
+Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
+
+---

+*Built with ❤️ using Streamlit, Transformers, and FAISS*

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
````
app.py
CHANGED
```diff
@@ -36,7 +36,7 @@ except LookupError:

 # Set page configuration
 st.set_page_config(
-    page_title="AI
     page_icon="๐ฏ",
     layout="wide",
     initial_sidebar_state="expanded"
@@ -580,8 +580,36 @@ def create_download_link(df, filename="resume_screening_results.csv"):
     return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">๐ฅ Download Results CSV</a>'

 # Main App Interface
-st.title("๐ฏ AI-
-st.markdown("*
 st.markdown("---")

 # Initialize screener
@@ -931,7 +959,7 @@ st.markdown("---")
 st.markdown(
     """
     <div style='text-align: center; color: #666;'>
-    ๐ Powered by BAAI/bge-large-en-v1.5 & Qwen3-1.7B | Built with Streamlit
     </div>
     """,
     unsafe_allow_html=True
```
```diff
@@ -36,7 +36,7 @@ except LookupError:

 # Set page configuration
 st.set_page_config(
+    page_title="AI-driven Candidate Matcher",
     page_icon="๐ฏ",
     layout="wide",
     initial_sidebar_state="expanded"
@@ -580,8 +580,36 @@ def create_download_link(df, filename="resume_screening_results.csv"):
     return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">๐ฅ Download Results CSV</a>'

 # Main App Interface
+st.title("๐ฏ AI-driven Candidate Matcher")
+st.markdown("*Advanced 5-stage pipeline using BAAI/bge-large-en-v1.5 embeddings, Cross-Encoder re-ranking, and Qwen3-1.7B intent analysis*")
+
+# Privacy Statement
+with st.expander("๐ Privacy & Data Security", expanded=False):
+    st.markdown("""
+    ### Data Privacy Statement
+
+    **Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data.**
+
+    #### ๐ก๏ธ Data Handling
+    - **No Permanent Storage**: Resume content is processed in memory only and never stored permanently
+    - **Session-Based**: All data is automatically cleared when you close the browser or reset the application
+    - **Local Processing**: All AI analysis happens locally within this application environment
+    - **No External Transmission**: Resume data is never sent to external services or third parties
+
+    #### ๐ Security Measures
+    - **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
+    - **Memory Management**: Automatic cleanup of resume data from system memory
+    - **No Logging**: Resume content is never logged or cached anywhere
+    - **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
+
+    #### ๐ค User Control
+    - **Clear Data Options**: Multiple options available to clear resume data and free memory
+    - **Session Management**: Complete control over when and how your data is processed
+    - **Transparent Processing**: Full visibility into what data is being analyzed
+
+    **We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
+    """)
+
 st.markdown("---")

 # Initialize screener
@@ -931,7 +959,7 @@ st.markdown("---")
 st.markdown(
     """
     <div style='text-align: center; color: #666;'>
+    ๐ Powered by BAAI/bge-large-en-v1.5, Cross-Encoder/ms-marco-MiniLM-L6-v2 & Qwen3-1.7B | Built with Streamlit
     </div>
     """,
     unsafe_allow_html=True
```
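The hunk header and context above show only the signature and the final `return` of `create_download_link`. A plausible body, sketched under the assumption that `b64` is the base64-encoded CSV text of the results DataFrame, is below. `csv_download_link` is a hypothetical helper split out so the encoding step can be exercised without pandas; the data URI format copies the `return` line shown in the diff.

```python
import base64


def csv_download_link(csv_text: str, filename: str = "resume_screening_results.csv") -> str:
    """Embed CSV text in a base64 data URI so the browser can save it without a server round-trip."""
    b64 = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">Download Results CSV</a>'


def create_download_link(df, filename="resume_screening_results.csv"):
    """Hypothetical body for the helper named in the hunk header; df is assumed to be a pandas DataFrame."""
    return csv_download_link(df.to_csv(index=False), filename)


# In app.py the HTML would be rendered with something like:
#   st.markdown(create_download_link(results_df), unsafe_allow_html=True)
# where results_df is a hypothetical name for the results DataFrame.
```

Streamlit's built-in `st.download_button(label, data, file_name=..., mime="text/csv")` is an alternative that avoids injecting raw HTML with `unsafe_allow_html=True`.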