root committed on
Commit 6cea573 · 1 Parent(s): 309041b
Files changed (2)
  1. README.md +109 -50
  2. app.py +32 -4
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
- title: Resume Screener and Skill Extractor
3
- emoji: 📄
4
  colorFrom: blue
5
  colorTo: green
6
  sdk: streamlit
@@ -10,80 +10,139 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- # Resume Screener and Skill Extractor
14
 
15
- A Hugging Face Space application for efficiently screening resumes against job descriptions using a hybrid ranking approach that combines semantic similarity with keyword-based scoring.
16
 
17
- ## Features
18
 
19
- - **Hybrid Resume Ranking**: Combines semantic similarity (via NV-Embed-v2) with keyword-based BM25 scoring
20
- - **Skill Extraction**: Automatically identifies relevant skills from resumes based on job requirements
21
- - **Fast Search**: Uses FAISS for efficient similarity search with large resume collections
 
22
  - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
23
- - **Explanation Generation**: Provides explanations for why each resume was ranked highly
24
- - **Visualization**: Displays comparative scores and key matches for easy analysis
25
- - **Batch Processing**: Supports uploading multiple resumes simultaneously
26
 
27
- ## How It Works
28
 
29
- 1. **Input**: Provide a job description and upload resumes (PDF, DOCX, TXT, or CSV format)
30
- 2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
31
- 3. **Ranking**: Calculates a hybrid score based on:
32
- - Semantic similarity (cosine similarity between embeddings)
33
- - Keyword relevance (BM25 scoring)
34
- 4. **Results**: Returns the top 10 most suitable resumes with:
35
- - Overall score and individual component scores
36
- - Matched skills and key phrases
37
- - Explanations for why each resume was ranked highly
38
 
39
- ## Technical Details
40
 
41
  ### Models Used
42
- - **NV-Embed-v2**: State-of-the-art embedding model for semantic similarity
43
- - **QwQ-32B**: Used for generating explanations (simulated in the current version)
 
44
 
45
- ### Libraries
46
- - **FAISS**: Facebook AI Similarity Search for fast vector similarity search
47
- - **rank_bm25**: Implementation of the BM25 algorithm for keyword-based scoring
48
- - **Streamlit**: For the user interface
49
- - **Hugging Face Transformers**: For accessing and using the models
 
50
 
51
- ## Configuration Options
52
 
53
- The sidebar provides several configuration options:
54
- - **Model Selection**: Choose which embedding model to use
55
- - **Ranking Weights**: Adjust the balance between semantic similarity and keyword matching
56
- - **Results Count**: Set how many top results to display
57
- - **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections
58
 
59
- ## Getting Started
60
 
61
  ### Online Usage
62
- 1. Visit the Hugging Face Space at [URL]
63
- 2. Enter a job description
64
- 3. Upload resumes (PDF, DOCX, TXT, or CSV)
65
- 4. Click "Find Top Candidates"
66
- 5. Review the results
67
 
68
  ### Local Installation
69
 
70
  ```bash
71
- git clone https://huggingface.co/spaces/[username]/Resume_Screener_and_Skill_Extractor
72
  cd Resume_Screener_and_Skill_Extractor
73
  pip install -r requirements.txt
74
  streamlit run app.py
75
  ```
76
 
77
- ## Future Enhancements
78
 
79
- - Integration with Hugging Face datasets for loading resumes directly
80
- - Enhanced skill extraction using more sophisticated NLP techniques
81
- - Real-time explanation generation using QwQ-32B
82
- - Support for additional file formats and languages
83
- - Customizable scoring algorithms and weights
84
 
85
- ## License
86
 
87
- MIT License
88
 
89
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: AI-driven Candidate Matcher
3
+ emoji: 🎯
4
  colorFrom: blue
5
  colorTo: green
6
  sdk: streamlit
 
10
  license: mit
11
  ---
12
 
13
+ # AI-driven Candidate Matcher
14
 
15
+ An advanced AI-powered resume screening application that uses a sophisticated 5-stage pipeline to efficiently match resumes against job descriptions. Built with state-of-the-art machine learning models for accurate candidate ranking.
16
 
17
+ ## 🚀 Features
18
 
19
+ - **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
20
+ - **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
21
+ - **FAISS Integration**: Lightning-fast similarity search for large resume collections
22
+ - **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
23
  - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
24
+ - **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
25
+ - **Batch Processing**: Upload and analyze multiple resumes simultaneously
26
+ - **Export Results**: Download detailed analysis as CSV
27
 
28
+ ## 🔧 How It Works
29
 
30
+ ### 5-Stage Advanced Pipeline
31
 
32
+ 1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
33
+ 2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
34
+ 3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
35
+ 4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
36
+ 5. **Combined Scoring**: Weighted combination of all scores for final ranking
37
+
38
+ ### Scoring Formula
39
+ **Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
40
+
41
+ ### Input & Output
42
+ - **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
43
+ - **Output**: Ranked candidates with detailed score breakdowns and AI explanations
44
+
45
+ ## 🤖 Technical Details
46
 
47
  ### Models Used
48
+ - **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
49
+ - **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
50
+ - **Qwen3-1.7B**: Large language model for intent analysis and explanations
51
 
52
+ ### Key Libraries
53
+ - **FAISS**: Facebook AI Similarity Search for efficient vector operations
54
+ - **Sentence Transformers**: For embedding generation and cross-encoding
55
+ - **rank_bm25**: BM25 algorithm implementation for keyword matching
56
+ - **Streamlit**: Interactive web interface
57
+ - **PyTorch**: Deep learning framework
58
 
59
+ ## 📊 Configuration Options
60
 
61
+ The sidebar provides several customization options:
62
+ - **Results Count**: Choose how many top candidates to display (1-5)
63
+ - **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
64
+ - **Score Breakdown**: Detailed view of individual scoring components
 
65
 
66
+ ## 🚀 Getting Started
67
 
68
  ### Online Usage
69
+ 1. Visit the application
70
+ 2. Enter a comprehensive job description
71
+ 3. Upload resume files or CSV dataset
72
+ 4. Click "Advanced Pipeline Analysis"
73
+ 5. Review ranked candidates with detailed insights
74
 
75
  ### Local Installation
76
 
77
  ```bash
78
+ git clone <repository-url>
79
  cd Resume_Screener_and_Skill_Extractor
80
  pip install -r requirements.txt
81
  streamlit run app.py
82
  ```
83
 
84
+ ### Requirements
85
+ - Python 3.8+
86
+ - CUDA-compatible GPU (optional, for faster processing)
87
+ - Minimum 8GB RAM recommended
88
+
89
+ ## 📋 Supported File Formats
90
+
91
+ - **PDF**: Extracted using pdfplumber with PyPDF2 fallback
92
+ - **DOCX**: Microsoft Word documents
93
+ - **TXT**: Plain text files
94
+ - **CSV**: Structured datasets with resume text columns
95
+
96
+ ## 🔒 Privacy & Security
97
+
98
+ ### Data Privacy Statement
99
+
100
+ **Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**
101
+
102
+ #### Data Handling
103
+ - **No Data Storage**: Resume content is processed in memory only and never stored permanently
104
+ - **Session-Based**: All data is cleared when you close the browser or reset the application
105
+ - **Local Processing**: All AI analysis happens locally within the application environment
106
+ - **No External Transmission**: Resume data is never sent to external services or third parties
107
+
108
+ #### Security Measures
109
+ - **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
110
+ - **Memory Management**: Automatic cleanup of resume data from system memory
111
+ - **No Logging**: Resume content is never logged or cached
112
+ - **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
113
 
114
+ #### User Control
115
+ - **Clear Data Options**: Multiple options to clear resume data and free memory
116
+ - **Session Management**: Complete control over when and how data is processed
117
+ - **Transparent Processing**: Full visibility into what data is being analyzed
 
118
 
119
+ **We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
120
+
121
+ ## 📈 Performance Metrics
122
+
123
+ - **Accuracy**: Advanced multi-stage pipeline ensures high-quality candidate ranking
124
+ - **Speed**: FAISS indexing enables sub-second search across thousands of resumes
125
+ - **Scalability**: Efficient memory management for large resume datasets
126
+ - **Reliability**: Fallback models ensure consistent operation
127
+
128
+ ## 🔮 Future Enhancements
129
+
130
+ - **Multi-language Support**: Extend to non-English resumes and job descriptions
131
+ - **Custom Scoring Weights**: User-configurable importance of different scoring components
132
+ - **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
133
+ - **Integration APIs**: Connect with ATS and HR management systems
134
+ - **Batch Job Processing**: Queue-based processing for large-scale screening
135
+
136
+ ## 📄 License
137
+
138
+ MIT License - See LICENSE file for details
139
+
140
+ ## 🤝 Contributing
141
+
142
+ Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
143
+
144
+ ---
145
 
146
+ *Built with ❤️ using Streamlit, Transformers, and FAISS*
147
 
148
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
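
Reviewer note: the new README states the scoring formula as `Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)` but does not say how raw model outputs are mapped into those ranges. The sketch below shows one plausible reading, assuming each component has already been normalized to [0, 1] and is then scaled linearly into its range. The names `scale` and `final_score`, the constants, and the linear mapping are hypothetical and are not taken from `app.py`.

```python
from typing import Tuple

# Component ranges copied from the README formula. How raw model scores are
# normalized before this step is an assumption of this sketch.
CROSS_ENCODER_RANGE = (0.0, 0.7)
BM25_RANGE = (0.1, 0.2)
INTENT_RANGE = (0.0, 0.1)


def scale(normalized: float, bounds: Tuple[float, float]) -> float:
    """Clamp a score to [0, 1], then map it linearly onto (low, high)."""
    low, high = bounds
    clamped = min(max(normalized, 0.0), 1.0)
    return low + clamped * (high - low)


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three scaled components, as in the README formula."""
    return (
        scale(cross_encoder, CROSS_ENCODER_RANGE)
        + scale(bm25, BM25_RANGE)
        + scale(intent, INTENT_RANGE)
    )


if __name__ == "__main__":
    # A strong semantic match with average keyword overlap and weak intent.
    print(round(final_score(cross_encoder=0.9, bm25=0.5, intent=0.2), 3))
```

Under this reading the final score always falls between 0.1 and 1.0, because the BM25 term contributes at least 0.1 even when there is no keyword overlap. If the application zeroes that term instead, the lower bound of `BM25_RANGE` would need to change.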
app.py CHANGED
@@ -36,7 +36,7 @@ except LookupError:
36
 
37
  # Set page configuration
38
  st.set_page_config(
39
- page_title="AI Resume Screener",
40
  page_icon="🎯",
41
  layout="wide",
42
  initial_sidebar_state="expanded"
@@ -580,8 +580,36 @@ def create_download_link(df, filename="resume_screening_results.csv"):
580
  return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">📥 Download Results CSV</a>'
581
 
582
  # Main App Interface
583
- st.title("🎯 AI-Powered Resume Screener")
584
- st.markdown("*Find the perfect candidates using BAAI/bge-large-en-v1.5 embeddings and Qwen3-14B explanations*")
585
  st.markdown("---")
586
 
587
  # Initialize screener
@@ -931,7 +959,7 @@ st.markdown("---")
931
  st.markdown(
932
  """
933
  <div style='text-align: center; color: #666;'>
934
- 🚀 Powered by BAAI/bge-large-en-v1.5 & Qwen3-1.7B | Built with Streamlit
935
  </div>
936
  """,
937
  unsafe_allow_html=True
 
36
 
37
  # Set page configuration
38
  st.set_page_config(
39
+ page_title="AI-driven Candidate Matcher",
40
  page_icon="🎯",
41
  layout="wide",
42
  initial_sidebar_state="expanded"
 
580
  return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">📥 Download Results CSV</a>'
581
 
582
  # Main App Interface
583
+ st.title("🎯 AI-driven Candidate Matcher")
584
+ st.markdown("*Advanced 5-stage pipeline using BAAI/bge-large-en-v1.5 embeddings, Cross-Encoder re-ranking, and Qwen3-1.7B intent analysis*")
585
+
586
+ # Privacy Statement
587
+ with st.expander("🔒 Privacy & Data Security", expanded=False):
588
+ st.markdown("""
589
+ ### Data Privacy Statement
590
+
591
+ **Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data.**
592
+
593
+ #### 🛡️ Data Handling
594
+ - **No Permanent Storage**: Resume content is processed in memory only and never stored permanently
595
+ - **Session-Based**: All data is automatically cleared when you close the browser or reset the application
596
+ - **Local Processing**: All AI analysis happens locally within this application environment
597
+ - **No External Transmission**: Resume data is never sent to external services or third parties
598
+
599
+ #### 🔐 Security Measures
600
+ - **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
601
+ - **Memory Management**: Automatic cleanup of resume data from system memory
602
+ - **No Logging**: Resume content is never logged or cached anywhere
603
+ - **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
604
+
605
+ #### 👤 User Control
606
+ - **Clear Data Options**: Multiple options available to clear resume data and free memory
607
+ - **Session Management**: Complete control over when and how your data is processed
608
+ - **Transparent Processing**: Full visibility into what data is being analyzed
609
+
610
+ **We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
611
+ """)
612
+
613
  st.markdown("---")
614
 
615
  # Initialize screener
 
959
  st.markdown(
960
  """
961
  <div style='text-align: center; color: #666;'>
962
+ 🚀 Powered by BAAI/bge-large-en-v1.5, Cross-Encoder/ms-marco-MiniLM-L6-v2 & Qwen3-1.7B | Built with Streamlit
963
  </div>
964
  """,
965
  unsafe_allow_html=True
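
Reviewer note: the `create_download_link` context line in the `app.py` hunk returns an HTML anchor whose `href` is a base64 `data:` URI, but the function body is outside the diff. The sketch below shows how such a link can be built, assuming the CSV is already available as text. The helper name `csv_download_link` is hypothetical and uses only the standard library; with a pandas DataFrame, `df.to_csv(index=False)` would supply the CSV text.

```python
import base64


def csv_download_link(csv_text: str,
                      filename: str = "resume_screening_results.csv") -> str:
    """Return an HTML anchor that downloads csv_text as a file.

    Hypothetical stand-in for create_download_link: it accepts CSV text
    rather than a DataFrame so the example has no third-party dependencies.
    """
    # Base64-encode the CSV bytes so they can be embedded in a data: URI.
    b64 = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:file/csv;base64,{b64}" download="{filename}" '
        f'class="download-btn">Download Results CSV</a>'
    )


if __name__ == "__main__":
    print(csv_download_link("candidate,final_score\nA,0.91\n"))
```

In Streamlit the returned string would be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is embedded in the page, this approach suits small result tables better than large exports.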