Space: jacob-c/Resume_Screener_and_Skill_Extractor

root committed on Jun 1

Commit 6cea573 (1 parent: 309041b)

ss


Files changed (2)

README.md +109 -50
app.py +32 -4

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: Resume Screener and Skill Extractor
-emoji: 📄
 colorFrom: blue
 colorTo: green
 sdk: streamlit
@@ -10,80 +10,139 @@ pinned: false
 license: mit
 ---
-# Resume Screener and Skill Extractor
-A Hugging Face Space application for efficiently screening resumes against job descriptions using a hybrid ranking approach that combines semantic similarity with keyword-based scoring.
-## Features
-- **Hybrid Resume Ranking**: Combines semantic similarity (via NV-Embed-v2) with keyword-based BM25 scoring
-- **Skill Extraction**: Automatically identifies relevant skills from resumes based on job requirements
-- **Fast Search**: Uses FAISS for efficient similarity search with large resume collections
 - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
-- **Explanation Generation**: Provides explanations for why each resume was ranked highly
-- **Visualization**: Displays comparative scores and key matches for easy analysis
-- **Batch Processing**: Supports uploading multiple resumes simultaneously
-## How It Works
-1. **Input**: Provide a job description and upload resumes (PDF, DOCX, TXT, or CSV format)
-2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
-3. **Ranking**: Calculates a hybrid score based on:
-   - Semantic similarity (cosine similarity between embeddings)
-   - Keyword relevance (BM25 scoring)
-4. **Results**: Returns the top 10 most suitable resumes with:
-   - Overall score and individual component scores
-   - Matched skills and key phrases
-   - Explanations for why each resume was ranked highly
-## Technical Details
 ### Models Used
-- **NV-Embed-v2**: State-of-the-art embedding model for semantic similarity
-- **QwQ-32B**: Used for generating explanations (simulated in the current version)
-### Libraries
-- **FAISS**: Facebook AI Similarity Search for fast vector similarity search
-- **rank_bm25**: Implementation of the BM25 algorithm for keyword-based scoring
-- **Streamlit**: For the user interface
-- **Hugging Face Transformers**: For accessing and using the models
-## Configuration Options
-The sidebar provides several configuration options:
-- **Model Selection**: Choose which embedding model to use
-- **Ranking Weights**: Adjust the balance between semantic similarity and keyword matching
-- **Results Count**: Set how many top results to display
-- **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections
-## Getting Started
 ### Online Usage
-1. Visit the Hugging Face Space at [URL]
-2. Enter a job description
-3. Upload resumes (PDF, DOCX, TXT, or CSV)
-4. Click "Find Top Candidates"
-5. Review the results
 ### Local Installation
 ```bash
-git clone https://huggingface.co/spaces/[username]/Resume_Screener_and_Skill_Extractor
 cd Resume_Screener_and_Skill_Extractor
 pip install -r requirements.txt
 streamlit run app.py
 ```
-## Future Enhancements
-- Integration with Hugging Face datasets for loading resumes directly
-- Enhanced skill extraction using more sophisticated NLP techniques
-- Real-time explanation generation using QwQ-32B
-- Support for additional file formats and languages
-- Customizable scoring algorithms and weights
-## License
-MIT License
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: AI-driven Candidate Matcher
+emoji: 🎯
 colorFrom: blue
 colorTo: green
 sdk: streamlit
 license: mit
 ---
+# AI-driven Candidate Matcher
+An advanced AI-powered resume screening application that uses a sophisticated 5-stage pipeline to efficiently match resumes against job descriptions. Built with state-of-the-art machine learning models for accurate candidate ranking.
+## 🚀 Features
+- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
+- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
+- **FAISS Integration**: Lightning-fast similarity search for large resume collections
+- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
 - **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
+- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
+- **Batch Processing**: Upload and analyze multiple resumes simultaneously
+- **Export Results**: Download detailed analysis as CSV
+## 🔧 How It Works
+### 5-Stage Advanced Pipeline
+1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
+2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
+3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
+4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
+5. **Combined Scoring**: Weighted combination of all scores for final ranking
+### Scoring Formula
+**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
+### Input & Output
+- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
+- **Output**: Ranked candidates with detailed score breakdowns and AI explanations
+## 🤖 Technical Details
 ### Models Used
+- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
+- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
+- **Qwen3-1.7B**: Large language model for intent analysis and explanations
+### Key Libraries
+- **FAISS**: Facebook AI Similarity Search for efficient vector operations
+- **Sentence Transformers**: For embedding generation and cross-encoding
+- **rank_bm25**: BM25 algorithm implementation for keyword matching
+- **Streamlit**: Interactive web interface
+- **PyTorch**: Deep learning framework
+## 📊 Configuration Options
+The sidebar provides several customization options:
+- **Results Count**: Choose how many top candidates to display (1-5)
+- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
+- **Score Breakdown**: Detailed view of individual scoring components
+## 🚀 Getting Started
 ### Online Usage
+1. Visit the application
+2. Enter a comprehensive job description
+3. Upload resume files or CSV dataset
+4. Click "Advanced Pipeline Analysis"
+5. Review ranked candidates with detailed insights
 ### Local Installation
 ```bash
+git clone <repository-url>
 cd Resume_Screener_and_Skill_Extractor
 pip install -r requirements.txt
 streamlit run app.py
 ```
+### Requirements
+- Python 3.8+
+- CUDA-compatible GPU (optional, for faster processing)
+- Minimum 8GB RAM recommended
+## 📋 Supported File Formats
+- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
+- **DOCX**: Microsoft Word documents
+- **TXT**: Plain text files
+- **CSV**: Structured datasets with resume text columns
+## 🔒 Privacy & Security
+### Data Privacy Statement
+**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**
+#### Data Handling
+- **No Data Storage**: Resume content is processed in memory only and never stored permanently
+- **Session-Based**: All data is cleared when you close the browser or reset the application
+- **Local Processing**: All AI analysis happens locally within the application environment
+- **No External Transmission**: Resume data is never sent to external services or third parties
+#### Security Measures
+- **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
+- **Memory Management**: Automatic cleanup of resume data from system memory
+- **No Logging**: Resume content is never logged or cached
+- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
+#### User Control
+- **Clear Data Options**: Multiple options to clear resume data and free memory
+- **Session Management**: Complete control over when and how data is processed
+- **Transparent Processing**: Full visibility into what data is being analyzed
+**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
+## 📈 Performance Metrics
+- **Accuracy**: Advanced multi-stage pipeline ensures high-quality candidate ranking
+- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
+- **Scalability**: Efficient memory management for large resume datasets
+- **Reliability**: Fallback models ensure consistent operation
+## 🔮 Future Enhancements
+- **Multi-language Support**: Extend to non-English resumes and job descriptions
+- **Custom Scoring Weights**: User-configurable importance of different scoring components
+- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
+- **Integration APIs**: Connect with ATS and HR management systems
+- **Batch Job Processing**: Queue-based processing for large-scale screening
+## 📄 License
+MIT License - See LICENSE file for details
+## 🤝 Contributing
+Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
+---
+*Built with ❤️ using Streamlit, Transformers, and FAISS*
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
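
Note on the scoring formula in the new README: `Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)` gives only the range of each component, not how raw model outputs are mapped into those ranges. The sketch below is one possible reading, assuming each raw score has already been normalized to [0, 1] and is mapped linearly into its range. The function names and the linear mapping are assumptions for illustration and are not taken from `app.py`.

```python
def clamp01(x: float) -> float:
    """Clamp a value into [0, 1] so out-of-range inputs cannot break the ranges."""
    return max(0.0, min(1.0, x))


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Combine three normalized scores using the ranges stated in the README.

    Assumption: inputs are normalized to [0, 1]; the linear mapping is hypothetical.
    """
    ce_part = 0.7 * clamp01(cross_encoder)   # contributes 0.0 .. 0.7
    bm25_part = 0.1 + 0.1 * clamp01(bm25)    # contributes 0.1 .. 0.2
    intent_part = 0.1 * clamp01(intent)      # contributes 0.0 .. 0.1
    return ce_part + bm25_part + intent_part


# Hypothetical candidates: (cross_encoder, bm25, intent), each already in [0, 1].
candidates = {
    "resume_a": (0.9, 0.5, 1.0),
    "resume_b": (0.4, 1.0, 0.0),
}

# Rank candidates by combined score, highest first.
ranked = sorted(candidates, key=lambda name: final_score(*candidates[name]), reverse=True)
print(ranked)
```

Under this reading the combined score always falls between 0.1 and 1.0, because the BM25 term has a floor of 0.1 even when there is no keyword overlap.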

app.py CHANGED

@@ -36,7 +36,7 @@ except LookupError:
 # Set page configuration
 st.set_page_config(
-    page_title="AI Resume Screener",
     page_icon="🎯",
     layout="wide",
     initial_sidebar_state="expanded"
@@ -580,8 +580,36 @@ def create_download_link(df, filename="resume_screening_results.csv"):
     return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">📥 Download Results CSV</a>'
 # Main App Interface
-st.title("🎯 AI-Powered Resume Screener")
-st.markdown("*Find the perfect candidates using BAAI/bge-large-en-v1.5 embeddings and Qwen3-14B explanations*")
 st.markdown("---")
 # Initialize screener
@@ -931,7 +959,7 @@ st.markdown("---")
 st.markdown(
     """
     <div style='text-align: center; color: #666;'>
-        🚀 Powered by BAAI/bge-large-en-v1.5 & Qwen3-1.7B | Built with Streamlit
     </div>
     """,
     unsafe_allow_html=True

 # Set page configuration
 st.set_page_config(
+    page_title="AI-driven Candidate Matcher",
     page_icon="🎯",
     layout="wide",
     initial_sidebar_state="expanded"
     return f'<a href="data:file/csv;base64,{b64}" download="{filename}" class="download-btn">📥 Download Results CSV</a>'
 # Main App Interface
+st.title("🎯 AI-driven Candidate Matcher")
+st.markdown("*Advanced 5-stage pipeline using BAAI/bge-large-en-v1.5 embeddings, Cross-Encoder re-ranking, and Qwen3-1.7B intent analysis*")
+# Privacy Statement
+with st.expander("🔒 Privacy & Data Security", expanded=False):
+    st.markdown("""
+    ### Data Privacy Statement
+    **Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data.**
+    #### 🛡️ Data Handling
+    - **No Permanent Storage**: Resume content is processed in memory only and never stored permanently
+    - **Session-Based**: All data is automatically cleared when you close the browser or reset the application
+    - **Local Processing**: All AI analysis happens locally within this application environment
+    - **No External Transmission**: Resume data is never sent to external services or third parties
+    #### 🔐 Security Measures
+    - **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
+    - **Memory Management**: Automatic cleanup of resume data from system memory
+    - **No Logging**: Resume content is never logged or cached anywhere
+    - **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
+    #### 👤 User Control
+    - **Clear Data Options**: Multiple options available to clear resume data and free memory
+    - **Session Management**: Complete control over when and how your data is processed
+    - **Transparent Processing**: Full visibility into what data is being analyzed
+    **We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
+    """)
 st.markdown("---")
 # Initialize screener
 st.markdown(
     """
     <div style='text-align: center; color: #666;'>
+        🚀 Powered by BAAI/bge-large-en-v1.5, Cross-Encoder/ms-marco-MiniLM-L6-v2 & Qwen3-1.7B | Built with Streamlit
     </div>
     """,
     unsafe_allow_html=True
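
Note on `create_download_link`: the diff shows only the function's signature and its return line, which embeds a base64 payload `b64` in a `data:` URI. The sketch below shows one way such a payload could be built, using only the standard library. The helper name `rows_to_download_link` and its `header`/`rows` parameters are hypothetical; the real function takes a `df` argument, and its body is not visible in this commit.

```python
import base64
import csv
import io


def rows_to_download_link(header, rows, filename="resume_screening_results.csv"):
    """Build an HTML anchor whose href is a base64-encoded CSV data URI.

    Hypothetical stand-in for create_download_link; only the returned
    markup mirrors the line shown in the diff.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    # Encode the CSV text as base64 so it can be embedded in the href.
    b64 = base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:file/csv;base64,{b64}" download="{filename}" '
        f'class="download-btn">📥 Download Results CSV</a>'
    )


link = rows_to_download_link(["candidate", "final_score"], [["resume_a", 0.88]])
print(link)
```

In a Streamlit app, markup like this would typically be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file travels inside the page, this approach suits small result tables better than large exports.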