---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# AI-driven Candidate Matcher

An AI-powered resume screening application that uses a 5-stage pipeline to match resumes against a job description and rank the candidates. It combines semantic embeddings, cross-encoder re-ranking, keyword matching, and LLM-based intent analysis.

## 🚀 Features

- **5-Stage Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Fast similarity search over large resume collections
- **AI Intent Analysis**: The Qwen3-1.7B model analyzes each candidate's job-seeking intent
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes at once
- **Export Results**: Download the detailed analysis as CSV

## 🔧 How It Works

### 5-Stage Advanced Pipeline

1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Finer-grained relevance scoring of the recalled candidates with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for the final ranking

### Scoring Formula

**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**

The values in parentheses give the range each component contributes, so a candidate's final score lies between 0.1 and 1.0.

### Input & Output

- **Input**: Job description + resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations

## 🤖 Technical Details

### Models Used

- **BAAI/bge-large-en-v1.5**: Embedding model for semantic similarity
- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Cross-encoder for relevance re-ranking
- **Qwen3-1.7B**: Language model for intent analysis and explanations

### Key Libraries

- **FAISS**: Facebook AI Similarity Search for efficient vector search
- **Sentence Transformers**: Embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework

## 📊 Configuration Options

The sidebar provides several customization options:

- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components

## 🚀 Getting Started

### Online Usage

1. Open the application.
2. Enter a detailed job description.
3. Upload resume files or a CSV dataset.
4. Click "Advanced Pipeline Analysis".
5. Review the ranked candidates and their detailed insights.

### Local Installation

```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

### Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- At least 8 GB of RAM recommended

## 📋 Supported File Formats

- **PDF**: Text extracted with pdfplumber, with PyPDF2 as a fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with a resume text column

## 🔒 Privacy & Security

### Data Privacy Statement

**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**

#### Data Handling

- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis runs inside the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties

#### Security Measures

- **Temporary Files**: Uploaded files are processed in temporary locations and deleted immediately afterwards
- **Memory Management**: Resume data is automatically cleared from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments

#### User Control

- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Full control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed

**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**

## 📈 Performance Metrics

- **Accuracy**: The multi-stage pipeline is designed to produce high-quality candidate rankings
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models help keep operation consistent

## 🔮 Future Enhancements

- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable weights for the different scoring components
- **Advanced Skill Extraction**: Improved NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening

## 📄 License

MIT License. See the LICENSE file for details.

## 🤝 Contributing

Contributions are welcome! Feel free to submit pull requests or open issues for bugs and feature requests.

---

*Built with ❤️ using Streamlit, Transformers, and FAISS*

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
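The stage 5 scoring formula (Cross-Encoder 0-0.7, BM25 0.1-0.2, Intent 0-0.1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's code. It assumes each component has already been normalized to [0, 1], for example with a sigmoid over the cross-encoder's raw logit and min-max scaling for BM25. It also assumes each component fills its range linearly. `StageScores`, `clamp01`, and `final_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StageScores:
    """Per-candidate component scores, each assumed to be pre-normalized to [0, 1]."""
    cross_encoder: float  # relevance from the cross-encoder re-ranker
    bm25: float           # keyword overlap, e.g. min-max normalized across candidates
    intent: float         # LLM intent judgment mapped onto [0, 1]


def clamp01(x: float) -> float:
    # Guard against components that drift slightly outside [0, 1].
    return max(0.0, min(1.0, x))


def final_score(s: StageScores) -> float:
    """Map each component into its formula range and sum the contributions.

    Assumption: linear mappings Cross-Encoder -> [0, 0.7], BM25 -> [0.1, 0.2],
    Intent -> [0, 0.1], so the total always lies in [0.1, 1.0].
    """
    ce = 0.7 * clamp01(s.cross_encoder)
    bm = 0.1 + 0.1 * clamp01(s.bm25)
    it = 0.1 * clamp01(s.intent)
    return ce + bm + it


candidates = {
    "alice": StageScores(cross_encoder=0.9, bm25=1.0, intent=1.0),
    "bob": StageScores(cross_encoder=0.5, bm25=0.0, intent=0.5),
}
ranked = sorted(candidates, key=lambda name: final_score(candidates[name]), reverse=True)
print(ranked)  # ['alice', 'bob']
```

Because the cross-encoder can contribute up to 0.7 while the other two terms together add at most 0.3, semantic relevance dominates the ranking. BM25 and intent mostly break ties between similarly relevant candidates.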
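Stage 3 relies on rank_bm25 for keyword matching. The dependency-free sketch of Okapi BM25 below shows what that score measures. It is an illustration, not the library's implementation. The IDF term uses the non-negative `log(1 + ...)` form, which may differ from rank_bm25's `BM25Okapi`. The tokenizer is a simple assumption. Min-max normalization is one assumed way to bring BM25 into [0, 1] before the scores are combined. `tokenize`, `bm25_scores`, and `min_max` are hypothetical helpers.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; "+" and "#" are kept so "c++" and "c#" survive.
    return re.findall(r"[a-z0-9+#]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of `query` against every document in `docs` (non-empty list)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: the number of resumes each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Longer-than-average resumes are penalized via b.
        length_norm = 1 - b + b * len(toks) / avgdl if avgdl else 1.0
        score = 0.0
        for term in query_terms.intersection(tf):
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def min_max(values: List[float]) -> List[float]:
    # Rescale to [0, 1]; all-equal scores map to 0.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


job = "Python developer with FAISS and PyTorch experience"
resumes = [
    "Senior Python engineer who built FAISS vector search and PyTorch models",
    "Graphic designer experienced in Photoshop and Illustrator",
]
raw = bm25_scores(job, resumes)
normalized = min_max(raw)
print(normalized)  # [1.0, 0.0]
```

Exact token matching is the point of this stage. It rewards resumes that name the required skills literally ("faiss", "pytorch"). It misses near-variants such as "experienced" vs. "experience", which the embedding and cross-encoder stages are better placed to catch.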
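Stages 1 and 2 form a recall-then-re-rank funnel. A cheap similarity search narrows the pool to 50 candidates, and only those are scored by the slower cross-encoder, which keeps 20. The sketch below is a hypothetical illustration of that shape. It uses brute-force cosine similarity in plain Python as a stand-in for FAISS, and an arbitrary callable as a stand-in for the cross-encoder. `cosine` and `recall_then_rerank` are hypothetical names. With L2-normalized embeddings, an exact FAISS inner-product index such as `IndexFlatIP` would produce the same ordering as the cosine step here.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    rerank: Callable[[int], float],
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Stage 1: keep the recall_k documents most similar to the query embedding.
    Stage 2: re-score only that shortlist with the (expensive) re-ranker.
    Returns (doc_index, rerank_score) pairs, best first."""
    shortlist = heapq.nlargest(
        recall_k, range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i])
    )
    rescored = [(i, rerank(i)) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:rerank_k]


# Toy 2-D "embeddings" and toy re-ranker scores, for illustration only.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
rerank_scores = {0: 0.2, 1: 0.8, 2: 0.99}
top = recall_then_rerank([1.0, 0.0], doc_vecs, lambda i: rerank_scores[i], recall_k=2, rerank_k=1)
print(top)  # [(1, 0.8)]
```

Document 2 has the highest re-rank score but never reaches stage 2, because the recall step filtered it out. This is the trade-off of the funnel. The cross-encoder runs on far fewer pairs, but anything the embedding search misses cannot be recovered later, which is one reason to recall more candidates (50) than are finally re-ranked (20).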