root committed · Commit e232281 · Parent: 72d33a9
Commit message: ss

Browse files:
- README.md (+61 -85)
- app.py (+569 -583)
- explanation_generator.py (+178 -0)
- fix_dependencies.py (+76 -0)
- requirements.txt (+15 -20)
README.md CHANGED
@@ -12,102 +12,78 @@ license: mit
Removed (old README; several lines are truncated in the diff view and left as gaps):

# Resume Screener and Skill Extractor

A … (description truncated in the diff view)

## Features

- … (six feature bullets, truncated in the diff view)

… (heading truncated)

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt')"
```

## Common Issues and Solutions

### ImportError: cannot import name 'cached_download' from 'huggingface_hub'

This occurs due to version incompatibility between huggingface_hub and sentence_transformers. To fix:

1. Run the dependency fixer script: `python fix_dependencies.py`
2. Or manually install compatible versions: `pip install huggingface-hub==0.14.1 sentence-transformers==2.2.2`

### PydanticImportError: `pydantic:ConstrainedStr` has been removed in V2

This error occurs when using spaCy 3.5.0 with pydantic v2. To fix:

1. Run the dependency fixer script: `python fix_dependencies.py`
2. Or manually install a compatible pydantic version: `pip install "pydantic<2.0.0"`

## Running the Application

```bash
streamlit run app.py
```

## … (heading truncated)

1. Upload a resume in PDF format
2. Select a target job position
3. Review the analysis results in the different tabs
4. Click "Generate Personalized Career Advice" to get recommendations

## Dependencies

- streamlit
- pdfplumber
- spacy
- transformers
- sentence-transformers
- torch
- nltk
- plotly
- pandas
- numpy
- matplotlib

## Supported Job Positions

- Software Engineer
- Interaction Designer
- Data Scientist

## How it Works

… (section truncated in the diff view)
- Suggestions for skills you might want to develop

## … (heading truncated)

- Hugging Face Transformers for AI-powered text summarization
- spaCy for natural language processing
- PyPDF2 and python-docx for document parsing

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Added (new README):

# Resume Screener and Skill Extractor

A Hugging Face Space application for efficiently screening resumes against job descriptions using a hybrid ranking approach that combines semantic similarity with keyword-based scoring.

## Features

- **Hybrid Resume Ranking**: Combines semantic similarity (via NV-Embed-v2) with keyword-based BM25 scoring
- **Skill Extraction**: Automatically identifies relevant skills from resumes based on job requirements
- **Fast Search**: Uses FAISS for efficient similarity search with large resume collections
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Explanation Generation**: Provides explanations for why each resume was ranked highly
- **Visualization**: Displays comparative scores and key matches for easy analysis
- **Batch Processing**: Supports uploading multiple resumes simultaneously

## How It Works

1. **Input**: Provide a job description and upload resumes (PDF, DOCX, TXT, or CSV format)
2. **Processing**: The system creates embeddings for both the job description and resumes using the NV-Embed-v2 model
3. **Ranking**: Calculates a hybrid score (see the sketch after this list) based on:
   - Semantic similarity (cosine similarity between embeddings)
   - Keyword relevance (BM25 scoring)
4. **Results**: Returns the top 10 most suitable resumes with:
   - Overall score and individual component scores
   - Matched skills and key phrases
   - Explanations for why each resume was ranked highly
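The weighting step can be sketched as follows; this condenses the `calculate_hybrid_scores` logic from app.py (BM25 scores are scaled by their maximum before blending, and the 0.7 default matches the sidebar slider):

```python
import numpy as np

def hybrid_score(semantic_scores, bm25_scores, semantic_weight=0.7):
    """Blend cosine similarities with max-normalized BM25 scores."""
    bm25 = np.asarray(bm25_scores, dtype=float)
    if bm25.max() > 0:
        bm25 = bm25 / bm25.max()  # scale keyword scores into [0, 1]
    sem = np.asarray(semantic_scores, dtype=float)  # cosine similarities
    return semantic_weight * sem + (1.0 - semantic_weight) * bm25

print(hybrid_score([0.82, 0.65], [3.1, 7.4]))  # one blended score per resume
```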
## Technical Details

### Models Used

- **NV-Embed-v2**: State-of-the-art embedding model for semantic similarity
- **QwQ-32B**: Used for generating explanations (simulated in the current version)
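A condensed sketch of how app.py's `get_embedding` turns text into a vector with a Hugging Face encoder: tokenize, run the model, and mean-pool the last hidden state. Note that some models, including NV-Embed-v2, may additionally require `trust_remote_code=True`; this is a sketch of the approach, not the app's exact loading path:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nvidia/NV-Embed-v2"  # as selected in the sidebar
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("job description text", return_tensors="pt",
                   truncation=True, max_length=512, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling across the token dimension yields one vector per text
embedding = outputs.last_hidden_state.mean(dim=1).squeeze()
print(embedding.shape)
```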
### Libraries

- **FAISS**: Facebook AI Similarity Search for fast vector similarity search
- **rank_bm25**: Implementation of the BM25 algorithm for keyword-based scoring
- **Streamlit**: For the user interface
- **Hugging Face Transformers**: For accessing and using the models
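For the keyword side, rank_bm25 exposes a small API; a minimal usage sketch matching how `calculate_bm25_scores` in app.py tokenizes and scores (the example texts are illustrative, and NLTK's punkt data is assumed to be downloaded):

```python
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize

resumes = [
    "python developer with five years of sql experience",
    "graphic designer proficient in figma and sketch",
]
job = "looking for a python engineer with strong sql skills"

corpus = [word_tokenize(text.lower()) for text in resumes]  # tokenized corpus
bm25 = BM25Okapi(corpus)
scores = bm25.get_scores(word_tokenize(job.lower()))  # one score per resume
print(scores)  # the first resume should score higher on this query
```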
## Configuration Options

The sidebar provides several configuration options:

- **Model Selection**: Choose which embedding model to use
- **Ranking Weights**: Adjust the balance between semantic similarity and keyword matching
- **Results Count**: Set how many top results to display
- **FAISS Usage**: Toggle the use of FAISS for faster searching with large resume collections (see the sketch below)
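A minimal sketch of the FAISS approach used in app.py: vectors are L2-normalized so that an inner-product index returns cosine similarities (the array contents here are stand-ins):

```python
import faiss
import numpy as np

dim = 8
resume_vecs = np.random.rand(100, dim).astype("float32")  # stand-in resume embeddings
job_vec = np.random.rand(1, dim).astype("float32")        # stand-in job embedding

# Normalize so that inner product equals cosine similarity
faiss.normalize_L2(resume_vecs)
faiss.normalize_L2(job_vec)

index = faiss.IndexFlatIP(dim)  # exact inner-product index
index.add(resume_vecs)

scores, indices = index.search(job_vec, 10)  # top-10 most similar resumes
print(indices[0], scores[0])
```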
## Getting Started

### Online Usage

1. Visit the Hugging Face Space at [URL]
2. Enter a job description
3. Upload resumes (PDF, DOCX, TXT, or CSV)
4. Click "Find Top Candidates"
5. Review the results

### Local Installation

```bash
git clone https://huggingface.co/spaces/[username]/Resume_Screener_and_Skill_Extractor
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

## Future Enhancements

- Integration with Hugging Face datasets for loading resumes directly
- Enhanced skill extraction using more sophisticated NLP techniques
- Real-time explanation generation using QwQ-32B
- Support for additional file formats and languages
- Customizable scoring algorithms and weights

## License

MIT License

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -1,642 +1,628 @@
Removed (old app.py; heavily truncated in the diff view — gaps are marked with "# …"):

import streamlit as st
import pdfplumber
import re
import pandas as pd
import matplotlib.pyplot as plt
import torch
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

# Page configuration
st.set_page_config(
    page_title="Resume Screener & Skill Extractor",
    page_icon="📄",
    layout="wide"
)

# Import dependencies with fallbacks
try:
    import spacy
    spacy_available = True
except ImportError:
    spacy_available = False
    st.warning("spaCy is not available. Some features will be limited.")

try:
    from transformers import pipeline
    transformers_available = True
except ImportError:
    transformers_available = False
    st.warning("Transformers is not available. Summary generation will be limited.")

try:
    import nltk
    from nltk.tokenize import word_tokenize
    nltk_available = True
except ImportError:
    nltk_available = False
    st.warning("NLTK is not available. Some text processing features will be limited.")

# Custom sentence-transformers fallback
try:
    from sentence_transformers import SentenceTransformer
    try:
        from sentence_transformers import util as st_util
        sentence_transformers_available = True
    except ImportError:
        # Define our own utility functions
        class CustomSTUtil:
            @staticmethod
            def pytorch_cos_sim(a, b):
                if not isinstance(a, torch.Tensor):
                    a = torch.tensor(a)
                if not isinstance(b, torch.Tensor):
                    b = torch.tensor(b)

                if len(a.shape) == 1:
                    a = a.unsqueeze(0)
                if len(b.shape) == 1:
                    b = b.unsqueeze(0)

                a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
                b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
                return torch.mm(a_norm, b_norm.transpose(0, 1))

        st_util = CustomSTUtil()
        sentence_transformers_available = True
except ImportError:
    sentence_transformers_available = False
    st.warning("Sentence Transformers is not available. Semantic matching will be disabled.")

# Load models with exception handling
@st.cache_resource
def load_models():
    models = {}

    # … (spaCy model loading, truncated)
    try:
        import subprocess
        import sys
        subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
        models['nlp'] = spacy.load("en_core_web_sm")
    except Exception as e:
        st.warning(f"Could not load spaCy model: {e}")
        models['nlp'] = None
    else:
        models['nlp'] = None

    # … (summarizer loading, truncated)
        st.warning(f"Could not load summarizer model: {e}")
        # Simple fallback summarizer
        models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]
    else:
        # Simple fallback summarizer
        models['summarizer'] = lambda text, **kwargs: [{"summary_text": ". ".join(text.split(". ")[:5]) + "."}]

    # … (sentence transformer loading, truncated)
        st.warning(f"Could not load sentence transformer model: {e}")
        models['sentence_model'] = None
    else:
        models['sentence_model'] = None
    # …

job_descriptions = {
    "Software Engineer": {
        "skills": ["python", "java", "javascript", "sql", "algorithms", "data structures",
                   "git", "cloud", "web development", "software development", "coding"],
        "description": "Looking for software engineers with strong programming skills and experience in software development.",
        "must_have": ["python", "git", "algorithms"],
        "nice_to_have": ["cloud", "java", "javascript"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, familiar with basic programming concepts",
            "Mid-level": "3-5 years of experience, proficient in multiple languages, experience with system design",
            "Senior": "6+ years of experience, expert in software architecture, mentoring, and leading projects"
        }
    },
    "Interaction Designer": {
        "skills": ["ui", "ux", "user research", "wireframing", "prototyping", "figma",
                   "sketch", "adobe", "design thinking", "interaction design"],
        "description": "Seeking interaction designers with expertise in user experience and interface design.",
        "must_have": ["ui", "ux", "prototyping"],
        "nice_to_have": ["figma", "sketch", "user research"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, basic design skills, understanding of UX principles",
            "Mid-level": "3-5 years of experience, strong portfolio, experience with user research",
            "Senior": "6+ years of experience, leadership in design systems, driving design strategy"
        }
    },
    "Data Scientist": {
        "skills": ["python", "r", "statistics", "machine learning", "data analysis",
                   "sql", "tensorflow", "pytorch", "pandas", "numpy"],
        "description": "Looking for data scientists with strong analytical and machine learning skills.",
        "must_have": ["python", "statistics", "machine learning"],
        "nice_to_have": ["tensorflow", "pytorch", "r"],
        "seniority_levels": {
            "Junior": "0-2 years of experience, basic knowledge of statistics and ML algorithms",
            "Mid-level": "3-5 years of experience, model development, feature engineering",
            "Senior": "6+ years of experience, advanced ML techniques, research experience"
        }
    }
}

# … (extract_skills definition, truncated)
    required_skills = job_descriptions[job_title]["skills"]

    # Simple keyword matching (no NLP needed)
    for skill in required_skills:
        if skill.lower() in text.lower():
            found_skills.append(skill)

    return found_skills

# … (extract_experience definition, truncated)
        role = match.group(2).strip()
        duration = match.group(3).strip()
        # … (date parsing, truncated)
            })
        except:
            experiences.append({
                'company': company,
                'role': role,
                'duration': duration
            })

    return experiences

def analyze_resume(text, job_title, models):
    """Analyze resume text."""
    # Extract skills
    found_skills = extract_skills(text, job_title, models.get('nlp'))

    # … (summary generation, truncated)
    try:
        pass  # …
    except Exception as e:
        pass  # st.… (call truncated)
    else:
        summary = text[:500] + "..."

    # Extract work experience
    experiences = extract_experience(text)

    # … (years-of-experience estimation, truncated)
    if years_exp < 3:
        seniority = "Junior"
    elif years_exp < 6:
        seniority = "Mid-level"
    else:
        seniority = "Senior"

    # Detect skill levels
    skill_levels = {}
    for skill in found_skills:
        # Default level
        skill_levels[skill] = "intermediate"

        # Look for advanced indicators
        advanced_patterns = [
            f"expert in {skill}",
            f"advanced {skill}",
            f"extensive experience with {skill}"
        ]
        if any(pattern in text.lower() for pattern in advanced_patterns):
            skill_levels[skill] = "advanced"

    # … (truncated)
    sorted_exps = sorted(
        [exp for exp in experiences if 'start_date' in exp],
        key=lambda x: x['start_date']
    )

    # … (role-overlap detection, truncated)
            'description': f"Overlapping roles at {current['company']} and {next_exp['company']}"
        })
    # …

# … (career trajectory prediction, truncated)
        return f"Next potential role: Senior {job_title}"
    elif seniority == "Mid-level":
        roles = {
            "Software Engineer": "Team Lead, Technical Lead, or Engineering Manager",
            "Data Scientist": "Senior Data Scientist or Data Science Lead",
            "Interaction Designer": "Senior Designer or UX Lead"
        }
        return f"Next potential roles: {roles.get(job_title, f'Senior {job_title}')}"
    else:  # Senior
        roles = {
            "Software Engineer": "Engineering Manager, Software Architect, or CTO",
            "Data Scientist": "Head of Data Science, ML Engineering Manager, or Chief Data Officer",
            "Interaction Designer": "Design Director, Head of UX, or VP of Design"
        }
        return f"Next potential roles: {roles.get(job_title, f'Director of {job_title}')}"

def generate_career_advice(resume_text, job_title, found_skills, missing_skills):
    """Generate career advice based on resume analysis."""
    advice = f"""## Career Development Plan for {job_title}

### Skills to Develop

The following skills would strengthen your profile for this position:

"""
    # … (advice assembly and project recommendations, truncated)
    return  # … (truncated)

# … (app header markdown and sidebar, truncated)
# st.… (call truncated; closed with """) )

# … (job selection and PDF uploader, truncated)

if uploaded_file and job_title:
    try:
        # Show spinner while processing
        with st.spinner("Analyzing resume..."):
            # Extract text from PDF
            text = extract_text_from_pdf(uploaded_file)

            # Analyze resume
            analysis_results = analyze_resume(text, job_title, models)

            # Calculate missing skills
            missing_skills = [skill for skill in job_descriptions[job_title]["skills"]
                              if skill not in analysis_results['found_skills']]

            # Display results in tabs
            tab1, tab2, tab3, tab4 = st.tabs([
                "📊 Skills Match",
                "📝 Resume Summary",
                "🎯 Skills Gap",
                "🚀 Career Advice"
            ])

            with tab1:
                # Create two columns
                col1, col2 = st.columns(2)

                with col1:
                    # Display matched skills
                    st.subheader("🎯 Matched Skills")
                    if analysis_results['found_skills']:
                        for skill in analysis_results['found_skills']:
                            # Show skill with proficiency level
                            level = analysis_results['skill_levels'].get(skill, 'intermediate')
                            level_emoji = "🟢" if level == 'advanced' else "🟡" if level == 'intermediate' else "🟠"
                            st.success(f"{level_emoji} {skill.title()} ({level.title()})")

                        # Calculate match percentage
                        match_percentage = len(analysis_results['found_skills']) / len(job_descriptions[job_title]["skills"]) * 100
                        st.metric("Skills Match", f"{match_percentage:.1f}%")
                    else:
                        st.warning("No direct skill matches found.")

                with col2:
                    # Display semantic match score
                    st.subheader("💡 Semantic Match")
                    st.metric("Overall Match Score", f"{analysis_results['match_score']:.1f}%")

                    # … (must-have coverage, truncated)
                    must_have_percentage = (must_have_count / len(must_have_skills)) * 100
                    # … (truncated)
                    st.info(f"**{analysis_results['seniority']}** ({analysis_results['years_experience']:.1f} years equivalent experience)")
                    st.write(job_descriptions[job_title]["seniority_levels"][analysis_results['seniority']])

            with tab2:
                # Display resume summary
                st.subheader("📝 Resume Summary")
                st.write(analysis_results['summary'])

                # Display experience timeline
                st.subheader("⏳ Experience Timeline")
                if analysis_results['experiences']:
                    # Convert experiences to dataframe for display
                    exp_data = []
                    for exp in analysis_results['experiences']:
                        if 'start_date' in exp and 'end_date' in exp:
                            exp_data.append({
                                'Company': exp['company'],
                                'Role': exp['role'],
                                'Start Date': exp['start_date'].strftime('%b %Y') if exp['start_date'] else 'Unknown',
                                'End Date': exp['end_date'].strftime('%b %Y') if exp['end_date'] != datetime.now() else 'Present',
                                'Duration (months)': exp.get('duration_months', 'Unknown')
                            })
                        else:
                            exp_data.append({
                                'Company': exp['company'],
                                'Role': exp['role'],
                                'Duration': exp.get('duration', 'Unknown')
                            })

                    # … (Plotly timeline figure, truncated)
                            margin=dict(l=0, r=0, b=0, t=30)
                        )
                        st.plotly_chart(fig, use_container_width=True)
                    except Exception as e:
                        st.warning(f"Could not create timeline visualization: {e}")
                else:
                    st.warning("No work experience data could be extracted.")

            # … (tab3: skills gap — missing must-have skills, truncated)
                    if missing_nice_to_have:
                        st.warning("**Nice-to-Have Skills Missing:**")
                        for skill in missing_nice_to_have:
                            st.write(f"- {skill.title()}")
                    else:
                        st.success("Candidate has all the nice-to-have skills!")

                    # Display career trajectory
                    st.subheader("👨‍💼 Career Trajectory")
                    st.info(analysis_results['career_prediction'])

            with tab4:
                # Display career advice
                st.subheader("🚀 Career Advice and Project Recommendations")

                if st.button("Generate Career Advice"):
                    with st.spinner("Generating personalized career advice..."):
                        advice = generate_career_advice(text, job_title, analysis_results['found_skills'], missing_skills)
                        st.markdown(advice)

    except Exception as e:
        # … (truncated)
        st.exception(e)

# Footer
st.markdown("---")
st.markdown("…")  # (footer text truncated)
Added (new app.py):

import streamlit as st
import pdfplumber
import pandas as pd
import numpy as np
import torch
import nltk
import faiss
import os
import tempfile
import base64
from rank_bm25 import BM25Okapi
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
from nltk.tokenize import word_tokenize, sent_tokenize
from tqdm import tqdm
import re
import io
import PyPDF2
from docx import Document
import csv
from explanation_generator import ExplanationGenerator

# Download NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

# Set page configuration
st.set_page_config(
    page_title="Resume Screener & Skill Extractor",
    page_icon="📄",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Sidebar for model selection and weights
with st.sidebar:
    st.title("Configuration")

    # Model selection
    embedding_model_name = st.selectbox(
        "Embedding Model",
        ["nvidia/NV-Embed-v2"],
        index=0
    )

    explanation_model_name = st.selectbox(
        "Explanation Model",
        ["Qwen/QwQ-32B"],
        index=0
    )

    # Ranking weights
    st.subheader("Ranking Weights")
    semantic_weight = st.slider("Semantic Similarity Weight", 0.0, 1.0, 0.7, 0.1)
    keyword_weight = 1.0 - semantic_weight
    st.write(f"Keyword Weight: {keyword_weight:.1f}")

    # Advanced options
    st.subheader("Advanced Options")
    top_k = st.number_input("Number of results to display", min_value=1, max_value=20, value=10, step=1)
    use_explanation = st.checkbox("Generate Explanations", value=True)
    use_faiss = st.checkbox("Use FAISS for fast search", value=True)

    st.markdown("---")
    st.markdown("### About")
    st.markdown("This app uses a hybrid ranking system combining semantic similarity with keyword matching to find the most suitable resumes for a job position.")

# Initialize session state variables
if 'resumes_uploaded' not in st.session_state:
    st.session_state.resumes_uploaded = False
if 'job_description' not in st.session_state:
    st.session_state.job_description = ""
if 'results' not in st.session_state:
    st.session_state.results = []
if 'embedding_model' not in st.session_state:
    st.session_state.embedding_model = None
if 'tokenizer' not in st.session_state:
    st.session_state.tokenizer = None
if 'faiss_index' not in st.session_state:
    st.session_state.faiss_index = None
if 'explanation_generator' not in st.session_state:
    st.session_state.explanation_generator = None

class ResumeScreener:
    def __init__(self, embedding_model_name="nvidia/NV-Embed-v2", explanation_model_name="Qwen/QwQ-32B"):
        """Initialize the ResumeScreener with the specified embedding model"""
        self.embedding_model_name = embedding_model_name
        self.explanation_model_name = explanation_model_name
        self.model = None
        self.tokenizer = None
        self.faiss_index = None
        self.embedding_size = None
        self.explanation_generator = None

    def load_model(self):
        """Load the embedding model from Hugging Face"""
        if st.session_state.embedding_model is None:
            with st.spinner(f"Loading model {self.embedding_model_name}..."):
                try:
                    if "sentence-transformers" in self.embedding_model_name:
                        self.model = SentenceTransformer(self.embedding_model_name)
                    else:
                        self.tokenizer = AutoTokenizer.from_pretrained(self.embedding_model_name)
                        self.model = AutoModel.from_pretrained(self.embedding_model_name)

                    st.session_state.embedding_model = self.model
                    st.session_state.tokenizer = self.tokenizer

                    # Get embedding size
                    if "sentence-transformers" in self.embedding_model_name:
                        self.embedding_size = self.model.get_sentence_embedding_dimension()
                    else:
                        # For non-sentence-transformers models, determined after the first embedding
                        pass

                except Exception as e:
                    st.error(f"Error loading model: {str(e)}")
                    st.stop()
        else:
            self.model = st.session_state.embedding_model
            self.tokenizer = st.session_state.tokenizer

        # Initialize explanation generator if needed
        # (`use_explanation` is the module-level sidebar checkbox defined above)
        if use_explanation and st.session_state.explanation_generator is None:
            st.session_state.explanation_generator = ExplanationGenerator(self.explanation_model_name)
            self.explanation_generator = st.session_state.explanation_generator
        elif use_explanation:
            self.explanation_generator = st.session_state.explanation_generator

    def extract_text_from_file(self, file, file_type):
        """Extract text from various file types; `file` should be a binary file object"""
        try:
            if file_type == "pdf":
                # Use pdfplumber for better text extraction
                with pdfplumber.open(file) as pdf:
                    text = ""
                    for page in pdf.pages:
                        text += page.extract_text() or ""

                # If pdfplumber fails, try PyPDF2 as fallback
                if not text.strip():
                    reader = PyPDF2.PdfReader(file)
                    text = ""
                    for page_num in range(len(reader.pages)):
                        page = reader.pages[page_num]
                        text += page.extract_text() or ""

                return text

            elif file_type == "docx":
                doc = Document(file)
                return " ".join([paragraph.text for paragraph in doc.paragraphs])

            elif file_type == "txt":
                return file.read().decode("utf-8")

            elif file_type == "csv":
                csv_text = ""
                csv_reader = csv.reader(io.StringIO(file.read().decode("utf-8")))
                for row in csv_reader:
                    csv_text += " ".join(row) + " "
                return csv_text

            else:
                st.error(f"Unsupported file type: {file_type}")
                return ""

        except Exception as e:
            st.error(f"Error extracting text from file: {str(e)}")
            return ""

    def get_embedding(self, text):
        """Generate text embedding for a given text"""
        if "sentence-transformers" in self.embedding_model_name:
            # For sentence-transformers models
            embedding = self.model.encode([text], convert_to_tensor=True, show_progress_bar=False)[0]
            embedding_np = embedding.cpu().detach().numpy()
        else:
            # For HuggingFace models
            inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
            with torch.no_grad():
                outputs = self.model(**inputs)

            # Use mean pooling over the last hidden state when available
            if hasattr(outputs, "last_hidden_state"):
                # Mean pooling across the token dimension
                embeddings = outputs.last_hidden_state.mean(dim=1).squeeze()
                embedding_np = embeddings.cpu().detach().numpy()
            else:
                # For models that return a specific embedding
                embedding_np = outputs.cpu().detach().numpy()

        # Set embedding size if not set
        if self.embedding_size is None:
            self.embedding_size = embedding_np.shape[0]

        return embedding_np

    def create_faiss_index(self, embeddings):
        """Create a FAISS index for fast similarity search"""
        # Get the dimension of the embeddings
        dimension = embeddings[0].shape[0]

        # Inner product on normalized vectors equals cosine similarity
        index = faiss.IndexFlatIP(dimension)

        # Add normalized vectors to the index
        embeddings_normalized = np.vstack([emb / np.linalg.norm(emb) for emb in embeddings])
        index.add(embeddings_normalized)

        return index

    def query_faiss_index(self, index, query_embedding, k=10):
        """Query the FAISS index with a query embedding"""
        # Normalize query embedding
        query_embedding = query_embedding / np.linalg.norm(query_embedding)

        # Reshape to a row vector if needed
        if len(query_embedding.shape) == 1:
            query_embedding = query_embedding.reshape(1, -1)

        # Query the index
        scores, indices = index.search(query_embedding, k)

        return scores[0], indices[0]  # Return the scores and indices as flat arrays

    def calculate_bm25_scores(self, resume_texts, job_description):
        """Calculate BM25 scores for keyword matching"""
        # Tokenize job description
        job_tokens = word_tokenize(job_description.lower())

        # Prepare corpus from resumes
        corpus = [word_tokenize(resume.lower()) for resume in resume_texts]

        # Initialize BM25
        bm25 = BM25Okapi(corpus)

        # Calculate scores
        scores = bm25.get_scores(job_tokens)

        return scores

    def calculate_hybrid_scores(self, resume_texts, resume_embeddings, job_embedding, semantic_weight=0.7, use_faiss=True):
        """Calculate hybrid scores combining semantic similarity and BM25"""
        # Calculate semantic similarity scores (cosine similarity)
        if use_faiss and len(resume_embeddings) > 10:
            # Create FAISS index if not already created
            if st.session_state.faiss_index is None:
                index = self.create_faiss_index(resume_embeddings)
                st.session_state.faiss_index = index
            else:
                index = st.session_state.faiss_index

            # Query index with job embedding
            faiss_scores, faiss_indices = self.query_faiss_index(index, job_embedding, k=len(resume_embeddings))

            # Create full semantic scores array
            semantic_scores = np.zeros(len(resume_embeddings))
            for i, idx in enumerate(faiss_indices):
                if idx < len(resume_embeddings):
                    semantic_scores[idx] = faiss_scores[i]
        else:
            # Direct cosine similarity calculation for smaller datasets
            semantic_scores = []
            for emb in resume_embeddings:
                # Normalize the embeddings for cosine similarity
                emb_norm = emb / np.linalg.norm(emb)
                job_emb_norm = job_embedding / np.linalg.norm(job_embedding)

                # Calculate cosine similarity
                similarity = np.dot(emb_norm, job_emb_norm)
                semantic_scores.append(similarity)

        # Calculate BM25 scores
        # (note: `job_description` here is the module-level text-area value, not a parameter)
        bm25_scores = self.calculate_bm25_scores(resume_texts, job_description)

        # Normalize BM25 scores
        if max(bm25_scores) > 0:
            bm25_scores = [score / max(bm25_scores) for score in bm25_scores]

        # Calculate hybrid scores
        keyword_weight = 1.0 - semantic_weight
        hybrid_scores = [
            (semantic_weight * sem_score) + (keyword_weight * bm25_score)
            for sem_score, bm25_score in zip(semantic_scores, bm25_scores)
        ]

        return hybrid_scores, semantic_scores, bm25_scores

    def extract_skills(self, text, job_description):
        """Extract skills from text based on job description"""
        # Simple skill extraction using regex and job description keywords
        # In a real implementation, this could be enhanced with ML-based skill extraction

        # Extract potential skills from job description (words 3 letters or longer)
        potential_skills = set()

        # Common skill-related phrases that might appear in job descriptions
        skill_indicators = ["experience with", "knowledge of", "familiar with", "proficient in",
                            "skills in", "expertise in", "background in", "capabilities in",
                            "years of experience in", "understanding of", "trained in"]

        # Extract skills from sentences containing skill indicators
        sentences = sent_tokenize(job_description)
        for sentence in sentences:
            sentence_lower = sentence.lower()
            for indicator in skill_indicators:
                if indicator in sentence_lower:
                    # Extract words after the indicator, possibly until end of sentence or punctuation
                    skills_part = sentence_lower.split(indicator, 1)[1]

                    # Extract words, cleaning up symbols
                    words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skills_part)
                    for word in words:
                        if len(word) >= 3:  # Only consider words 3 letters or longer
                            potential_skills.add(word.lower())

        # Add explicit skills - look for comma-separated lists or bullet points
        skill_lists = re.findall(r'(?:skills|requirements|qualifications)[^\n.]*?:(.+?)(?:\n|$)', job_description.lower())
        for skill_list in skill_lists:
            words = re.findall(r'\b[a-zA-Z0-9+#/.]+\b', skill_list)
            for word in words:
                if len(word) >= 3:
                    potential_skills.add(word.lower())

        # Add common tech skills if they appear in the job description
        common_tech_skills = ["python", "java", "c++", "javascript", "sql", "react", "node.js", "typescript",
                              "html", "css", "aws", "azure", "gcp", "docker", "kubernetes", "terraform",
                              "git", "ci/cd", "agile", "scrum", "rest", "graphql", "ml", "ai", "data science"]

        for skill in common_tech_skills:
            if skill in job_description.lower():
                potential_skills.add(skill)

        # Find skills in the resume
        matched_skills = []
        for skill in potential_skills:
            # Make it a word-boundary search with regex
            pattern = r'\b' + re.escape(skill) + r'\b'
            matches = re.findall(pattern, text.lower())
            if matches:
                matched_skills.append(skill)

        return list(set(matched_skills))

    def extract_key_phrases(self, text, job_description):
        """Extract key phrases from text that match job description keywords"""
        # Identify job skills first
        skills = self.extract_skills(job_description, job_description)

        # Extract sentences that contain skills
        sentences = sent_tokenize(text)
        skill_sentences = []

        for sentence in sentences:
            sentence_lower = sentence.lower()
            for skill in skills:
                if skill in sentence_lower:
                    # Append the sentence with the skill highlighted
                    highlighted = sentence.replace(skill, f"**{skill}**")
                    skill_sentences.append(highlighted)
                    break

        # Get additional generic matches if we don't have enough skill sentences
        if len(skill_sentences) < 5:
            # Simple extraction based on job description keywords
            job_tokens = set(word.lower() for word in word_tokenize(job_description) if len(word) > 3)
            text_tokens = word_tokenize(text)

            matches = []
            for i, token in enumerate(text_tokens):
                if token.lower() in job_tokens:
                    # Get a phrase context (5 words before and after)
                    start = max(0, i - 5)
                    end = min(len(text_tokens), i + 6)
                    phrase = " ".join(text_tokens[start:end])
                    matches.append(phrase)

            # Add unique phrases to complement skill sentences
            unique_matches = list(set(matches))
            skill_sentences.extend(unique_matches[:5 - len(skill_sentences)])

        # Return unique phrases, up to 5
        return skill_sentences[:5]

    def generate_explanation(self, resume_text, job_description, score, semantic_score, bm25_score, skills):
        """Generate explanation for why a resume was ranked highly using the QwQ-32B model"""
        # Use the explanation generator if available
        if use_explanation and self.explanation_generator:
            return self.explanation_generator.generate_explanation(
                resume_text,
                job_description,
                score,
                semantic_score,
                bm25_score,
                skills
            )
        else:
            # Fallback to simple explanation
            matching_phrases = self.extract_key_phrases(resume_text, job_description)

            explanation = f"This resume received a score of {score:.2f}, with semantic relevance of {semantic_score:.2f} and keyword match of {bm25_score:.2f}. "

            if skills:
                explanation += f"The resume shows experience with key skills: {', '.join(skills[:5])}. "

            if matching_phrases:
                explanation += f"Key matching elements include: {matching_phrases[0]}"

            return explanation

# Function to create a download link for dataframe as CSV
def get_csv_download_link(df, filename="results.csv"):
    csv_string = df.to_csv(index=False)  # named to avoid shadowing the csv module
    b64 = base64.b64encode(csv_string.encode()).decode()
    href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">Download CSV</a>'
    return href

# Main app UI
st.title("Resume Screener & Skill Extractor")
st.markdown("---")

# Initialize the resume screener
screener = ResumeScreener(embedding_model_name, explanation_model_name)

# Job description input
st.header("1. Enter Job Description")
job_description = st.text_area(
    "Paste the job description or requirements here:",
    height=200,
    help="Enter the complete job description or a list of required skills and qualifications."
)

# Resume upload
st.header("2. Upload Resumes")
upload_option = st.radio(
    "Choose upload method:",
    ["Upload Files", "Upload from Dataset"]
)

uploaded_files = []
resume_texts = []
file_names = []

if upload_option == "Upload Files":
    uploaded_files = st.file_uploader(
        "Upload resume files",
        type=["pdf", "docx", "txt", "csv"],
        accept_multiple_files=True,
        help="Upload multiple resume files in PDF, DOCX, TXT, or CSV format."
    )

    if uploaded_files:
        with st.spinner("Processing resumes..."):
            for file in uploaded_files:
                file_type = file.name.split('.')[-1].lower()

                with tempfile.NamedTemporaryFile(delete=False, suffix=f'.{file_type}') as tmp_file:
                    tmp_file.write(file.getvalue())
                    tmp_path = tmp_file.name

                # Open the temp file so every branch of extract_text_from_file
                # receives a binary file object (the txt/csv branches call .read())
                with open(tmp_path, "rb") as f:
                    text = screener.extract_text_from_file(f, file_type)
                if text:
                    resume_texts.append(text)
                    file_names.append(file.name)

                # Clean up temp file
                os.unlink(tmp_path)

        st.session_state.resumes_uploaded = True
        st.success(f"Successfully processed {len(resume_texts)} resumes.")
else:
    st.write("Upload from dataset feature will be implemented soon.")
    # Here you would implement the connection to Hugging Face datasets
    # Example pseudocode:
    # dataset_name = st.text_input("Enter Hugging Face dataset name:")
    # if st.button("Load Dataset"):
    #     with st.spinner("Loading dataset..."):
    #         dataset = load_dataset(dataset_name)
    #         resume_texts = [item["text"] for item in dataset]
    #         file_names = [f"resume_{i}.txt" for i in range(len(resume_texts))]

# Process button
if st.button("Find Top Candidates", disabled=not (job_description and resume_texts)):
    with st.spinner("Loading embedding model..."):
        screener.load_model()

    with st.spinner("Processing job description and resumes..."):
        # Get job description embedding
        job_embedding = screener.get_embedding(job_description)

        # Get resume embeddings
        resume_embeddings = []
        progress_bar = st.progress(0)
        for i, text in enumerate(resume_texts):
            embedding = screener.get_embedding(text)
            resume_embeddings.append(embedding)
            progress_bar.progress((i + 1) / len(resume_texts))

        # Calculate hybrid scores
        hybrid_scores, semantic_scores, bm25_scores = screener.calculate_hybrid_scores(
            resume_texts,
            resume_embeddings,
            job_embedding,
            semantic_weight,
            use_faiss
        )

        # Get top candidates
        combined_data = list(zip(file_names, resume_texts, hybrid_scores, semantic_scores, bm25_scores))
        sorted_data = sorted(combined_data, key=lambda x: x[2], reverse=True)
        top_candidates = sorted_data[:int(top_k)]

        # Create results with explanations if enabled
        results = []
        for name, text, score, semantic_score, bm25_score in top_candidates:
            # Extract skills for this resume
            skills = screener.extract_skills(text, job_description)

            result = {
                "filename": name,
                "score": score,
                "semantic_score": semantic_score,
                "keyword_score": bm25_score,
                "text_preview": text[:500] + "...",
                "matched_phrases": screener.extract_key_phrases(text, job_description),
                "skills": skills
            }

            if use_explanation:
                explanation = screener.generate_explanation(
                    text,
                    job_description,
                    score,
                    semantic_score,
                    bm25_score,
                    skills
                )
                result["explanation"] = explanation
            else:
                result["explanation"] = ""

            results.append(result)

        st.session_state.results = results
        st.success(f"Found top {len(results)} candidates!")

# Display results
if st.session_state.results:
    st.header("3. Results")

    # Create a DataFrame for download
    df_data = []
    for result in st.session_state.results:
        df_data.append({
            "Filename": result["filename"],
            "Score": result["score"],
            "Semantic Score": result["semantic_score"],
            "Keyword Score": result["keyword_score"],
            "Skills": ", ".join(result["skills"]),
            "Explanation": result["explanation"]
        })

    results_df = pd.DataFrame(df_data)

    # Display download link
    st.markdown(get_csv_download_link(results_df), unsafe_allow_html=True)

    # Display individual results
    for i, result in enumerate(st.session_state.results):
        with st.expander(f"#{i+1}: {result['filename']} (Score: {result['score']:.4f})"):
            col1, col2 = st.columns([1, 1])

            with col1:
                st.subheader("Scores")
                st.write(f"Total Score: {result['score']:.4f}")
                st.write(f"Semantic Score: {result['semantic_score']:.4f}")
                st.write(f"Keyword Score: {result['keyword_score']:.4f}")

                st.subheader("Matched Skills")
                if result["skills"]:
                    for skill in result["skills"]:
                        st.write(f"• {skill}")
                else:
                    st.write("No specific skills matched.")

            with col2:
                st.subheader("Explanation")
                st.write(result["explanation"])

                st.subheader("Key Matches")
                for phrase in result["matched_phrases"]:
                    st.markdown(f"• {phrase}")

            st.subheader("Resume Preview")
            st.text_area("", result["text_preview"], height=150, disabled=True)

    # Visualization of scores
    st.subheader("Score Comparison")

    # Prepare data for visualization
    chart_data = pd.DataFrame({
        "Resume": [result["filename"] for result in st.session_state.results],
        "Semantic Score": [result["semantic_score"] for result in st.session_state.results],
        "Keyword Score": [result["keyword_score"] for result in st.session_state.results],
        "Total Score": [result["score"] for result in st.session_state.results]
    })

    # Display as a bar chart
    st.bar_chart(chart_data.set_index("Resume")[["Total Score", "Semantic Score", "Keyword Score"]])

# Footer
st.markdown("---")
st.markdown("Built with Streamlit and Hugging Face models (NV-Embed-v2 and QwQ-32B)")
explanation_generator.py ADDED
@@ -0,0 +1,178 @@
1 |
+
"""
|
2 |
+
Explanation Generator Module
|
3 |
+
|
4 |
+
This module handles the generation of explanations for resume rankings
|
5 |
+
using the QwQ-32B model from Hugging Face.
|
6 |
+
"""
|
7 |
+
|
8 |
+
import torch
|
9 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
10 |
+
import os
|
11 |
+
import re
|
12 |
+
|
13 |
+
class ExplanationGenerator:
|
14 |
+
def __init__(self, model_name="Qwen/QwQ-32B"):
|
15 |
+
"""Initialize the explanation generator with the specified model"""
|
16 |
+
self.model_name = model_name
|
17 |
+
self.model = None
|
18 |
+
self.tokenizer = None
|
19 |
+
self.initialized = False
|
20 |
+
|
21 |
+
def load_model(self):
|
22 |
+
"""Load the model and tokenizer if not already loaded"""
|
23 |
+
if not self.initialized:
|
24 |
+
try:
|
25 |
+
            # Check whether we have enough VRAM to load the model
            if torch.cuda.is_available():
                gpu_memory = torch.cuda.get_device_properties(0).total_memory
                # QwQ-32B requires at least 32 GB of VRAM
                if gpu_memory >= 32 * (1024**3):  # 32 GB
                    device = "cuda"
                else:
                    device = "cpu"
            else:
                device = "cpu"

            # Load tokenizer
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

            # Load the model based on available resources
            if device == "cuda":
                self.model = AutoModelForCausalLM.from_pretrained(
                    self.model_name,
                    torch_dtype=torch.bfloat16,
                    device_map="auto"
                )
            else:
                # Fall back to a simpler template-based solution if we can't load the model
                self.model = None
                print("Warning: Loading QwQ-32B on CPU is not recommended. Using template-based explanations instead.")

            self.initialized = True
        except Exception as e:
            print(f"Error loading QwQ-32B model: {str(e)}")
            print("Falling back to template-based explanations.")
            self.model = None
            self.initialized = True

    def generate_explanation(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
        """Generate an explanation for why a resume was ranked highly."""
        # Lazily load the model on first use
        if not self.initialized:
            self.load_model()

        # If the model loaded successfully, use it to generate the explanation
        if self.model is not None:
            try:
                # Prepare the prompt for QwQ-32B
                prompt = self._create_prompt(resume_text, job_description, score, semantic_score, keyword_score, skills)

                # Wrap the prompt in the chat message format
                messages = [
                    {"role": "user", "content": prompt}
                ]

                # Apply the model's chat template
                text = self.tokenizer.apply_chat_template(
                    messages,
                    tokenize=False,
                    add_generation_prompt=True
                )

                # Tokenize and move tensors to the model's device
                inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)

                # Generate the response
                output_ids = self.model.generate(
                    **inputs,
                    max_new_tokens=300,
                    temperature=0.6,
                    top_p=0.95,
                    top_k=30
                )

                # Decode only the newly generated tokens (skip the prompt)
                response = self.tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

                # Clean up the response
                return self._clean_response(response)

            except Exception as e:
                print(f"Error generating explanation with QwQ-32B: {str(e)}")
                # Fall back to a template-based explanation
                return self._generate_template_explanation(score, semantic_score, keyword_score, skills)
        else:
            # Use a template-based explanation if the model is not available
            return self._generate_template_explanation(score, semantic_score, keyword_score, skills)

    def _create_prompt(self, resume_text, job_description, score, semantic_score, keyword_score, skills):
        """Create the prompt used for explanation generation."""
        # Use only the first 1000 characters of the resume to keep the prompt size manageable
        resume_excerpt = resume_text[:1000] + "..." if len(resume_text) > 1000 else resume_text

        prompt = f"""You are an AI assistant helping a recruiter understand why a candidate's resume was matched with a job posting.

The resume has been assigned the following scores:
- Overall Match Score: {score:.2f} out of 1.0
- Semantic Relevance Score: {semantic_score:.2f} out of 1.0
- Keyword Match Score: {keyword_score:.2f} out of 1.0

The job description is:
```
{job_description}
```

Based on analysis, the resume contains these skills relevant to the job: {', '.join(skills)}

Resume excerpt:
```
{resume_excerpt}
```

Please provide a short explanation (3-5 sentences) of why this resume received these scores and how well it matches the job requirements. Focus on the relationship between the candidate's experience and the job requirements."""

        return prompt

    def _clean_response(self, response):
        """Clean the raw response from the model."""
        # Remove any thinking or internal processing tokens
        response = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL)

        # Limit the explanation to a reasonable length (roughly five sentences)
        if len(response) > 500:
            sentences = response.split('.')
            return '.'.join(sentences[:5]) + '.'

        return response

    def _generate_template_explanation(self, score, semantic_score, keyword_score, skills):
        """Generate a template-based explanation when the model is not available."""
        # Map the overall score onto a qualitative label
        if score > 0.8:
            quality = "excellent"
        elif score > 0.6:
            quality = "good"
        elif score > 0.4:
            quality = "moderate"
        else:
            quality = "limited"

        explanation = f"This resume shows {quality} alignment with the job requirements, with an overall score of {score:.2f}. "

        if semantic_score > keyword_score:
            explanation += f"The candidate's experience demonstrates strong semantic relevance ({semantic_score:.2f}) to the position, though specific keyword matches ({keyword_score:.2f}) could be improved. "
        else:
            explanation += f"The resume contains many relevant keywords ({keyword_score:.2f}), but could benefit from better contextual alignment ({semantic_score:.2f}) with the job requirements. "

        if skills:
            if len(skills) > 3:
                explanation += f"Key skills identified include {', '.join(skills[:3])}, and {len(skills)-3} others that match the job requirements."
            else:
                explanation += f"Key skills identified include {', '.join(skills)}."
        else:
            explanation += "No specific skills were identified that directly match the requirements."

        return explanation
fix_dependencies.py
ADDED
@@ -0,0 +1,76 @@
#!/usr/bin/env python
"""
Dependency fixer for Resume Screener and Skill Extractor

This script ensures all dependencies are properly installed with compatible versions.
"""

import sys
import subprocess
import os

def install(package):
    """Install a package using pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

def install_with_message(package, message=None):
    """Install a package, printing an optional message first."""
    if message:
        print(f"\n{message}")
    print(f"Installing {package}...")
    install(package)

def main():
    print("Running dependency fixer for Resume Screener and Skill Extractor...")

    # Install core tooling first
    install_with_message("pip==23.1.2", "Upgrading pip to ensure compatibility")
    install_with_message("setuptools==68.0.0", "Installing compatible setuptools")

    # Check if we're running inside a Hugging Face Space
    in_hf_space = os.environ.get("SPACE_ID") is not None

    # Install key libraries with pinned versions to ensure compatibility
    dependencies = [
        ("streamlit==1.31.0", "Installing Streamlit for the web interface"),
        ("pdfplumber==0.10.1", "Installing PDF processing libraries"),
        ("PyPDF2==3.0.1", None),
        ("python-docx==1.0.1", None),
        ("rank-bm25==0.2.2", "Installing BM25 ranking library"),
        ("tqdm==4.66.1", "Installing progress bar utility"),
        ("faiss-cpu==1.7.4", "Installing FAISS for vector similarity search"),
        ("huggingface-hub==0.20.3", "Installing Hugging Face Hub"),
        ("transformers==4.36.2", "Installing Transformers"),
        ("sentence-transformers==2.2.2", "Installing Sentence Transformers"),
        ("torch==2.1.2", "Installing PyTorch"),
        ("nltk==3.8.1", "Installing NLTK for text processing"),
        ("pandas==2.1.3", "Installing data processing libraries"),
        ("numpy==1.24.3", None),
        ("plotly==5.18.0", "Installing visualization libraries"),
        ("spacy==3.7.2", "Installing spaCy for NLP"),
    ]

    # Install all dependencies
    for package, message in dependencies:
        install_with_message(package, message)

    # Download required NLTK data
    print("\nDownloading NLTK data...")
    install("nltk")  # no-op if the pinned version above is already installed
    import nltk
    nltk.download('punkt')

    # Download the spaCy model if not in a Hugging Face Space
    # (Spaces should include it in requirements.txt)
    if not in_hf_space:
        print("\nDownloading spaCy model...")
        try:
            subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
        except subprocess.CalledProcessError:
            # Fall back to installing the model wheel directly
            install("https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz")

    print("\nDependency installation complete!")
    print("You can now run the Resume Screener with: streamlit run app.py")

if __name__ == "__main__":
    main()
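
The script is meant to be run directly; per its closing messages, a typical session (assuming `python` points at a Python 3 interpreter) looks like:

```bash
# Repin all dependencies, then launch the app
python fix_dependencies.py
streamlit run app.py
```

Running installs sequentially via subprocess, rather than through one requirements file, lets the script print progress messages and apply the spaCy-model fallback only when the normal download fails.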
requirements.txt
CHANGED
@@ -1,22 +1,17 @@
-
-
-
+streamlit==1.31.0
+pdfplumber==0.10.1
+PyPDF2==3.0.1
+python-docx==1.0.1
+spacy==3.7.2
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0.tar.gz
+transformers==4.36.2
+torch==2.1.2
+nltk==3.8.1
+faiss-cpu==1.7.4
+rank-bm25==0.2.2
 sentence-transformers==2.2.2
-
-
-
-# PDF processing
-pdfplumber==0.9.0
-
-# Web UI
-streamlit==1.22.0
-
-# Data processing
-pandas==1.5.3
+plotly==5.18.0
+pandas==2.1.3
 numpy==1.24.3
-
-
-
-# Utilities
-nltk==3.8.1
-scikit-learn==1.0.2
+tqdm==4.66.1
+huggingface-hub==0.20.3
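
Because both fix_dependencies.py and requirements.txt pin exact versions, the two lists can silently drift apart. A minimal sketch for checking a few of the pins above against the installed environment; the `verify_pins.py` name and the `PINS` subset are illustrative, not part of the repo:

```python
# verify_pins.py -- hypothetical helper, not part of this commit.
# Compares installed distribution versions against a subset of the pins.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "streamlit": "1.31.0",
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "sentence-transformers": "2.2.2",
    "huggingface-hub": "0.20.3",
}

for name, expected in PINS.items():
    try:
        installed = version(name)
        status = "OK" if installed == expected else f"MISMATCH (found {installed})"
    except PackageNotFoundError:
        status = "MISSING"
    print(f"{name}=={expected}: {status}")
```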