# Candidate Recommendation Engine powered by an LLM

## 📌 Overview

This Candidate Recommendation Engine ranks and summarizes resumes against a given job description using Natural Language Processing (NLP) and semantic similarity techniques. It is designed to help recruiters and hiring managers quickly identify top candidates from a pool of resumes.

The application:

- Accepts a **job description** (text input).
- Accepts **multiple resumes** (PDF, DOCX, or TXT files).
- Extracts and cleans resume text.
- Generates semantic embeddings using **sentence-transformers**.
- Calculates **cosine similarity** between each resume and the job description.
- Uses a lightweight LLM (**MBZUAI/LaMini-Flan-T5-248M**) to generate a short, human-readable explanation of why each candidate is a good fit.
- Displays ranked results with:
  - **Candidate Name** (from the resume or the filename)
  - **File Name**
  - **Similarity Score**
  - **Summary**
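
As a rough illustration of the final display step, the sketch below sorts per-resume results by similarity score and renders the four columns listed above. It is a minimal sketch under assumptions: `CandidateResult`, `rank_candidates` and `format_table` are hypothetical names, scores are assumed to be computed already, and a Markdown table is only one possible way to present the ranking.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateResult:
    """One row of the ranked output (hypothetical field names)."""

    candidate_name: str
    file_name: str
    similarity_score: float
    summary: str


def rank_candidates(results: list[CandidateResult], top_k: int | None = None) -> list[CandidateResult]:
    """Return results sorted by similarity score (highest first), optionally truncated to top_k."""
    ranked = sorted(results, key=lambda r: r.similarity_score, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def format_table(results: list[CandidateResult]) -> str:
    """Render already-ranked rows as a Markdown table with the four displayed columns."""
    lines = [
        "| Rank | Candidate Name | File Name | Similarity Score | Summary |",
        "|---|---|---|---|---|",
    ]
    for rank, r in enumerate(results, start=1):
        lines.append(
            f"| {rank} | {r.candidate_name} | {r.file_name} | {r.similarity_score:.3f} | {r.summary} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        CandidateResult("Jane Doe", "jane_doe.pdf", 0.62, "Strong Python and NLP background."),
        CandidateResult("John Roe", "john_roe.docx", 0.71, "Led several search-ranking projects."),
    ]
    print(format_table(rank_candidates(rows)))
```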

---

## 🛠 Approach

- **Text Extraction**
  - PDF resumes → parsed using PyPDF2
  - DOCX resumes → parsed using python-docx
  - TXT resumes → read directly as text
  - Minimal cleaning (remove extra spaces, and special characters where needed)

- **Embedding Generation**
  - Used **all-MiniLM-L6-v2** from sentence-transformers to generate 384-dimensional embeddings.
  - Generated embeddings for:
    - The job description (once)
    - Each resume (individually)

- **Similarity Calculation**
  - Computed cosine similarity between the job description embedding and each resume embedding.
  - A higher score means higher semantic similarity to the job description.

- **Candidate Name Extraction**
  - Tried extracting the candidate’s name from the first 3 lines of the resume, since names are commonly placed at the top.
  - If no name was found, fell back to the filename without its extension.

- **Summary Generation**
  - Used the **MBZUAI/LaMini-Flan-T5-248M** LLM to generate short summaries.
  - Prompt: *"Why is this candidate a good fit for the given job description?"*
  - Helps recruiters understand the matching context, not just the score.
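
The **Text Extraction** step above can be sketched as follows, assuming resumes are available as file paths (the app may instead receive in-memory uploads). `PdfReader` and `page.extract_text()` come from PyPDF2, and `docx.Document(...).paragraphs` from python-docx; the helper names `clean_text` and `extract_text` are hypothetical. Because the exact "special characters" rule is not specified, this version only normalizes whitespace and drops non-printable characters, and it keeps line breaks so the first lines remain available for candidate name extraction.

```python
from __future__ import annotations

import re
from pathlib import Path


def clean_text(text: str) -> str:
    """Minimal cleaning: normalize whitespace, drop non-printable characters and empty lines."""
    cleaned_lines = []
    for line in text.splitlines():
        # Turn tabs, non-breaking spaces and repeated spaces into a single space.
        line = re.sub(r"\s+", " ", line)
        # Drop remaining non-printable characters (e.g. stray control bytes from PDF extraction).
        line = "".join(ch for ch in line if ch.isprintable()).strip()
        if line:
            cleaned_lines.append(line)
    # Line breaks are kept so later steps can still look at the first few lines.
    return "\n".join(cleaned_lines)


def extract_text(path: str | Path) -> str:
    """Extract and clean resume text based on the file extension (.pdf, .docx or .txt)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # lazy import: only needed for PDFs

        reader = PdfReader(str(path))
        # Scanned (image-only) pages yield little or no text without OCR.
        raw = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif suffix == ".docx":
        import docx  # provided by the python-docx package

        raw = "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    elif suffix == ".txt":
        raw = path.read_text(encoding="utf-8", errors="ignore")
    else:
        raise ValueError(f"Unsupported resume format: {suffix!r}")
    return clean_text(raw)
```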
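
A minimal sketch of the **Embedding Generation** and **Similarity Calculation** steps, assuming the sentence-transformers `SentenceTransformer(...).encode(...)` API and a plain-Python cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The job description is encoded once and each resume separately, as described above. `get_model` and `score_resumes` are hypothetical names; in a real app, sentence-transformers' own `util.cos_sim` could replace the hand-written function.

```python
from __future__ import annotations

import math
from functools import lru_cache
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@lru_cache(maxsize=1)
def get_model():
    """Load all-MiniLM-L6-v2 once and reuse it (lazy import of sentence-transformers)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("all-MiniLM-L6-v2")


def score_resumes(job_description: str, resumes: list[str]) -> list[float]:
    """Embed the job description once, embed each resume, and return one cosine score per resume."""
    model = get_model()
    jd_vector = model.encode(job_description)  # a single 384-dimensional vector
    resume_vectors = model.encode(resumes)  # one 384-dimensional row per resume
    return [cosine_similarity(jd_vector, vec) for vec in resume_vectors]
```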
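
The **Candidate Name Extraction** step could look like the sketch below. The rule for what counts as a name is not documented here, so the check used (2 to 4 capitalized words with no digits, `@`, `/`, `:` or `|`) is a hypothetical heuristic, as are the names `looks_like_name` and `extract_candidate_name`. It would, for example, also accept a job title such as "Data Scientist" if that appeared on one of the first three lines, which is the kind of failure noted under Limitations.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical rule: each word starts with an uppercase letter and contains only letters,
# apostrophes, periods or hyphens (e.g. "Jane", "O'Neil", "Mary-Jane", "J.").
NAME_WORD = re.compile(r"[A-Z][A-Za-z'.\-]*")


def looks_like_name(line: str) -> bool:
    """Heuristic check for a name-like line: 2-4 capitalized words, no digits or contact symbols."""
    words = line.split()
    if not 2 <= len(words) <= 4:
        return False
    if any(ch.isdigit() or ch in "@/:|" for ch in line):
        return False
    return all(NAME_WORD.fullmatch(word) for word in words)


def extract_candidate_name(resume_text: str, file_name: str) -> str:
    """Look for a name in the first 3 lines; otherwise fall back to the file name without extension."""
    for line in resume_text.splitlines()[:3]:
        line = line.strip()
        if looks_like_name(line):
            return line
    return Path(file_name).stem  # e.g. "jane_doe.pdf" -> "jane_doe"
```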
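
The **Summary Generation** step might be wired up as below, assuming the Hugging Face `transformers` `pipeline("text2text-generation", ...)` API. Only the question is taken from this README; the prompt template around it, the character-based truncation (`max_chars`) and `max_new_tokens=80` are illustrative assumptions, and `build_prompt`, `get_generator` and `summarize_fit` are hypothetical names.

```python
from __future__ import annotations

from functools import lru_cache

# Question taken from this README; the surrounding template is an assumption.
FIT_QUESTION = "Why is this candidate a good fit for the given job description?"


def build_prompt(job_description: str, resume_text: str, max_chars: int = 1500) -> str:
    """Combine the (truncated) job description and resume text with the fit question."""
    # Character-based truncation keeps the model input short (hypothetical limit).
    return (
        f"Job description:\n{job_description[:max_chars]}\n\n"
        f"Resume:\n{resume_text[:max_chars]}\n\n"
        f"{FIT_QUESTION}"
    )


@lru_cache(maxsize=1)
def get_generator():
    """Load the LaMini-Flan-T5 text2text pipeline once (lazy import of transformers)."""
    from transformers import pipeline

    return pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")


def summarize_fit(job_description: str, resume_text: str, max_new_tokens: int = 80) -> str:
    """Generate a short explanation of why the resume matches the job description."""
    generator = get_generator()
    outputs = generator(build_prompt(job_description, resume_text), max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"].strip()
```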

---

## 📋 Assumptions

- Resumes are in English.
- Resume files are well-formatted (name near the top, text extractable).
- The similarity score is a proxy for relevance, not a hiring decision.
- LLM summarization works best with cleaned, relevant resume sections.
- Job descriptions are detailed enough for meaningful comparison.

---

## ⚠ Limitations

- Image-based (scanned) resumes cannot be handled reliably without OCR.
- Candidate name extraction may fail if resumes have unconventional formatting.
- LLM summaries depend on the model's capability and may occasionally be generic.
- Cosine similarity compares whole-document embeddings, so specific skills cannot be weighted more heavily than the rest of the text.
- Large file uploads may degrade performance on free hosting tiers.

---