# Candidate Recommendation Engine powered by LLM

## Overview

This Candidate Recommendation Engine ranks and summarizes resumes against a given job description using Natural Language Processing (NLP) and semantic similarity techniques. It is designed to help recruiters and hiring managers quickly identify top candidates from a pool of resumes.

The application:

- Accepts a **job description** (text input).
- Accepts **multiple resumes** (PDF, DOCX, or TXT files).
- Extracts and cleans resume text.
- Generates semantic embeddings using **sentence-transformers**.
- Calculates **cosine similarity** between each resume and the job description.
- Uses a lightweight LLM (**MBZUAI/LaMini-Flan-T5-248M**) to generate a short, human-readable explanation of why the candidate is a good fit.
- Displays ranked results with:
  - **Candidate Name** (from resume or filename)
  - **File Name**
  - **Similarity Score**
  - **Summary**
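
The extraction and cleaning step can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the helper names `extract_text` and `clean_text` are hypothetical, and the cleaning rules (dropping non-printable characters, collapsing whitespace) are an assumption about what "cleaning" means here. The raw text is returned separately from the cleaned text so that line-based steps can still see the original line breaks.

```python
import re
from pathlib import Path


def clean_text(text: str) -> str:
    """Drop non-printable characters and collapse runs of whitespace."""
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return re.sub(r"\s+", " ", text).strip()


def extract_text(path: str) -> str:
    """Return the raw text of a PDF, DOCX, or TXT resume (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # Imported lazily so TXT-only use does not require PyPDF2.
        from PyPDF2 import PdfReader

        reader = PdfReader(path)
        # extract_text() may return None or "" for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        # The `docx` module is provided by the python-docx package.
        from docx import Document

        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    raise ValueError(f"Unsupported resume format: {suffix!r}")
```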

---

## Approach

- **Text Extraction**
  - PDF resumes: parsed using PyPDF2
  - DOCX resumes: parsed using python-docx
  - TXT resumes: read directly as text
  - Minimal cleaning (remove extra spaces, special characters where needed)

- **Embedding Generation**
  - Uses **all-MiniLM-L6-v2** from sentence-transformers to generate 384-dimensional embeddings.
  - Embeddings are generated for:
    - Job description (once)
    - Each resume (individually)

- **Similarity Calculation**
  - Computes cosine similarity between the job description embedding and each resume embedding.
  - A higher score means higher semantic similarity to the job description.

- **Candidate Name Extraction**
  - Tries to extract the candidate's name from the first 3 lines of the resume (names commonly appear at the top).
  - If no name is found, falls back to the filename without its extension.

- **Summary Generation**
  - Uses the **MBZUAI/LaMini-Flan-T5-248M** LLM to generate short summaries.
  - Prompt: *"Why is this candidate a good fit for the given job description?"*
  - Helps recruiters understand the matching context, not just the score.
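
The similarity and ranking steps above can be illustrated with a small, dependency-free sketch. It assumes the embeddings are already available as plain sequences of floats; in the application they would presumably come from something like `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`, but that wiring is an assumption and is left out so the example runs on its own. The function names `cosine_similarity` and `rank_resumes` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length (a choice made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(
    job_vec: Sequence[float],
    resume_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every resume against the job vector, best match first."""
    scores = [
        (file_name, cosine_similarity(job_vec, vec))
        for file_name, vec in resume_vecs.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for 384-dimensional embeddings.
    job = [1.0, 0.0]
    resumes = {"a.pdf": [0.0, 1.0], "b.docx": [1.0, 0.0], "c.txt": [1.0, 1.0]}
    for file_name, score in rank_resumes(job, resumes):
        print(f"{file_name}: {score:.3f}")
```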

---

## Assumptions

- Resumes are in English.
- Resume files are well-formatted (name near the top, text extractable).
- The similarity score is a proxy for relevance, not a hiring decision.
- LLM summarization works best with cleaned, relevant resume sections.
- Job descriptions are detailed enough for meaningful comparison.
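
Because summarization works best on cleaned, relevant text, the prompt passed to the LLM is worth keeping explicit. The sketch below shows one way the summary prompt could be assembled. The function names, the prompt layout beyond the quoted question, and the 1500-character cap on the resume excerpt are all assumptions for illustration. The `generator` argument stands for any callable that behaves like a Hugging Face `transformers` text2text-generation pipeline loaded with `MBZUAI/LaMini-Flan-T5-248M`, that is, one returning a list of dicts with a `generated_text` key.

```python
from typing import Callable, Dict, List

QUESTION = "Why is this candidate a good fit for the given job description?"


def build_summary_prompt(
    job_description: str,
    resume_text: str,
    max_resume_chars: int = 1500,  # assumed cap; small models have short inputs
) -> str:
    """Combine the question, job description, and a resume excerpt."""
    excerpt = resume_text[:max_resume_chars].strip()
    return (
        f"{QUESTION}\n\n"
        f"Job description:\n{job_description.strip()}\n\n"
        f"Resume:\n{excerpt}"
    )


def summarize_fit(
    generator: Callable[..., List[Dict[str, str]]],
    job_description: str,
    resume_text: str,
    max_new_tokens: int = 80,  # assumed length budget for a short summary
) -> str:
    """Run the prompt through a pipeline-like callable and return its text."""
    prompt = build_summary_prompt(job_description, resume_text)
    result = generator(prompt, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"].strip()
```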

---

## Limitations

- Image-based (scanned) resumes yield little or no extractable text without OCR.
- Candidate name extraction may fail if resumes have unconventional formatting.
- LLM summaries depend on model capability and may occasionally be generic.
- Cosine similarity does not account for specific skill weights (all terms treated equally).
- Large file uploads may impact performance on free hosting tiers.
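
To make the name-extraction limitation concrete, here is a minimal sketch of the "first 3 lines, then filename" heuristic. It is an assumption about how such a check might look, not the application's actual rule: the pattern accepts two to four capitalized ASCII words, so it misses names with accented letters and can mistake a title line such as "Curriculum Vitae" for a name. The function name `extract_candidate_name` is hypothetical, and "first 3 lines" is interpreted here as the first three non-empty lines.

```python
import re
from pathlib import Path

# Two to four capitalized words, allowing apostrophes, dots, and hyphens.
_NAME_PATTERN = re.compile(r"^[A-Z][a-zA-Z'.-]*(?: [A-Z][a-zA-Z'.-]*){1,3}$")


def extract_candidate_name(resume_text: str, filename: str) -> str:
    """Return a name-like line from the top of the resume, else the file stem."""
    lines = [line.strip() for line in resume_text.splitlines() if line.strip()]
    for line in lines[:3]:
        if _NAME_PATTERN.match(line):
            return line
    # Fallback: filename without its extension.
    return Path(filename).stem


if __name__ == "__main__":
    print(extract_candidate_name("Jane Doe\njane@example.com", "resume_01.pdf"))
    print(extract_candidate_name("jane@example.com\n+1 555 0100", "john_smith.pdf"))
```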

---