billyxx committed on
Commit 81fb61e · verified · 1 Parent(s): c83d079

Upload readme.md

Files changed (1)
  1. readme.md +70 -0
readme.md ADDED
@@ -0,0 +1,70 @@
+ # Candidate Recommendation Engine powered by LLM
+
+ ## 📌 Overview
+
+ This Candidate Recommendation Engine ranks and summarizes resumes against a given job description using Natural Language Processing (NLP) and semantic similarity techniques. It is designed to help recruiters and hiring managers quickly identify top candidates from a pool of resumes.
+
+ The application:
+
+ - Accepts a **job description** (text input).
+ - Accepts **multiple resumes** (PDF, DOCX, or TXT files).
+ - Extracts and cleans resume text.
+ - Generates semantic embeddings using **sentence-transformers**.
+ - Calculates **cosine similarity** between each resume and the job description.
+ - Uses a lightweight LLM (**MBZUAI/LaMini-Flan-T5-248M**) to generate a short, human-readable explanation of why the candidate is a good fit.
+ - Displays ranked results with:
+   - **Candidate Name** (from resume or filename)
+   - **File Name**
+   - **Similarity Score**
+   - **Summary**
+
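The ranked output listed above can be pictured as one record per resume, sorted by similarity score. The following is a minimal sketch only: the class name `CandidateResult`, its field names, and the `rank_candidates` helper are illustrative assumptions, not names taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateResult:
    """One row of the ranked output (hypothetical field names)."""
    candidate_name: str
    file_name: str
    similarity: float
    summary: str


def rank_candidates(results: List[CandidateResult],
                    top_k: Optional[int] = None) -> List[CandidateResult]:
    """Return results sorted by similarity, highest score first."""
    ranked = sorted(results, key=lambda r: r.similarity, reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

Passing `top_k` keeps only the best few candidates, which may be convenient when many resumes are uploaded at once.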
+ ---
+
+ ## 🛠 Approach
+
+ - **Text Extraction**
+   - PDF resumes → parsed using PyPDF2
+   - DOCX resumes → parsed using python-docx
+   - TXT resumes → read directly as text
+   - Minimal cleaning (remove extra spaces, special characters where needed)
+
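The extraction step above could be sketched roughly as follows. This is not the project's actual code: the function names `clean_text` and `extract_text` and the exact cleaning rules are assumptions. It also assumes a PyPDF2 release that exposes `PdfReader`. The third-party imports happen lazily, so TXT handling works without them.

```python
import re
from pathlib import Path


def clean_text(text: str) -> str:
    """Minimal cleaning: drop non-printable characters, collapse extra whitespace."""
    text = "".join(ch if ch.isprintable() or ch in "\n\t" else " " for ch in text)
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces/tabs -> one space
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()


def extract_text(path: str) -> str:
    """Read a PDF, DOCX, or TXT resume and return cleaned text."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # third-party, imported only when needed
        reader = PdfReader(path)
        # extract_text() can return an empty result for image-only pages
        raw = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif suffix == ".docx":
        from docx import Document  # provided by the python-docx package
        raw = "\n".join(p.text for p in Document(path).paragraphs)
    elif suffix == ".txt":
        raw = Path(path).read_text(encoding="utf-8", errors="ignore")
    else:
        raise ValueError(f"Unsupported file type: {suffix}")
    return clean_text(raw)
```

An empty return value for a PDF is a reasonable hint that the file is a scan with no text layer.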
+ - **Embedding Generation**
+   - Used **all-MiniLM-L6-v2** from sentence-transformers to generate 384-dimensional embeddings.
+   - Generated embeddings for:
+     - Job description (once)
+     - Each resume (individually)
+
+ - **Similarity Calculation**
+   - Computed cosine similarity between the job description embedding and each resume embedding.
+   - Higher score = higher semantic similarity to the job description.
+
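The embedding and similarity steps above could be combined as in the sketch below. The helper names `cosine_similarity` and `score_resumes` are hypothetical. Cosine similarity is written out in plain Python to make the formula explicit; the project may well use a library routine instead. The model is loaded through the standard `SentenceTransformer(...).encode(...)` calls, imported lazily.

```python
import math


def cosine_similarity(a, b) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(dot / (norm_a * norm_b))


def score_resumes(job_description, resume_texts, model_name="all-MiniLM-L6-v2"):
    """Embed the job description once, embed each resume, and score each pair."""
    from sentence_transformers import SentenceTransformer  # third-party, lazy import
    model = SentenceTransformer(model_name)
    job_vec = model.encode(job_description)          # a single embedding vector
    resume_vecs = model.encode(list(resume_texts))   # one row per resume
    return [cosine_similarity(job_vec, vec) for vec in resume_vecs]
```

According to its model card, all-MiniLM-L6-v2 truncates inputs longer than 256 word pieces by default, so a long resume may be only partly reflected in its score.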
+ - **Candidate Name Extraction**
+   - Tried to extract the candidate’s name from the first 3 lines of the resume (names commonly appear at the top).
+   - If no name was found, fell back to the filename without its extension.
+
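The README does not say how a line is recognized as a name, so the sketch below fills that in with an assumed heuristic: a line of two to four capitalized ASCII words among the first three non-empty lines. `NAME_PATTERN` and `extract_candidate_name` are hypothetical names. Only the filename fallback comes directly from the description above.

```python
import re
from pathlib import Path

# Assumed heuristic: 2 to 4 capitalized words; hyphens, apostrophes, and
# initials with periods are allowed. ASCII letters only, so accented names
# fall through to the filename fallback.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z'.\-]*(?: [A-Z][A-Za-z'.\-]*){1,3}$")


def extract_candidate_name(resume_text: str, file_name: str, max_lines: int = 3) -> str:
    """Return the first name-like line among the top lines, else the file stem."""
    top_lines = [ln.strip() for ln in resume_text.splitlines() if ln.strip()][:max_lines]
    for line in top_lines:
        if NAME_PATTERN.match(line):
            return line
    return Path(file_name).stem  # filename without its extension
```

A pattern like this also accepts headings such as "Curriculum Vitae", which is one way name extraction can go wrong on unconventional layouts.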
+ - **Summary Generation**
+   - Used the **MBZUAI/LaMini-Flan-T5-248M** LLM to generate short summaries.
+   - Prompt: *"Why is this candidate a good fit for the given job description?"*
+   - Helps recruiters understand the matching context, not just the score.
+
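The summary step above could be wired up as in this sketch. Only the question text and the model name come from the README. The prompt layout, the `max_chars` truncation, and the function names `build_prompt` and `summarize_fit` are assumptions. The `text2text-generation` pipeline call follows the usage shown on the model card, and whether that task name is available may depend on the installed `transformers` version.

```python
QUESTION = "Why is this candidate a good fit for the given job description?"


def build_prompt(job_description: str, resume_text: str, max_chars: int = 1500) -> str:
    """Combine the question with whitespace-normalized, truncated inputs."""
    job = " ".join(job_description.split())[:max_chars]
    resume = " ".join(resume_text.split())[:max_chars]
    return f"{QUESTION}\n\nJob description: {job}\n\nResume: {resume}"


def summarize_fit(job_description: str, resume_text: str,
                  model_name: str = "MBZUAI/LaMini-Flan-T5-248M") -> str:
    """Generate a short explanation of fit with a text2text pipeline."""
    from transformers import pipeline  # third-party, lazy import
    generator = pipeline("text2text-generation", model=model_name)
    prompt = build_prompt(job_description, resume_text)
    output = generator(prompt, max_length=128, do_sample=False)
    return output[0]["generated_text"].strip()
```

In an application that scores many resumes, the pipeline would normally be created once and reused, since loading the model on every call is slow.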
+ ---
+
+ ## 📋 Assumptions
+
+ - Resumes are in English.
+ - Resume files are well-formatted (name near top, text extractable).
+ - The similarity score is a proxy for relevance, not a hiring decision.
+ - LLM summarization works best with cleaned, relevant resume sections.
+ - Job descriptions are detailed enough for meaningful comparison.
+
+ ---
+
+ ## ⚠ Limitations
+
+ - Image-based (scanned) resumes cannot be handled reliably without OCR.
+ - Candidate name extraction may fail if resumes have unconventional formatting.
+ - LLM summaries depend on model capability and may occasionally be generic.
+ - Cosine similarity over whole-document embeddings does not weight specific skills (all content is treated equally).
+ - Large file uploads may impact performance on free hosting tiers.
+
+ ---