billyxx committed cc4cbea (verified) · 1 parent: 81fb61e

Delete readme.md

Files changed (1)
  1. readme.md +0 -70
readme.md DELETED
@@ -1,70 +0,0 @@
# Candidate Recommendation Engine Powered by an LLM

## 📌 Overview

This Candidate Recommendation Engine ranks and summarizes resumes against a given job description using Natural Language Processing (NLP) and semantic similarity techniques. It is designed to help recruiters and hiring managers quickly identify the top candidates in a pool of resumes.

9
- - Accepts a **job description** (text input).
10
- - Accepts **multiple resumes** (PDF, DOCX, or TXT files).
11
- - Extracts and cleans resume text.
12
- - Generates semantic embeddings using **sentence-transformers**.
13
- - Calculates **cosine similarity** between each resume and the job description.
14
- - Uses a lightweight LLM (**MBZUAI/LaMini-Flan-T5-248M**) for a short, human-readable explanation of why the candidate is a good fit.
15
- - Displays ranked results with:
16
- - **Candidate Name** (from resume or filename)
17
- - **File Name**
18
- - **Similarity Score**
19
- - **Summary**
20
-
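The ranked output described above can be modeled as one record per resume, sorted by similarity score. This is a minimal sketch, not the project's code: the `CandidateResult` dataclass and the `rank_candidates` and `format_row` helpers are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateResult:
    """One row of the ranked output (hypothetical structure)."""

    candidate_name: str
    file_name: str
    similarity_score: float
    summary: str


def rank_candidates(results: list[CandidateResult], top_k: int | None = None) -> list[CandidateResult]:
    """Sort by similarity score, best match first; optionally keep only the top_k rows."""
    ranked = sorted(results, key=lambda r: r.similarity_score, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def format_row(result: CandidateResult) -> str:
    """Render one result as a pipe-separated line with the score rounded to 3 decimals."""
    return f"{result.candidate_name} | {result.file_name} | {result.similarity_score:.3f} | {result.summary}"
```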
---

## 🛠 Approach

**Text Extraction**

- PDF resumes → parsed using PyPDF2
- DOCX resumes → parsed using python-docx
- TXT resumes → read directly as text
- Minimal cleaning (collapsing extra whitespace and removing special characters where needed)

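The extraction and cleaning steps could look roughly like this. It is a sketch under stated assumptions: `extract_text` and `clean_text` are hypothetical helper names, the PDF and DOCX branches use the documented `PyPDF2.PdfReader` and python-docx `Document` APIs, and the set of characters kept by `clean_text` is an assumed choice, since the original only says special characters were removed "where needed". Line breaks are preserved so that a later step can still inspect the first lines of the resume.

```python
import re
from pathlib import Path


def extract_text(path: str) -> str:
    """Return raw text from a PDF, DOCX, or TXT resume, dispatching on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # imported lazily so TXT-only use needs no extra packages

        reader = PdfReader(path)
        # extract_text() may yield nothing for pages without a text layer (e.g. scans)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(path)
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    raise ValueError(f"Unsupported resume format: {suffix or '(none)'}")


def clean_text(text: str) -> str:
    """Minimal cleaning: replace unusual symbols, collapse spaces/tabs, drop empty lines."""
    lines = []
    for line in text.splitlines():
        # Keep word characters, whitespace, and punctuation common in resumes (assumed set)
        line = re.sub(r"[^\w\s.,;:@+#&()/'-]", " ", line)
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```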
**Embedding Generation**

- Used the **all-MiniLM-L6-v2** model from sentence-transformers to generate 384-dimensional embeddings.
- Generated embeddings for:
  - the job description (once)
  - each resume (individually)

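A sketch of the embedding step, assuming the sentence-transformers `SentenceTransformer.encode` API. The encoder is passed in as a plain callable (a hypothetical design choice, not necessarily how the project was structured) so the orchestration can be exercised without downloading the model; `load_minilm_encoder` and `embed_job_and_resumes` are illustrative names. Encoding resumes one at a time mirrors the description above; passing all resumes to `encode` in a single batch would usually be faster.

```python
from typing import Callable, List, Sequence

# An encoder maps a batch of texts to one embedding vector per text.
Encoder = Callable[[Sequence[str]], Sequence[Sequence[float]]]


def load_minilm_encoder() -> Encoder:
    """Load all-MiniLM-L6-v2 (384-dimensional embeddings) and wrap its encode method."""
    from sentence_transformers import SentenceTransformer  # imported lazily: heavy dependency

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return lambda texts: model.encode(list(texts))


def embed_job_and_resumes(job_description: str, resume_texts: Sequence[str], encode: Encoder):
    """Embed the job description once, then each resume individually."""
    job_vector = encode([job_description])[0]
    resume_vectors: List[Sequence[float]] = [encode([text])[0] for text in resume_texts]
    return job_vector, resume_vectors
```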
**Similarity Calculation**

- Computed the cosine similarity between the job description embedding and each resume embedding.
- A higher score indicates greater semantic similarity to the job description.

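Cosine similarity is the dot product of two vectors divided by the product of their lengths, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The dependency-free version below is for illustration only; the project may well have used a library routine instead (for example `sentence_transformers.util.cos_sim` or scikit-learn's `cosine_similarity`). Returning 0.0 when either vector is all zeros is an assumed convention, since the ratio is undefined there; `score_resumes` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (||a|| * ||b||); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_resumes(job_vector: Sequence[float], resume_vectors: Sequence[Sequence[float]]) -> list[float]:
    """Similarity of each resume embedding to the single job-description embedding."""
    return [cosine_similarity(job_vector, vector) for vector in resume_vectors]
```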
**Candidate Name Extraction**

- Looked for the candidate's name in the first three lines of the resume, where names conventionally appear.
- If no name was found, fell back to the filename without its extension.

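One way to implement the name heuristic, as a sketch: the pattern that decides whether a line "looks like a name" (two to four capitalized words), the skipped header words, and counting only non-empty lines toward the first three are all assumptions, and `extract_candidate_name` is a hypothetical helper. A job-title line such as "Senior Data Scientist" would also match the pattern, which is one reason this step can fail on unconventional layouts.

```python
import re
from pathlib import Path

# Assumed heuristic: 2-4 capitalized words made of letters, periods, apostrophes, or hyphens.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z.'-]*(?: [A-Z][A-Za-z.'-]*){1,3}$")
# Assumed document titles that should never be mistaken for a name.
HEADER_WORDS = {"resume", "résumé", "curriculum vitae", "cv"}


def extract_candidate_name(resume_text: str, file_name: str, max_lines: int = 3) -> str:
    """Return a name-like line from the top of the resume, else the filename without its extension."""
    top_lines = [line.strip() for line in resume_text.splitlines() if line.strip()][:max_lines]
    for line in top_lines:
        if line.lower() in HEADER_WORDS:
            continue
        if NAME_PATTERN.match(line):
            return line
    return Path(file_name).stem
```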
**Summary Generation**

- Used the **MBZUAI/LaMini-Flan-T5-248M** LLM to generate short summaries.
- Prompt: *"Why is this candidate a good fit for the given job description?"*
- Helps recruiters understand why a candidate matches, not just how well they score.

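The summary step can be sketched with the Hugging Face `transformers` `pipeline` API. The prompt layout, the character-based truncation (a crude stand-in for the model's input token limit), `max_new_tokens=80`, and the helper names `build_fit_prompt`, `load_lamini_generator`, and `summarize_fit` are all assumptions; only the model name and the question come from the description above. The generator is injected as a callable so that prompt construction can be checked without loading the model.

```python
from typing import Callable

# Assumed layout: the question from the README, followed by the two inputs.
PROMPT_TEMPLATE = (
    "Why is this candidate a good fit for the given job description?\n\n"
    "Job description:\n{job}\n\nResume:\n{resume}"
)


def build_fit_prompt(job_description: str, resume_text: str, max_chars: int = 1500) -> str:
    """Fill the fit question with the job description and a truncated resume excerpt."""
    # Character truncation is a rough proxy for the model's token limit (assumed value).
    return PROMPT_TEMPLATE.format(job=job_description.strip(), resume=resume_text.strip()[:max_chars])


def load_lamini_generator() -> Callable[[str], str]:
    """Wrap LaMini-Flan-T5-248M in a text2text-generation pipeline returning plain text."""
    from transformers import pipeline  # imported lazily: heavy dependency

    generator = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")
    return lambda prompt: generator(prompt, max_new_tokens=80)[0]["generated_text"]


def summarize_fit(job_description: str, resume_text: str, generate: Callable[[str], str]) -> str:
    """Ask the generator why the candidate fits, returning the trimmed answer."""
    return generate(build_fit_prompt(job_description, resume_text)).strip()
```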
---

## 📋 Assumptions

- Resumes are in English.
- Resume files are well formatted (name near the top, text extractable).
- The similarity score is a proxy for relevance, not a hiring decision.
- LLM summarization works best with cleaned, relevant resume sections.
- Job descriptions are detailed enough for a meaningful comparison.

---

## ⚠ Limitations

- Image-based (scanned) resumes cannot be handled reliably, because no OCR is applied.
- Candidate name extraction may fail on resumes with unconventional formatting.
- LLM summaries depend on model capability and may occasionally be generic.
- Cosine similarity does not account for skill weights: there is no way to mark particular skills as more important, so the whole resume feeds into a single score.
- Large file uploads may degrade performance on free hosting tiers.

---