billyxx committed on
Commit cf614b7 · verified
1 Parent(s): cc4cbea

Upload README.md

Files changed (1)
  1. README.md +67 -9
README.md CHANGED
@@ -1,12 +1,70 @@
  ---
- title: Sprouts Assignment
- emoji: 🏃
- colorFrom: purple
- colorTo: gray
- sdk: gradio
- sdk_version: 5.41.1
- app_file: app.py
- pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Candidate Recommendation Engine powered by LLM
+
+ ## 📌 Overview
+
+ This Candidate Recommendation Engine ranks and summarizes resumes against a given job description using Natural Language Processing (NLP) and semantic similarity techniques. It is designed to assist recruiters and hiring managers in quickly identifying top candidates from a pool of resumes.
+
+ The application:
+
+ - Accepts a **job description** (text input).
+ - Accepts **multiple resumes** (PDF, DOCX, or TXT files).
+ - Extracts and cleans resume text.
+ - Generates semantic embeddings using **sentence-transformers**.
+ - Calculates **cosine similarity** between each resume and the job description.
+ - Uses a lightweight LLM (**MBZUAI/LaMini-Flan-T5-248M**) for a short, human-readable explanation of why the candidate is a good fit.
+ - Displays ranked results with:
+   - **Candidate Name** (from resume or filename)
+   - **File Name**
+   - **Similarity Score**
+   - **Summary**
+
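The ranking step described above can be illustrated with a minimal, standard-library-only sketch. It is not the application's actual code: the function names `cosine_similarity` and `rank_resumes` are hypothetical, and the toy 3-dimensional vectors stand in for the embeddings that sentence-transformers would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def rank_resumes(job_vec, resume_vecs):
    """Return (file_name, score) pairs sorted by descending similarity."""
    scored = [
        (name, cosine_similarity(job_vec, vec))
        for name, vec in resume_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed data) standing in for real sentence embeddings.
job = [1.0, 0.0, 1.0]
resumes = {
    "alice.pdf": [0.9, 0.1, 0.8],
    "bob.docx": [0.0, 1.0, 0.1],
}

ranking = rank_resumes(job, resumes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real run, the job description would be embedded once and each resume individually, and the resulting vectors would be passed to a function like `rank_resumes` in place of the toy data.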
  ---
+
+ ## 🛠 Approach
+
+ - **Text Extraction**
+   - PDF resumes → parsed using PyPDF2
+   - DOCX resumes → parsed using python-docx
+   - TXT resumes → directly read as text
+   - Minimal cleaning (remove extra spaces, special characters where needed)
+
+ - **Embedding Generation**
+   - Used **all-MiniLM-L6-v2** from sentence-transformers for generating 384-dimensional embeddings.
+   - Generated embeddings for:
+     - Job description (once)
+     - Each resume (individually)
+
+ - **Similarity Calculation**
+   - Computed cosine similarity between the job description embedding and each resume embedding.
+   - Higher score = higher semantic similarity to the job description.
+
+ - **Candidate Name Extraction**
+   - Tried extracting the candidate’s name from the first 3 lines of the resume (common format for names at the top).
+   - If not found, fell back to the filename without its extension.
+
+ - **Summary Generation**
+   - Used the **MBZUAI/LaMini-Flan-T5-248M** LLM for generating short summaries.
+   - Prompt: *"Why is this candidate a good fit for the given job description?"*
+   - Helps recruiters understand the matching context, not just the score.
+
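The cleaning and name-extraction steps above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `app.py`: the helper names `clean_text` and `extract_candidate_name` and the `NAME_PATTERN` heuristic are hypothetical, and the sketch checks the first 3 non-empty lines after cleaning.

```python
import re
from pathlib import Path


def clean_text(raw):
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical heuristic: 2 to 4 words, each starting with a capital letter.
NAME_PATTERN = re.compile(r"^[A-Z][a-zA-Z'.-]*(?: [A-Z][a-zA-Z'.-]*){1,3}$")


def extract_candidate_name(resume_text, file_name):
    """Return the first name-like line among the first 3 non-empty lines,
    falling back to the file name without its extension."""
    for line in clean_text(resume_text).splitlines()[:3]:
        if NAME_PATTERN.match(line):
            return line
    return Path(file_name).stem


found = extract_candidate_name(
    "  Jane   Doe \nSoftware Engineer\njane@example.com", "resume_01.pdf"
)
fallback = extract_candidate_name(
    "curriculum vitae\n555-1234\nskills: python", "resume_01.pdf"
)
print(found, "|", fallback)
```

A pattern this simple also accepts capitalized headings such as "Curriculum Vitae", which is one way the name extraction can go wrong on unconventionally formatted resumes.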
  ---
 
+ ## 📋 Assumptions
+
+ - Resumes are in English.
+ - Resume files are well-formatted (name near top, text extractable).
+ - Similarity score is a proxy for relevance, not a hiring decision.
+ - LLM summarization works best with cleaned, relevant resume sections.
+ - Job descriptions are detailed enough for meaningful comparison.
+
+ ---
+
+ ## ⚠ Limitations
+
+ - Cannot perfectly handle image-based (scanned) resumes without OCR.
+ - Candidate name extraction may fail if resumes have unconventional formatting.
+ - LLM summaries depend on model capability and may occasionally be generic.
+ - Cosine similarity does not account for specific skill weights (all terms treated equally).
+ - Large file uploads may impact performance on free hosting tiers.
+
+ ---