Upload 3 files
- README.md +20 -3
- app.py +17 -1
- requirements.txt +1 -0
README.md
CHANGED
@@ -18,7 +18,7 @@ This Candidate Recommendation Engine ranks and summarizes resumes against a give
|
|
18 |
The application:
|
19 |
|
20 |
- Accepts a **job description** (text input).
|
21 |
-
- Accepts **multiple resumes** (PDF
|
22 |
- Extracts and cleans resume text.
|
23 |
- Generates semantic embeddings using **sentence-transformers**.
|
24 |
- Calculates **cosine similarity** between each resume and the job description.
|
@@ -33,6 +33,24 @@ The application:
|
|
33 |
|
34 |
## Approach
|
35 |
|
36 |
- **Text Extraction**
|
37 |
- PDF resumes → parsed using PyPDF2
|
38 |
- DOCX resumes → parsed using python-docx
|
@@ -71,8 +89,7 @@ The application:
|
|
71 |
---
|
72 |
|
73 |
## Limitations
|
74 |
-
|
75 |
-
- Cannot perfectly handle image-based (scanned) resumes without OCR.
|
76 |
- Candidate name extraction may fail if resumes have unconventional formatting.
|
77 |
- LLM summaries depend on model capability and may occasionally be generic.
|
78 |
- Cosine similarity does not account for specific skill weights (all terms treated equally).
|
|
|
18 |
The application:
|
19 |
|
20 |
- Accepts a **job description** (text input).
|
21 |
+
- Accepts **multiple resumes** (PDF or TXT files).
|
22 |
- Extracts and cleans resume text.
|
23 |
- Generates semantic embeddings using **sentence-transformers**.
|
24 |
- Calculates **cosine similarity** between each resume and the job description.
|
|
|
33 |
|
34 |
## Approach
|
35 |
|
36 |
+
## AI Summarization
|
37 |
+
|
38 |
+
I chose **not to use the GPT or Gemini APIs** for the AI-powered candidate summary because
|
39 |
+
I wanted to explore alternative options and deepen my understanding of
|
40 |
+
open-source large language models (LLMs).
|
41 |
+
|
42 |
+
After experimenting with several LLMs, I settled on the **MBZUAI/LaMini-Flan-T5-248M**
|
43 |
+
model for generating candidate summaries. This model provides a good balance of performance
|
44 |
+
and efficiency for summarization tasks, and working with it has helped me learn more about
|
45 |
+
LLMs outside of the popular API-based services.
|
46 |
+
|
47 |
+
## Embeddings
|
48 |
+
- For generating semantic embeddings to measure resume-job description similarity,
|
49 |
+
I used **all-mpnet-base-v2** from the sentence-transformers library. This model
|
50 |
+
provided better cosine similarity results compared to other embedding models I tested,
|
51 |
+
making the ranking of candidates more accurate and relevant.
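The similarity ranking described in this section can be sketched roughly as follows. This is a minimal illustration and not the repository's `rank_resumes`: the names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the short vectors are toy stand-ins for the embeddings a model such as all-mpnet-base-v2 would produce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    job_vec: Sequence[float], resume_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the job vector."""
    scored = [(name, cosine_similarity(job_vec, vec)) for name, vec in resume_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
job = [1.0, 0.0, 1.0]
resumes = {
    "alice": [1.0, 0.0, 1.0],  # same direction as the job vector
    "bob": [0.0, 1.0, 0.0],    # orthogonal to the job vector
    "carol": [1.0, 1.0, 0.0],  # partial overlap
}
ranking = rank_by_similarity(job, resumes)
```

Because cosine similarity depends only on direction, a long resume and a short one that point the same way in embedding space receive the same score.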
|
52 |
+
|
53 |
+
|
54 |
- **Text Extraction**
|
55 |
- PDF resumes → parsed using PyPDF2
|
56 |
- DOCX resumes → parsed using python-docx
|
|
|
89 |
---
|
90 |
|
91 |
## Limitations
|
92 |
+
|
|
|
93 |
- Candidate name extraction may fail if resumes have unconventional formatting.
|
94 |
- LLM summaries depend on model capability and may occasionally be generic.
|
95 |
- Cosine similarity does not account for specific skill weights (all terms treated equally).
|
app.py
CHANGED
@@ -2,6 +2,8 @@ import gradio as gr
|
|
2 |
import os
|
3 |
import pdfplumber
|
4 |
from recommender import rank_resumes, summarize_resume_flan, extract_applicant_name
|
5 |
|
6 |
UPLOAD_FOLDER = "uploads"
|
7 |
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
|
@@ -26,6 +28,8 @@ def process_resumes(job_description, uploaded_file):
|
|
26 |
with pdfplumber.open(filepath) as pdf:
|
27 |
pages = [page.extract_text() for page in pdf.pages if page.extract_text() is not None]
|
28 |
text = "\n".join(pages)
|
29 |
else:
|
30 |
return "Unsupported file format.", None
|
31 |
|
@@ -34,6 +38,9 @@ def process_resumes(job_description, uploaded_file):
|
|
34 |
# Rank resumes
|
35 |
results = rank_resumes(job_description, resume_texts)
|
36 |
|
37 |
# Generate summaries
|
38 |
for candidate in results:
|
39 |
candidate["summary"] = summarize_resume_flan(candidate["text"], job_description)
|
@@ -50,6 +57,14 @@ def process_resumes(job_description, uploaded_file):
|
|
50 |
|
51 |
return "", table_data
|
52 |
|
53 |
|
54 |
|
55 |
|
@@ -58,7 +73,8 @@ with gr.Blocks() as demo:
|
|
58 |
with gr.Row():
|
59 |
job_desc = gr.Textbox(label="Job Description", lines=10, placeholder="Paste job description here...")
|
60 |
|
61 |
-
resumes = gr.File(label="Upload Resume (.txt
|
|
|
62 |
btn = gr.Button("Rank Candidates")
|
63 |
|
64 |
|
|
|
2 |
import os
|
3 |
import pdfplumber
|
4 |
from recommender import rank_resumes, summarize_resume_flan, extract_applicant_name
|
5 |
+
from docx import Document
|
6 |
+
|
7 |
|
8 |
UPLOAD_FOLDER = "uploads"
|
9 |
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
|
|
|
28 |
with pdfplumber.open(filepath) as pdf:
|
29 |
pages = [page.extract_text() for page in pdf.pages if page.extract_text() is not None]
|
30 |
text = "\n".join(pages)
|
31 |
+
elif filepath.endswith(".docx"):
|
32 |
+
text = extract_text_from_docx(filepath)
|
33 |
else:
|
34 |
return "Unsupported file format.", None
|
35 |
|
|
|
38 |
# Rank resumes
|
39 |
results = rank_resumes(job_description, resume_texts)
|
40 |
|
41 |
+
for i, candidate in enumerate(results):
|
42 |
+
candidate["name"] = resume_texts[i][0]
|
43 |
+
|
44 |
# Generate summaries
|
45 |
for candidate in results:
|
46 |
candidate["summary"] = summarize_resume_flan(candidate["text"], job_description)
|
|
|
57 |
|
58 |
return "", table_data
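A note on the name assignment in this hunk: it indexes `resume_texts` by each candidate's position in `results`. If `rank_resumes` returns candidates sorted by score (its implementation is not part of this diff, so that is an assumption), position `i` in `results` may no longer match position `i` in upload order. A minimal sketch of one way to keep each name attached to its text, using a hypothetical `rank_named` helper and a toy word-overlap score in place of the real embedding similarity:

```python
from typing import Callable, Dict, List, Tuple


def rank_named(
    job_description: str,
    resumes: List[Tuple[str, str]],
    score: Callable[[str, str], float],
) -> List[Dict[str, object]]:
    """Score (name, text) pairs and sort them, so each name travels with its own text."""
    results = [
        {"name": name, "text": text, "score": score(job_description, text)}
        for name, text in resumes
    ]
    return sorted(results, key=lambda c: c["score"], reverse=True)


def overlap_score(job: str, resume: str) -> float:
    """Toy stand-in for embedding similarity: fraction of job words found in the resume."""
    job_words = set(job.lower().split())
    if not job_words:
        return 0.0
    return len(job_words & set(resume.lower().split())) / len(job_words)


# Upload order puts the weaker match first; ranking must not mix up the names.
uploads = [("a.pdf", "java spring"), ("b.pdf", "python sql pandas")]
ranked = rank_named("python sql", uploads, overlap_score)
```

Building the result dictionaries before sorting means no later step has to re-associate names by index.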
|
59 |
|
60 |
+
def extract_text_from_docx(filepath):
|
61 |
+
doc = Document(filepath)
|
62 |
+
full_text = []
|
63 |
+
for para in doc.paragraphs:
|
64 |
+
full_text.append(para.text)
|
65 |
+
return "\n".join(full_text)
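The `endswith` checks in `process_resumes` are case-sensitive, so a file named `resume.DOCX` would fall through to the "Unsupported file format." branch. As a hedged sketch of an alternative, the dispatch could go through a lookup table keyed on the lower-cased suffix. The names `EXTRACTORS`, `read_txt`, and `extract_text` are hypothetical; only the plain-text extractor is registered here so the example runs without pdfplumber or python-docx.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional


def read_txt(filepath: str) -> str:
    """Read a plain-text resume, replacing undecodable bytes instead of failing."""
    return Path(filepath).read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: file extension -> extractor function.
# PDF and DOCX extractors would be registered under ".pdf" and ".docx" the same way.
EXTRACTORS: Dict[str, Callable[[str], str]] = {".txt": read_txt}


def extract_text(filepath: str) -> Optional[str]:
    """Return the extracted text, or None when the extension is not supported."""
    extractor = EXTRACTORS.get(Path(filepath).suffix.lower())
    return extractor(filepath) if extractor else None


# Demonstration: an upper-case extension is still recognised.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "resume.TXT"
    sample.write_text("Python developer", encoding="utf-8")
    upper_case_result = extract_text(str(sample))

unsupported_result = extract_text("resume.odt")
```

Returning `None` for unknown extensions lets the caller decide how to report the error, as the existing `else` branch already does.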
|
66 |
+
|
67 |
+
|
68 |
|
69 |
|
70 |
|
|
|
73 |
with gr.Row():
|
74 |
job_desc = gr.Textbox(label="Job Description", lines=10, placeholder="Paste job description here...")
|
75 |
|
76 |
+
resumes = gr.File(label="Upload Resume (.txt, .pdf, .docx)", file_types=[".txt", ".pdf", ".docx"])
|
77 |
+
|
78 |
btn = gr.Button("Rank Candidates")
|
79 |
|
80 |
|
requirements.txt
CHANGED
@@ -6,4 +6,5 @@ transformers
|
|
6 |
accelerate
|
7 |
torch
|
8 |
pdfplumber
|
|
|
9 |
|
|
|
6 |
accelerate
|
7 |
torch
|
8 |
pdfplumber
|
9 |
+
python-docx
|
10 |
|