Upload 8 files
- cv_examples/reddgr_cv.txt +88 -0
- json/ner_schema.json +21 -0
- json/response_schema.json +38 -0
- prompts/ner_pre_prompt.txt +1 -0
- prompts/system_prompt.txt +1 -0
- prompts/user_prompt.txt +1 -0
- src/__pycache__/procesador_de_cvs_con_llm.cpython-311.pyc +0 -0
- src/procesador_de_cvs_con_llm.py +284 -0
cv_examples/reddgr_cv.txt
ADDED
@@ -0,0 +1,88 @@
+www.linkedin.com/in/davidgonzalezromero (LinkedIn)
+talkingtochatbots.com (Personal)
+Top Skills
+Information and Communications Technology (ICT)
+Data Science
+Artificial Intelligence (AI)
+Languages
+English (Full Professional)
+French (Limited Working)
+Spanish (Native or Bilingual)
+Certifications
+Watson Analytics - Level 1
+Retail Industry Jumpstart
+Data Science Foundations - Level 1
+Generative AI Imaging: What Creative Pros Need to Know
+Prompt Engineering: How to Talk to the AIs
+Honors-Awards
+Desafío Entorno Pre Mercado 2023
+David González Romero
+ICT Engineer | Business Consultant | Licensed Financial Services Professional | Web Publisher | Profile not suggested by AI | 499 connections | reddgr
+Greater Madrid Metropolitan Area
+Summary
+I am an Information and Communications Technology Engineer and Business Consultant with over 15 years of experience in enterprise software solutions, consulting, business analytics, and data science, across multiple countries and cross-functional teams. Over the last two decades, I have enjoyed the privilege of traveling around the world, teaming up with outstanding professionals, leading teams, developing business opportunities, and building and managing longstanding client relationships. Throughout my consulting and client relationship management career, I have primarily served clients in the retail industry, financial services, telecommunications, and the public sector, developing skills and knowledge across diverse domains such as marketing, finance, risk management, software engineering, and data science. In academia, I completed MSc studies in telecommunications, electrical, and computer engineering, with research in ubiquitous computing, the Internet of Things, and computer security. Currently, I'm pursuing a master's degree in Artificial Intelligence Applied to Financial Markets, and managing Talking to Chatbots, a website dedicated to generative AI projects, popular culture, and education, available at https://talkingtochatbots.com.
+Experience
+Talking to Chatbots, by Reddgr
+Web Publisher and Generative AI Researcher
+October 2006 - Present (18 years 3 months)
+Spain
+Developed and managed personal projects on the Internet since 2006. Currently managing the internet domains https://talkingtochatbots.com (website) and https://reddgr.com (search engine and social media keyword:
+Page 1 of 6
+“reddgr” stands for “David González Romero network”). Talking to Chatbots is a knowledge hub that compiles LLM prompts and curated conversations, serving as an entertainment and educational platform for AI hobbyists, learners, and professionals. Since 2023, active developer and contributor in open-source and proprietary generative AI platforms and communities (reddgr.com/gpts huggingface.com/reddgr)
+Acoustic
+Principal Consultant | Martech SaaS
+June 2020 - May 2023 (3 years)
+Spain
+Advised retail companies on implementing profitable, competitive pricing strategies and promotions backed by DemandTec software as a service (SaaS) solutions and data science. Primary focus on developing and leading successful client relationships. Dedicated to continuously improving Acoustic products and ensuring that clients and prospects receive excellent service from Acoustic’s team and business partners. This involved delivering and managing: consultative selling, employee recruitment, training and mentoring, solution implementations, SaaS managed services (data integration and modeling), technical support, customer relationship management (CRM), and analytics consulting services. Companies served by the team of consultants and account managers I led include: leading Spanish retailer; Italian supermarket cooperative; multinational retail company operating in the Middle East, Eastern Europe and Africa; leading Italian retail group; Swedish grocery retailer; online retailer operating in the UK and Ireland; British retailer with multinational presence; Finland-based multinational retail company; leading Norwegian retailer; retail cooperative in the Nordic countries; major multinational retail group operating in South America. Participated in or led pre-sales activities (RFI, RFP, POC, RFQ) for various national and multinational retailers based in Southern Europe, Central Europe, Nordics, the Middle East and Australia.
+IBM
+7 years 3 months
+Engagement Manager, in support of Acoustic | B2B SaaS Retail Analytics
+Page 2 of 6
+July 2019 - May 2020 (11 months)
+Madrid, Community of Madrid, Spain
+Employed by IBM exclusively in support of Acoustic, new company founded in 2019 by a team of IBM Watson Marketing & Commerce software specialists, led by former IBM executives and funded by private equity. Specialist in Acoustic Pricing and Promotion solutions (DemandTec), acting as Acoustic Software Services team leader in Spain, and as software delivery Engagement Manager and Subject Matter Expert for pre-sales and services projects worldwide. Acoustic clients I worked with include: leading Spanish retailer, supermarket cooperative based in central and southern Italy, multinational retail company operating in Middle East, Eastern Europe and Africa, leading Italian retail group.
+Engagement Manager | B2B SaaS Retail Analytics
+September 2018 - June 2019 (10 months)
+Madrid, Community of Madrid, Spain
+Managing services projects and SaaS engagements for IBM Watson Commerce solutions. As cognitive solutions specialist and SME in retail pricing and business analytics, I helped IBM clients succeed by coordinating all components of the IBM Omni-Channel Pricing (DemandTec) cloud-based solution, including: solution design and PoC's, solution implementation and delivery, data science services, data integration services, SaaS operations, technical support, product management, benefits assessments, and analytical consulting services. IBM clients I worked with as Engagement Manager or SME include: leading Spanish retailer, multinational retail company operating in Middle East, Eastern Europe and Africa, leading Italian retail group, Italian supermarket cooperative.
+Relationship Manager | Cognitive Solutions SaaS
+January 2015 - August 2018 (3 years 8 months)
+Madrid, Community of Madrid, Spain
+Specialist in the IBM Omni-channel Merchandising (DemandTec) solution for the retail industry, including Price Optimization, Promotion Planning and Dynamic Pricing software. Managed the day to day relationship with assigned clients (€ 2 million ARR), prospecting and coordinating the delivery of SaaS platform enablement services (data integration and data science), technical support, project
+Page 3 of 6
+management, and end-user enablement. Collaborated in other international projects as DemandTec and pricing SME, delivering training and project guidance to client end-users and business partners. IBM clients I worked with include: multiple Merchandising divisions of leading Spanish retailer, multinational supermarket chain based in Spain, supermarket co-operative based in Denmark, multinational retail company operating in Middle East, Eastern Europe and Africa, Russian supermarket chain, Finland-based retail company, British consumer co-operative.
+Business Analyst | B2B SaaS Retail Analytics
+March 2013 - December 2014 (1 year 10 months)
+Madrid, Community of Madrid, Spain
+Delivery of IBM Enterprise Marketing Management implementation projects, including DemandTec Price Optimization, Markdown Optimization and Assortment Optimization SaaS solutions. Delivered business and technical guidance to pricing managers, category managers, buyers and business consultants in solution architecture, problem management resolution and change management. Specialist in performing data analysis on the data science, optimization and business analytics tools and services included in the solution. IBM clients I worked with include: leading Spanish retailer, multinational fashion retailer based in Spain, US-based sports retailer, supermarket co-operative based in Denmark.
+KPMG España
+Senior Consultant | Financial Risk Management
+December 2010 - March 2013 (2 years 4 months)
+Madrid, Community of Madrid, Spain
+Senior Consultant in Financial Risk Management. Main projects: • Corporate and Investment Banking financial reporting: data mining and analytics for Finance and Business Performance & Analytics department at leading multinational banking and financial services company. Developed and maintained financial reports and insights for CFO, senior management and front office. • Retail and Business Banking credit risk modeling: led user acceptance testing and test case development for credit risk models and EBA-compliant reporting (COREP) of capital requirements. Led UAT development team and
+Page 4 of 6
+acted as a link between IT teams and Risk department. Supported credit risk modeling team on early implementation of internal ratings-based (IRB) credit risk models in compliance with Basel Framework on banking supervision.
+MBD Analytics
+Business Intelligence Consultant
+February 2010 - December 2010 (11 months)
+Alcobendas, Community of Madrid, Spain
+Marketing Business Intelligence consulting services. Client-facing consultant for Competitive Intelligence department at a multinational telecommunications company. Responsible for the development of custom BI reporting solutions and presenting monthly business reports. The reports included insights, analysis and forecasting of KPIs measuring customer activity and value in consumer and enterprise telecommunication services.
+Grupo Eneas
+Cost Analyst
+November 2009 - December 2009 (2 months)
+Madrid, Community of Madrid, Spain
+Telecommunications cost optimization project for a regional government agency of Spain. Gathered and analyzed invoice and contract data in support of a Request for Quotation (RFQ) to a selection of telecommunication service providers.
+Deloitte España
+IT Strategy Consultant
+September 2008 - January 2009 (5 months)
+Madrid, Community of Madrid, Spain
+Management consulting intern, collaborating on IT management projects for insurance, banking and public sector companies and institutions based in Spain. Collaborated on research, documentation, elaboration of proposals and technical support for IT management consulting projects, including: IT strategic planning and market research, IT service management, IT integration, and IT cost optimization.
+Education
+Illinois Institute of Technology
+Research Scholar, Electrical & Computer Engineering · (2009 - 2009)
+Page 5 of 6
+Universidad Politécnica de Madrid
+Master of Science (MSc), Telecommunications Engineer · (2003 - 2009)
+Instituto BME
+Master's degree, Artificial Intelligence Applied to Financial Markets (MIAX) · (October 2023 - May 2025)
+Page 6 of 6
json/ner_schema.json
ADDED
@@ -0,0 +1,21 @@
+{
+  "type": "object",
+  "properties": {
+    "experiencia": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "empresa": {"type": "string"},
+          "puesto": {"type": "string"},
+          "periodo": {
+            "type": "string",
+            "description": "Format 'YYYYMM-YYYYMM', or just 'YYYYMM' if no end date appears."
+          }
+        },
+        "required": ["empresa", "puesto", "periodo"]
+      }
+    }
+  },
+  "required": ["experiencia"]
+}
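The `periodo` format described in the schema can be turned into a month count with a few lines of standard-library Python. The sketch below mirrors the year and month arithmetic used later in `procesar_periodos`; the helper name `months_in_period` and its `today` parameter are assumptions made for illustration and are not part of the commit.

```python
from datetime import date, datetime
from typing import Optional


def months_in_period(periodo: str, today: Optional[date] = None) -> int:
    """Return the whole-month span of a 'YYYYMM-YYYYMM' or 'YYYYMM' string."""
    parts = periodo.split("-")
    start = datetime.strptime(parts[0], "%Y%m").date()
    if len(parts) > 1 and parts[1]:
        end = datetime.strptime(parts[1], "%Y%m").date()
    else:
        # Open-ended period: fall back to a reference date (today by default).
        end = today or date.today()
    return (end.year - start.year) * 12 + (end.month - start.month)


print(months_in_period("202310-202403"))                    # 5
print(months_in_period("200610", today=date(2025, 1, 15)))  # 219
```

Passing `today` explicitly keeps the result reproducible for open-ended periods, which otherwise depend on the day the code runs.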
json/response_schema.json
ADDED
@@ -0,0 +1,38 @@
+{
+  "type": "object",
+  "properties": {
+    "puntuacion": {
+      "type": "number"
+    },
+    "experiencia": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "empresa": {
+            "type": "string"
+          },
+          "puesto": {
+            "type": "string"
+          },
+          "duracion": {
+            "type": "integer"
+          }
+        },
+        "required": [
+          "empresa",
+          "puesto",
+          "duracion"
+        ]
+      }
+    },
+    "descripcion de la experiencia": {
+      "type": "string"
+    }
+  },
+  "required": [
+    "puntuacion",
+    "experiencia",
+    "descripcion de la experiencia"
+  ]
+}
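A name listed under `required` but not declared under `properties` is still legal JSON Schema, yet it is usually a sign of a typo, since the undeclared key carries no type constraint. A quick consistency check might look like the sketch below. The helper `undeclared_required` is hypothetical, and the sample schema is a shortened stand-in that deliberately contains a mismatched key.

```python
def undeclared_required(schema: dict) -> list:
    """Return the names listed in 'required' that are missing from 'properties'."""
    declared = set(schema.get("properties", {}))
    return [key for key in schema.get("required", []) if key not in declared]


# Shortened stand-in schema; 'experiencia relevante' is intentionally undeclared.
sample_schema = {
    "type": "object",
    "properties": {
        "puntuacion": {"type": "number"},
        "experiencia": {"type": "array"},
        "descripcion de la experiencia": {"type": "string"},
    },
    "required": [
        "puntuacion",
        "experiencia relevante",
        "descripcion de la experiencia",
    ],
}

print(undeclared_required(sample_schema))  # ['experiencia relevante']
```

The check only inspects the top level; nested `items` objects would need the same call applied to each of them.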
prompts/ner_pre_prompt.txt
ADDED
@@ -0,0 +1 @@
+You are a curriculum vitae processor that extracts job titles, company names, and the period of each position. Use JSON format in the output with the keys "empresa", "puesto" and "periodo". For the period, consider any date or date-range format found in the text. An example of a date format in the input is "October 2023 / March 2024". The content of the "periodo" key must be a string with two elements in YYYYMM format separated by a hyphen, for example "202310-202403", or a single element if no end date is identified.
prompts/system_prompt.txt
ADDED
@@ -0,0 +1 @@
+You are a curriculum vitae processor that receives a job offer, a curriculum vitae filtered down to the relevant prior experience, a precalculated score for the CV between 0 and 100, and a required-experience parameter in months. The score was calculated by an algorithm that uses embedding distances between each position and the job offer definition, as well as the duration of each position and its relation to the required-experience parameter. You return an object with the predefined schema, including exactly the same score provided and the list of experience provided, and you also return a brief explanatory text about the candidate's experience and why the given score was obtained. It is important that the explanatory text is consistent with the score. For example, if the score is greater than 80, the explanatory text must emphasize the past experiences, and their durations, that led to that score. When you mention a duration longer than 12 months, include only the approximate figure in years in the text, since the exact data is in the attached experience list.
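The system prompt asks the model to express durations above 12 months as approximate years. For comparison, the same rule can be written deterministically. This is only an illustrative sketch: the helper `describe_duration` and its wording are assumptions, not code from the commit.

```python
def describe_duration(months: int) -> str:
    """Render a duration, switching to approximate years above 12 months."""
    if months > 12:
        years = round(months / 12)
        unit = "year" if years == 1 else "years"
        return f"about {years} {unit}"
    unit = "month" if months == 1 else "months"
    return f"{months} {unit}"


for months in (10, 44, 219):
    print(months, "->", describe_duration(months))
```

Leaving this to the model keeps the pipeline simpler, while a helper like this one would make the wording reproducible across runs.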
prompts/user_prompt.txt
ADDED
@@ -0,0 +1 @@
+The job offer title is: {job}. The required experience in months is: {req_experience}. The score is {puntuacion}. The relevant experience is: {exp}. Explain why this score was obtained.
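The placeholders in the user prompt are filled with `str.format`, as `llamada_final` does further down. A minimal sketch of that step follows, using a shortened template and made-up values; only the four placeholder names come from the prompt file.

```python
# Shortened stand-in for prompts/user_prompt.txt; same placeholder names.
USER_PROMPT = (
    "Job title: {job}. Required experience in months: {req_experience}. "
    "Score: {puntuacion}. Relevant experience: {exp}."
)

# Hypothetical values, in the column-oriented shape produced by to_dict(orient='list').
experience = {"empresa": ["IBM"], "puesto": ["Business Analyst"], "duracion": [21]}

filled = USER_PROMPT.format(
    job="Data Scientist",
    req_experience=48,
    puntuacion=43.75,
    exp=experience,
)
print(filled)
```

Every placeholder in the template must be supplied as a keyword argument, otherwise `str.format` raises `KeyError`.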
src/__pycache__/procesador_de_cvs_con_llm.cpython-311.pyc
ADDED
Binary file (17.4 kB)
src/procesador_de_cvs_con_llm.py
ADDED
@@ -0,0 +1,284 @@
+
+import os
+import pandas as pd
+import json
+import textwrap
+from scipy import spatial
+from datetime import datetime
+from openai import OpenAI
+
+class ProcesadorCV:
+
+    def __init__(self, api_key, cv_text, job_text, ner_pre_prompt, system_prompt, user_prompt, ner_schema, response_schema,
+                 inference_model="gpt-4o-mini", embeddings_model="text-embedding-3-small"):
+        """
+        Initializes an instance of the class with the given parameters.
+
+        Args:
+            api_key (str): API key used to authenticate the OpenAI client.
+            cv_text (str): CV content as plain text.
+            job_text (str): title of the job offer to evaluate.
+            ner_pre_prompt (str): natural-language "named entity recognition" (NER) instruction for the model.
+            system_prompt (str): natural-language instruction for the final structured output.
+            user_prompt (str): instruction template filled with the parameters and data computed during preprocessing.
+            ner_schema (dict): schema for the "structured outputs" call to the OpenAI model for NER.
+            response_schema (dict): schema for the final response of the application.
+            inference_model (str, optional): inference model to use. Defaults to "gpt-4o-mini".
+            embeddings_model (str, optional): embeddings model to use. Defaults to "text-embedding-3-small".
+
+        Attributes:
+            inference_model (str): Stores the selected inference model.
+            embeddings_model (str): Stores the selected embeddings model.
+            client (OpenAI): OpenAI client instance initialized with the given API key.
+            cv (str): Stores the text of the given curriculum vitae.
+
+        """
+        self.inference_model = inference_model
+        self.embeddings_model = embeddings_model
+        self.ner_pre_prompt = ner_pre_prompt
+        self.user_prompt = user_prompt
+        self.system_prompt = system_prompt
+        self.ner_schema = ner_schema
+        self.response_schema = response_schema
+        self.client = OpenAI(api_key=api_key)
+        self.cv = cv_text
+        self.job_text = job_text
+        print("Client initialized as", self.client)
+
+    def extraer_datos_cv(self, temperature=0.5):
+        """
+        Extracts structured data from a CV with the OpenAI API.
+        Uses self.ner_pre_prompt as the system message and self.ner_schema as the
+        parameter schema of the function call.
+        Args:
+            temperature (float, optional): temperature value for the language model. Defaults to 0.5.
+        Returns:
+            pd.DataFrame: DataFrame with the structured data extracted from the CV.
+        Raises:
+            ValueError: if no structured data can be extracted from the CV.
+        """
+        response = self.client.chat.completions.create(
+            model=self.inference_model,
+            temperature=temperature,
+            messages=[
+                {"role": "system", "content": self.ner_pre_prompt},
+                {"role": "user", "content": self.cv}
+            ],
+            functions=[
+                {
+                    "name": "extraer_datos_cv",
+                    "description": "Extracts a table of job titles, company names and periods from a CV.",
+                    "parameters": self.ner_schema
+                }
+            ],
+            function_call="auto"
+        )
+
+        if response.choices[0].message.function_call:
+            function_call = response.choices[0].message.function_call
+            structured_output = json.loads(function_call.arguments)
+            if structured_output.get("experiencia"):
+                df_cv = pd.DataFrame(structured_output["experiencia"])
+                return df_cv
+            else:
+                raise ValueError(f"Could not extract structured data: {response.choices[0].message.content}")
+        else:
+            raise ValueError(f"Could not extract structured data: {response.choices[0].message.content}")
+
+
+    def procesar_periodos(self, df):
+        """
+        Processes the periods in a DataFrame and adds columns with the start date, end date and duration in months.
+        If there is no end date, the current date is used.
+        Args:
+            df (pandas.DataFrame): DataFrame with a 'periodo' column holding periods in 'YYYYMM-YYYYMM' or 'YYYYMM' format.
+        Returns:
+            pandas.DataFrame: DataFrame with the additional columns 'fec_inicio', 'fec_final' and 'duracion'.
+                - 'fec_inicio' (datetime.date): Start date of the period.
+                - 'fec_final' (datetime.date): End date of the period.
+                - 'duracion' (int): Duration of the period in months.
+        """
+        # Helper that splits a period string into start and end dates
+        def split_periodo(periodo):
+            dates = periodo.split('-')
+            start_date = datetime.strptime(dates[0], "%Y%m")
+            if len(dates) > 1 and dates[1]:
+                end_date = datetime.strptime(dates[1], "%Y%m")
+            else:
+                end_date = datetime.now()
+            return start_date, end_date
+
+        df[['fec_inicio', 'fec_final']] = df['periodo'].apply(lambda x: pd.Series(split_periodo(x)))
+
+        # Keep only the date part; parsed dates fall on the first of the month (the day is irrelevant and rarely given)
+        df['fec_inicio'] = df['fec_inicio'].dt.date
+        df['fec_final'] = df['fec_final'].dt.date
+
+        # Add a column with the duration in months
+        df['duracion'] = df.apply(
+            lambda row: (row['fec_final'].year - row['fec_inicio'].year) * 12 +
+            row['fec_final'].month - row['fec_inicio'].month,
+            axis=1
+        )
+
+        return df
+
+
+    def calcular_embeddings(self, df, column='puesto', model_name=None):
+        """
+        Computes the embeddings of a DataFrame column with the OpenAI API and returns the DataFrame.
+        Args:
+            df (pandas.DataFrame): DataFrame with the CV data.
+            column (str, optional): Name of the column holding the text to embed. Defaults to 'puesto'.
+            model_name (str, optional): Name of the embeddings model. Defaults to self.embeddings_model.
+        """
+        df['embeddings'] = df[column].apply(
+            lambda puesto: self.client.embeddings.create(
+                input=puesto,
+                model=model_name or self.embeddings_model
+            ).data[0].embedding
+        )
+        return df
+
+
+    def calcular_distancias(self, df, column='embeddings', model_name=None):
+        """
+        Computes the cosine distance between the embedding of the job text and those stored in a DataFrame column.
+        Args:
+            df (pandas.DataFrame): DataFrame that contains the embeddings.
+            column (str, optional): name of the DataFrame column that contains the embeddings. Defaults to 'embeddings'.
+            model_name (str, optional): OpenAI API embeddings model. Defaults to self.embeddings_model.
+        Returns:
+            pandas.DataFrame: DataFrame sorted from smallest to largest distance, with the distances in a new column.
+        """
+        response = self.client.embeddings.create(
+            input=self.job_text,
+            model=model_name or self.embeddings_model
+        )
+        emb_compare = response.data[0].embedding
+
+        df['distancia'] = df[column].apply(lambda emb: spatial.distance.cosine(emb, emb_compare))
+        df.drop(columns=[column], inplace=True)
+        df.sort_values(by='distancia', ascending=True, inplace=True)
+        return df
+
+
+    def calcular_puntuacion(self, df, req_experience, positions_cap=4, dist_threshold_low=0.6, dist_threshold_high=0.7):
+        """
+        Computes the score of a CV from its table of distances (with respect to a given job) and durations.
+
+        Args:
+            df (pandas.DataFrame): CV data with one row per position, including the durations and the distances previously computed against the job embeddings.
+            req_experience (float): required experience in months for the job (reference value for a score between 0 and 100 across positions).
+            positions_cap (int, optional): Maximum number of positions to consider for scoring. Defaults to 4.
+            dist_threshold_low (float, optional): Embedding distance at or below which the CV position is considered "equivalent" to the job offer.
+            dist_threshold_high (float, optional): Embedding distance at or above which the CV position does not score.
+
+        Returns:
+            pandas.DataFrame: Original DataFrame plus a column with the individual score contributed by each position.
+            float: Total score between 0 and 100.
+        """
+        # For scoring purposes, each position counts for at most the total number of months of experience required
+        df['duration_capped'] = df['duracion'].apply(lambda x: min(x, req_experience))
+        # Normalize the distance to [0, 1]: 0 at or below dist_threshold_low, 1 at or above dist_threshold_high
+        df['adjusted_distance'] = df['distancia'].apply(
+            lambda x: 0 if x <= dist_threshold_low else (
+                1 if x >= dist_threshold_high else (x - dist_threshold_low) / (dist_threshold_high - dist_threshold_low)
+            )
+        )
+        # Each position scores on its duration and on the complement of the distance (the smaller the distance, the higher the score)
+        df['position_score'] = round(((1 - df['adjusted_distance']) * (df['duration_capped']/req_experience) * 100), 2)
+        # Discard positions at or beyond the upper distance threshold (score 0), then sort by score
+        df.loc[df['distancia'] >= dist_threshold_high, 'position_score'] = 0
+        df = df.sort_values(by='position_score', ascending=False)
+        # Keep only the top-scoring positions (positions_cap)
+        df.iloc[positions_cap:, df.columns.get_loc('position_score')] = 0
+        # Sum the scores; several strong positions can add up to more than 100, so clip at 100 and round to two decimals
+        total_score = round(min(df['position_score'].sum(), 100), 2)
+        return df, total_score
+
+    def filtra_experiencia_relevante(self, df):
+        """
+        Filters the relevant experience from the DataFrame and returns it as a dictionary.
+        Args:
+            df (pandas.DataFrame): DataFrame with the full experience information.
+        Returns:
+            dict: Dictionary of column name to list of values for the positions with a score above 0.
+        """
+        df_experiencia = df[df['position_score'] > 0].copy()
+        df_experiencia.drop(columns=['periodo', 'fec_inicio', 'fec_final',
+                                     'distancia', 'duration_capped', 'adjusted_distance'], inplace=True)
+        experiencia_dict = df_experiencia.to_dict(orient='list')
+        return experiencia_dict
+
+    def llamada_final(self, req_experience, puntuacion, dict_experiencia):
+        """
+        Makes the final call to the language model to generate the final response.
+        Args:
+            req_experience (int): Required experience in months for the job.
+            puntuacion (float): Total score of the CV.
+            dict_experiencia (dict): Dictionary with the relevant experience.
+        Returns:
+            dict: Dictionary with the final response.
+        """
+        messages = [
+            {
+                "role": "system",
+                "content": self.system_prompt
+            },
+            {
+                "role": "user",
+                "content": self.user_prompt.format(job=self.job_text, req_experience=req_experience, puntuacion=puntuacion, exp=dict_experiencia)
+            }
+        ]
+
+        functions = [
+            {
+                "name": "respuesta_formateada",
+                "description": "Returns the object with puntuacion, experiencia and descripcion de la experiencia",
+                "parameters": self.response_schema
+            }
+        ]
+
+        response = self.client.chat.completions.create(
+            model=self.inference_model,
+            temperature=0.5,
+            messages=messages,
+            functions=functions,
+            function_call={"name": "respuesta_formateada"}
+        )
+
+        if response.choices[0].message.function_call:
+            function_call = response.choices[0].message.function_call
+            structured_output = json.loads(function_call.arguments)
+            print("Response:\n", json.dumps(structured_output, indent=4, ensure_ascii=False))
+            wrapped_description = textwrap.fill(structured_output['descripcion de la experiencia'], width=120)
+            print(f"Description of the experience:\n{wrapped_description}")
+            return structured_output
+        else:
+            raise ValueError(f"Error. Could not generate a response:\n {response.choices[0].message.content}")
+
+    def procesar_cv_completo(self, req_experience, positions_cap, dist_threshold_low, dist_threshold_high):
+        """
+        Processes a CV end to end and returns the final structured response.
+        Args:
+            req_experience (int): Required experience in months for the job.
+            positions_cap (int): Maximum number of positions to consider for scoring.
+            dist_threshold_low (float): Distance limit for considering a position equivalent.
+            dist_threshold_high (float): Distance limit for considering a position not relevant.
+        Returns:
+            dict: Structured response with the score, the relevant experience and
+                the explanatory text.
+        """
+        df_datos_estructurados_cv = self.extraer_datos_cv()
+        df_datos_estructurados_cv = self.procesar_periodos(df_datos_estructurados_cv)
+        df_con_embeddings = self.calcular_embeddings(df_datos_estructurados_cv)
+        df_con_distancias = self.calcular_distancias(df_con_embeddings)
+        df_puntuaciones, puntuacion = self.calcular_puntuacion(df_con_distancias,
+                                                               req_experience=req_experience,
+                                                               positions_cap=positions_cap,
+                                                               dist_threshold_low=dist_threshold_low,
+                                                               dist_threshold_high=dist_threshold_high)
+        dict_experiencia = self.filtra_experiencia_relevante(df_puntuaciones)
+        dict_respuesta = self.llamada_final(req_experience, puntuacion, dict_experiencia)
+        return dict_respuesta
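The scoring rule in `calcular_puntuacion` can be followed by hand on a small example. The sketch below restates it in plain Python, without pandas or API calls. The function names `position_score` and `total_score` and the input tuples are assumptions made for illustration, while the thresholds and the cap are the defaults from the method signature.

```python
def position_score(distance, months, req_months, low=0.6, high=0.7):
    """Score one position: full credit at or below `low`, none at or above `high`."""
    if distance <= low:
        adjusted = 0.0
    elif distance >= high:
        adjusted = 1.0
    else:
        # Linear ramp between the two thresholds.
        adjusted = (distance - low) / (high - low)
    capped = min(months, req_months)  # a position counts for at most req_months
    return round((1 - adjusted) * (capped / req_months) * 100, 2)


def total_score(positions, req_months, positions_cap=4, low=0.6, high=0.7):
    """Sum the best `positions_cap` position scores and clip the total at 100."""
    scores = sorted(
        (position_score(d, m, req_months, low, high) for d, m in positions),
        reverse=True,
    )
    return round(min(sum(scores[:positions_cap]), 100.0), 2)


# Hypothetical (cosine distance, duration in months) pairs for one CV.
positions = [(0.55, 12), (0.65, 24), (0.80, 60)]
print(total_score(positions, req_months=48))                   # 50.0
print(total_score([(0.50, 48), (0.50, 60)], req_months=48))    # 100.0
```

With 48 months required, the first position earns 25 points (close match, a quarter of the required time), the second also earns 25 (half the required time at half credit), and the third earns nothing because its distance is past the upper threshold. The second call shows why the final clip matters: two positions that each reach full credit would otherwise add up to 200.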