Robzy committed · Commit e5096ab · 1 Parent(s): 46d78e6
Files changed (1):
  1. README.md +24 -24
README.md CHANGED
@@ -9,25 +9,38 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
-
13
  # In-demand Skill Monitoring for Machine Learning Industry
14
 
15
  ## About
16
 
17
- This projects strives to monitor in-demand skills for machine learning roles based in Stockholm, Sweden.
18
 
19
- # Project outline
20
 
21
- ## Model: skills extraction model
22
 
23
- [Model: skills extraction model from HuggingFace](https://huggingface.co/spaces/jjzha/skill_extraction_demo)
 
 
24
 
25
  ## Inference
26
  1. Extract new job ads from Indeed/LinkedIn
27
  2. Extract skills from job ads via skills extraction model
28
 
29
  ## Online training
30
- Extract ground truth via LLM and few-shot learning.
31
 
32
  ## Skill compilation
33
  Save all skills. Make a comprehensive overview by:
@@ -35,20 +48,6 @@ Save all skills. Make a comprehensive overview by:
35
  1. Embed skills to a vector with an embedding model
36
  2. Perform clustering with KMeans
37
  3. Visualize clustering with dimensionality reduction (UMAP)
38
-
39
- Inspiration: [link](https://dylancastillo.co/posts/clustering-documents-with-openai-langchain-hdbscan.html)
40
-
41
-
42
- ## Project requirements:
43
-
44
- You should define your own project by writing at most one page description of the project. The proposed project should be approved by the examiner. The project proposal should cover the following headings:
45
-
46
- ### Problem description: what are the data sources and the prediction problem that you will be building a ML System for?
47
- ### Tools: what tools you are going to use? In the course we mainly used Decision Trees and PyTorch/Tensorflow, but you are free to explore new tools and technologies.
48
- ### Data: what data will you use and how are you going to collect it?
49
- ### Methodology and algorithm: what method(s) or algorithm(s) are you proposing?
50
- ### What to deliver
51
- You should deliver your project as a stand alone serverless ML system. You should submit a URL for your service, a zip file containing your code, and a short report (two to three pages) about what you have done, the dataset, your method, your results, and how to run the code. I encourage you to have the README.md for your project in your Github report as the report for your project.
52
 
53
 
54
  # Job Scraping
@@ -97,8 +96,9 @@ We generate embeddings for technical skills listed in .txt files and visualizes
97
 
98
  # Scheduling
99
 
100
- - scrapping: We run scrapping weekly to fetch job descriptions for machine learning from LinkedIn
101
- - LLM tagging:
102
- - Training:
103
- - Embedding and visualization: On weekly basis, we also use the skills extracted to create their embeddings and visualize them using KMeans clustering
104
9
  pinned: false
10
  ---
11
 
 
12
  # In-demand Skill Monitoring for Machine Learning Industry
13
 
14
  ## About
15
 
16
+ This project aims to monitor in-demand skills for machine learning roles. Skills are extracted with a BERT-based skill extraction model called JobBERT, which is continuously fine-tuned on the job postings. The skills are monitored and visualized by 1. embedding the extracted skill tokens into vector form, 2. performing dimensionality reduction with UMAP, and 3. visualizing the reduced embeddings.
17
 
18
+ ![Header Image](header.png)
19
+
20
+ ### [Monitoring Platform Link](https://huggingface.co/spaces/jjzha/skill_extraction_demo)
21
+
22
+ ## Architecture & Frameworks
23
+
24
+
25
+ - **Hugging Face Spaces**
26
+ - **Gradio**
27
+ - **GitHub Actions**
28
+ - **RapidAPI**
29
+ - **Weights & Biases**
30
+ - **OpenAI API**
32
 
 
33
 
34
+ # High-Level Overview
35
+
36
+ ## Model: skills extraction model
37
 
38
  ## Inference
39
  1. Extract new job ads from Indeed/LinkedIn
40
  2. Extract skills from job ads via skills extraction model
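As a sketch of step 2: a JobBERT-style token classifier emits per-token BIO tags, which must then be merged into whole skill phrases. The tag names (`B-SKILL`/`I-SKILL`) and the helper below are illustrative assumptions, not the project's actual code.

```python
# Merge token-level BIO predictions into skill phrases.
# Tag names are assumptions; adapt to the model's actual label set.

def merge_bio_tags(tokens, tags):
    """Group B-/I- tagged tokens into whole skill phrases."""
    skills, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-SKILL":                 # a new skill span starts
            if current:
                skills.append(" ".join(current))
            current = [token]
        elif tag == "I-SKILL" and current:   # continuation of the open span
            current.append(token)
        else:                                # outside any skill span
            if current:
                skills.append(" ".join(current))
            current = []
    if current:
        skills.append(" ".join(current))
    return skills

tokens = ["Experience", "with", "machine", "learning", "and", "Python"]
tags   = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL"]
print(merge_bio_tags(tokens, tags))  # → ['machine learning', 'Python']
```

In practice the token/tag pairs would come from the skill extraction model's output rather than hand-written lists.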
41
 
42
  ## Online training
43
+ Continual training: ground-truth skills are extracted via an LLM using few-shot prompting with examples.
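The few-shot tagging idea can be sketched as a prompt builder that prepends labeled examples before the new job ad. The example ads and output format below are assumptions for illustration, not the project's actual prompt.

```python
# Build a few-shot prompt for LLM-based skill tagging.
# The examples and format are illustrative, not the project's real prompt.

FEW_SHOT_EXAMPLES = [
    ("We need strong Python and SQL skills.", ["Python", "SQL"]),
    ("Experience with Docker is a plus.", ["Docker"]),
]

def build_prompt(job_ad: str) -> str:
    lines = ["Extract the skills mentioned in each job ad as a comma-separated list.", ""]
    for ad, skills in FEW_SHOT_EXAMPLES:   # labeled demonstrations
        lines.append(f"Ad: {ad}")
        lines.append(f"Skills: {', '.join(skills)}")
        lines.append("")
    lines.append(f"Ad: {job_ad}")          # the new, unlabeled ad
    lines.append("Skills:")                # the LLM completes this line
    return "\n".join(lines)

print(build_prompt("Looking for an engineer with PyTorch experience."))
```

The resulting string would be sent to the OpenAI API; the completion after the final `Skills:` is parsed as the ground-truth tags.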
44
 
45
  ## Skill compilation
46
  Save all skills. Make a comprehensive overview by:
 
48
  1. Embed skills to a vector with an embedding model
49
  2. Perform clustering with KMeans
50
  3. Visualize clustering with dimensionality reduction (UMAP)
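The three steps above can be sketched end-to-end. To keep the example dependency-free, KMeans is implemented directly in NumPy, and plain PCA (via SVD) stands in for UMAP, which the project would get from the umap-learn package; the toy "embeddings" are random blobs, not real skill vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=50):
    """Minimal Lloyd's KMeans: assign to nearest centroid, recompute means."""
    centroids = X[:k].copy()  # deterministic init for the sketch
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def reduce_2d(X):
    """PCA via SVD; stands in for UMAP to keep the sketch dependency-free."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Toy "skill embeddings": two well-separated blobs of 8-dim vectors.
X = np.vstack([rng.normal(0.0, 0.1, (10, 8)), rng.normal(5.0, 0.1, (10, 8))])
labels, _ = kmeans(X, k=2)
coords = reduce_2d(X)  # 2-D coordinates ready for a scatter plot
```

Plotting `coords` colored by `labels` gives the kind of cluster overview the project renders in its Gradio app.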
51
 
52
 
53
  # Job Scraping
 
96
 
97
  # Scheduling
98
 
99
+ To monitor in-demand skills and update the model continuously, scheduling is employed. The following scripts run every Sunday:
 
 
 
100
 
101
+ 1. Job-posting scraping: fetch machine learning job descriptions from LinkedIn
102
+ 2. Skill tagging with LLM: extract the ground-truth skills from the job descriptions via few-shot learning and prompt engineering
103
+ 3. Training: fine-tune the skill extraction model on the newly tagged data
104
+ 4. Embedding and visualization: embed the extracted skills and visualize them with KMeans clustering
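A weekly Sunday schedule like this can be expressed as a GitHub Actions cron workflow. The file name, script names, and time of day below are assumptions for illustration:

```yaml
# .github/workflows/weekly-pipeline.yml (illustrative names)
name: weekly-skill-pipeline
on:
  schedule:
    - cron: "0 6 * * 0"   # every Sunday at 06:00 UTC
  workflow_dispatch:       # also allow manual runs
jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python scrape.py               # 1. job-posting scraping
      - run: python tag_with_llm.py         # 2. LLM skill tagging
      - run: python train.py                # 3. fine-tune the model
      - run: python embed_and_visualize.py  # 4. embeddings + clustering
```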