Aqsa Kausar committed on
Commit 89cf21f · unverified · 1 parent: 6a29073

Update README.md

Files changed (1)
  1. README.md +28 -1
README.md CHANGED
@@ -50,6 +50,33 @@ You should define your own project by writing at most one page description of th
@@ -58,4 +85,4 @@ You should deliver your project as a stand alone serverless ML system. You shoul
 
### What to deliver
You should deliver your project as a standalone serverless ML system. Submit a URL for your service, a zip file containing your code, and a short report (two to three pages) covering what you have done, the dataset, your method, your results, and how to run the code. I encourage you to use the README.md in your project's GitHub repository as your report.

# Skill Embeddings and Visualization

We generate embeddings for the technical skills listed in `.txt` files and visualize their relationships using dimensionality reduction and clustering. Visualizations are produced from both 2D and 3D projections of the embeddings, and KMeans clustering is used to identify groups of similar skills.

## Workflow

### 1. Input Data
- Skills are loaded from `.txt` files in date-named subfolders under the `./tags` directory.
- Each subfolder corresponds to a single date (e.g., `03-01-2024`).

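The input step can be sketched as below. This is a minimal illustration, not the project's script: the function names `load_skills_by_date` and `unique_skills` are hypothetical, and the sketch assumes each `.txt` file lists one skill per line, which the README does not specify. Folder names are kept as plain strings because the README does not say whether `03-01-2024` is month-first or day-first.

```python
from pathlib import Path


def load_skills_by_date(tags_dir: str = "./tags") -> dict[str, list[str]]:
    """Collect skills from every .txt file in each date subfolder of tags_dir.

    Assumption (not stated in the README): one skill per line, blank lines skipped.
    """
    skills_by_date: dict[str, list[str]] = {}
    for date_dir in sorted(Path(tags_dir).iterdir()):
        if not date_dir.is_dir():
            continue  # ignore stray files next to the date folders
        skills: list[str] = []
        for txt_file in sorted(date_dir.glob("*.txt")):
            for line in txt_file.read_text(encoding="utf-8").splitlines():
                skill = line.strip()
                if skill:
                    skills.append(skill)
        skills_by_date[date_dir.name] = skills  # key is the folder name, e.g. "03-01-2024"
    return skills_by_date


def unique_skills(skills_by_date: dict[str, list[str]]) -> list[str]:
    """Deduplicate skills across all dates, keeping first-seen order."""
    return list(dict.fromkeys(s for skills in skills_by_date.values() for s in skills))
```

Deduplicating before embedding means each skill is encoded once, no matter how many dates it appears under.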
### 2. Embedding Generation
- The script uses the `SentenceTransformer` class from the `sentence-transformers` library with the `paraphrase-MiniLM-L3-v2` model to generate a 384-dimensional embedding for each unique skill.

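A minimal sketch of the embedding step, assuming the `sentence-transformers` package is installed. `embed_skills` and `cosine_similarity` are hypothetical helper names; the cosine helper is included only to show how the resulting vectors can be compared, since closeness between vectors is what the later UMAP and KMeans steps rely on.

```python
import math


def embed_skills(skills, model_name: str = "paraphrase-MiniLM-L3-v2"):
    """Encode each skill string into a dense vector with sentence-transformers."""
    # Lazy import: the library is only needed when embeddings are actually computed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # For a list input, encode() returns a NumPy array of shape (n_skills, dim).
    return model.encode(skills, convert_to_numpy=True, show_progress_bar=False)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, -1.0 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For example, `vectors = embed_skills(["Python", "Pandas", "Welding"])` yields one row per skill, and `cosine_similarity(vectors[0], vectors[1])` compares the first two. One would expect related skills to score higher, but the actual values depend on the model.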
### 3. Dimensionality Reduction
- UMAP (Uniform Manifold Approximation and Projection) is used to reduce the embeddings to:
  - **2D**: for simple scatter plots.
  - **3D**: for interactive visualizations.

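A sketch of the reduction step with the `umap-learn` package. `reduce_embeddings` is a hypothetical name, and `metric="cosine"` and the fixed `random_state` are assumptions chosen for illustration (cosine is a common choice for sentence embeddings, and a fixed seed keeps plots reproducible); the README does not state which settings the script uses.

```python
def reduce_embeddings(embeddings, n_components: int, random_state: int = 42):
    """Project high-dimensional skill embeddings to 2D or 3D with UMAP."""
    if n_components not in (2, 3):
        # The workflow only uses 2D (scatter plots) and 3D (interactive plots).
        raise ValueError("n_components must be 2 or 3")
    # Lazy import so this helper can be defined without umap-learn installed.
    import umap

    reducer = umap.UMAP(
        n_components=n_components,
        metric="cosine",            # assumption: cosine distance between embeddings
        random_state=random_state,  # assumption: fixed seed for reproducible layouts
    )
    # Returns a NumPy array of shape (n_skills, n_components).
    return reducer.fit_transform(embeddings)
```

Calling it once with `n_components=2` and once with `n_components=3` gives the inputs for the scatter plots and the interactive plots. The layout also depends on UMAP's `n_neighbors` (default 15), which may need tuning for small skill lists.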
### 4. Clustering
- KMeans clustering is applied to the 3D UMAP projections to group similar skills into clusters.
- The number of clusters is a parameter set in the script.

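A sketch of the clustering step with scikit-learn's `KMeans`. `cluster_skills` and `group_by_cluster` are hypothetical names, and `n_clusters=8` is only a placeholder for the value the script lets you specify. `group_by_cluster` turns the label array back into lists of skills, which helps when checking whether a cluster makes sense.

```python
from collections import defaultdict


def cluster_skills(coords_3d, n_clusters: int = 8, random_state: int = 42):
    """Assign each 3D point to one of n_clusters groups with KMeans."""
    # Lazy import so this module can be loaded without scikit-learn installed.
    from sklearn.cluster import KMeans

    # n_clusters=8 is a placeholder; the README says the count is set in the script.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    # fit_predict returns one integer cluster label per input row.
    return kmeans.fit_predict(coords_3d)


def group_by_cluster(skills, labels) -> dict[int, list[str]]:
    """Collect skills into {cluster_id: [skill, ...]}, ordered by cluster id."""
    groups = defaultdict(list)
    for skill, label in zip(skills, labels):
        groups[int(label)].append(skill)  # int() also accepts NumPy integer labels
    return dict(sorted(groups.items()))
```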
### 5. Visualization and Outputs
- **2D Projection**: saved as PNG images in the `./plots` folder.
- **3D Projection**: saved as interactive HTML files in the `./plots` folder.
- **3D Clustering Visualization**: saved as HTML files, with each cluster shown in a different color.

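The outputs can be sketched with matplotlib for the PNG files and Plotly Express for the HTML files. These libraries are an assumption (the README names only the file formats), as are the default file names and the helper names `output_path`, `save_2d_png`, and `save_3d_html`. Passing cluster labels to `save_3d_html` gives the colored clustering view; omitting them gives the plain 3D projection.

```python
from pathlib import Path


def output_path(name: str, suffix: str, plots_dir: str = "./plots") -> Path:
    """Build a file path inside plots_dir, creating the folder if needed."""
    if suffix not in (".png", ".html"):
        raise ValueError("2D plots are saved as .png, 3D plots as .html")
    folder = Path(plots_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{name}{suffix}"


def save_2d_png(coords_2d, skills, name="skills_2d", plots_dir="./plots"):
    """Static scatter plot of the 2D projection, one labeled point per skill."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter(coords_2d[:, 0], coords_2d[:, 1], s=10)
    for (x, y), skill in zip(coords_2d, skills):
        ax.annotate(skill, (x, y), fontsize=6)
    path = output_path(name, ".png", plots_dir)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path


def save_3d_html(coords_3d, skills, labels=None, name="skills_3d", plots_dir="./plots"):
    """Interactive 3D scatter; if cluster labels are given, color points by cluster."""
    import plotly.express as px

    fig = px.scatter_3d(
        x=coords_3d[:, 0], y=coords_3d[:, 1], z=coords_3d[:, 2],
        # String labels make Plotly use a discrete color per cluster.
        color=None if labels is None else [str(label) for label in labels],
        hover_name=skills,
    )
    path = output_path(name, ".html", plots_dir)
    fig.write_html(str(path))
    return path
```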

2. Tagging of JP
   - tag date
3. Training
4. Visualisation