Aqsa Kausar committed
Update README.md
README.md CHANGED
@@ -50,6 +50,33 @@ You should define your own project by writing at most one page description of th
@@ -58,4 +85,4 @@ You should deliver your project as a stand alone serverless ML system. You shoul
### What to deliver
You should deliver your project as a standalone serverless ML system. You should submit a URL for your service, a zip file containing your code, and a short report (two to three pages) describing what you have done, the dataset, your method, your results, and how to run the code. I encourage you to use the README.md in your project's GitHub repository as the report for your project.

# Skill Embeddings and Visualization

We generate embeddings for technical skills listed in `.txt` files and visualize their relationships using dimensionality reduction and clustering. Visualizations are produced for both the 2D and 3D projections, and KMeans clustering identifies groups of similar skills.

## Workflow

### 1. Input Data
- Skills are loaded from `.txt` files located in date-based subfolders under the `./tags` directory.
- Each subfolder corresponds to a specific date (e.g., `03-01-2024`).
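
A minimal sketch of this loading step, using only the standard library. The helper name `load_skills` and the one-skill-per-line file layout are assumptions for illustration; the actual script may parse the files differently.

```python
from pathlib import Path


def load_skills(tags_dir="./tags"):
    """Return {date_folder_name: sorted unique skills} for each subfolder."""
    skills_by_date = {}
    for date_dir in sorted(Path(tags_dir).iterdir()):
        if not date_dir.is_dir():
            continue  # ignore stray files next to the date folders
        skills = set()
        for txt_file in date_dir.glob("*.txt"):
            # Assumption: one skill per line; blank lines are skipped.
            for line in txt_file.read_text(encoding="utf-8").splitlines():
                skill = line.strip()
                if skill:
                    skills.add(skill)
        skills_by_date[date_dir.name] = sorted(skills)
    return skills_by_date
```

Calling `load_skills("./tags")` would then give one deduplicated skill list per date folder, for example under the key `03-01-2024`.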

### 2. Embedding Generation
- The script uses the `SentenceTransformer` model (`paraphrase-MiniLM-L3-v2`) to generate high-dimensional embeddings for the unique skills.
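
As a rough sketch, the embedding step might look like the following. The function name `embed_skills` and the optional `model` argument are hypothetical; only the `SentenceTransformer` class, its `encode` method, and the model name come from the library and the description above.

```python
def embed_skills(skills, model=None):
    """Encode unique skill strings into dense vectors, one row per skill."""
    if model is None:
        # Imported lazily so an already-loaded model can be passed in instead.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("paraphrase-MiniLM-L3-v2")
    unique_skills = sorted(set(skills))  # deduplicate before encoding
    embeddings = model.encode(unique_skills)
    return unique_skills, embeddings
```

Passing a preloaded model avoids reloading the weights for every date folder.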

### 3. Dimensionality Reduction
- UMAP (Uniform Manifold Approximation and Projection) is used to reduce the embeddings to:
  - **2D**: For creating simple scatter plots.
  - **3D**: For interactive visualizations.
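
The reduction could be sketched as below, assuming the `umap-learn` package. The helper name `project` and the fixed `random_state` are illustrative choices, not settings taken from the script.

```python
def project(embeddings, n_components, random_state=42):
    """Reduce high-dimensional embeddings to n_components dimensions with UMAP."""
    import umap  # provided by the umap-learn package

    # A fixed random_state is an assumption here, used to make runs repeatable.
    reducer = umap.UMAP(n_components=n_components, random_state=random_state)
    return reducer.fit_transform(embeddings)
```

With this helper, `project(embeddings, 2)` would feed the scatter plots and `project(embeddings, 3)` the interactive views.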

### 4. Clustering
- KMeans clustering is applied to the 3D embeddings to group similar skills into clusters.
- The number of clusters can be specified in the script.
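
A hedged sketch of the clustering step with scikit-learn follows. The function name `cluster_skills` and the `n_init` and `random_state` values are assumptions; the source only states that KMeans runs on the 3D embeddings with a configurable number of clusters.

```python
def cluster_skills(coords_3d, n_clusters, random_state=42):
    """Assign each 3D point to one of n_clusters KMeans clusters."""
    from sklearn.cluster import KMeans

    # n_init and random_state are illustrative; tune them for your data.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(coords_3d)  # one integer label per input row
```

The returned labels line up with the rows of `coords_3d`, so they can be zipped with the skill names to list the members of each cluster.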

### 5. Visualization and Outputs
- **2D Projection**: Saved as PNG images in the `./plots` folder.
- **3D Projection**: Saved as interactive HTML files in the `./plots` folder.
- **3D Clustering Visualization**: Saved as HTML files, showing clusters with different colors.
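
One way these outputs could be written is sketched below, assuming matplotlib for the PNG and Plotly Express for the HTML files. The source does not name the plotting libraries, so both choices, the helper names, and the figure settings are assumptions.

```python
from pathlib import Path


def save_2d_png(coords_2d, skills, out_path):
    """Scatter the 2D projection, label each point, and save it as a PNG."""
    import matplotlib

    matplotlib.use("Agg")  # render off-screen; no display needed
    import matplotlib.pyplot as plt

    xs = [p[0] for p in coords_2d]
    ys = [p[1] for p in coords_2d]
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter(xs, ys, s=12)
    for x, y, skill in zip(xs, ys, skills):
        ax.annotate(skill, (x, y), fontsize=7)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


def save_3d_html(coords_3d, skills, out_path, cluster_labels=None):
    """Save an interactive 3D scatter; color by cluster when labels are given."""
    import plotly.express as px

    # Labels are cast to str so Plotly treats clusters as discrete colors.
    color = None if cluster_labels is None else [str(c) for c in cluster_labels]
    fig = px.scatter_3d(
        x=[p[0] for p in coords_3d],
        y=[p[1] for p in coords_3d],
        z=[p[2] for p in coords_3d],
        color=color,
        hover_name=list(skills),
    )
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    fig.write_html(str(out_path))
```

Calling `save_3d_html` without `cluster_labels` gives the plain 3D projection, and passing the KMeans labels gives the clustering view.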

2. Tagging of JP
- tag date
3. Training
4. Visualisation