# Clustering Algorithms for Customer Segmentation

This repository implements and compares four clustering algorithms for customer segmentation on a synthetic dataset: K-Means, Hierarchical Clustering, DBSCAN, and Gaussian Mixture Models (GMM). Each algorithm is used to identify distinct customer groups based on age and income.

## Project Structure

- `implementation.ipynb`: The main Jupyter notebook containing the entire analysis, from data generation to model evaluation and visualization.
- `data/`: Contains the synthetic `customer_data.csv` file.
- `models/`: Stores the trained clustering models and the data scaler.
- `results/`: Includes the algorithm comparison, detailed analysis, and experiment summary.
- `visualizations/`: Contains the output plots, such as the elbow method analysis and cluster comparisons.
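
As a rough illustration of how a fitted scaler and model could be stored in and reloaded from a directory like `models/`, here is a minimal sketch using `joblib`. The file names (`scaler.joblib`, `kmeans.joblib`), the toy data, and the use of `joblib` itself are assumptions for illustration; the notebook may persist its models differently. The sketch writes to a temporary directory so it does not touch the repository's own files.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Temporary stand-in for the repository's models/ directory (assumption).
models_dir = Path(tempfile.mkdtemp()) / "models"
models_dir.mkdir(parents=True, exist_ok=True)

# Toy (age, income) rows; hypothetical values, not the repository's dataset.
X = np.array(
    [[25, 30_000], [27, 32_000], [45, 80_000], [47, 82_000]], dtype=float
)

# Fit the scaler first, then fit the model on the scaled features.
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(scaler.transform(X))

# Persist both objects; the file names are hypothetical.
joblib.dump(scaler, models_dir / "scaler.joblib")
joblib.dump(kmeans, models_dir / "kmeans.joblib")

# Reload and assign a new customer to a segment.
loaded_scaler = joblib.load(models_dir / "scaler.joblib")
loaded_kmeans = joblib.load(models_dir / "kmeans.joblib")
new_customer = np.array([[26, 31_000]], dtype=float)
segment = loaded_kmeans.predict(loaded_scaler.transform(new_customer))
print(f"Assigned segment: {segment[0]}")
```

Saving the scaler alongside the model matters because new data has to be transformed with the same mean and variance that the model saw during fitting.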

## Features

- **Data Generation**: Generates a synthetic customer dataset with clearly separated clusters, which makes it easier to train and evaluate each model.
- **Multiple Algorithms**: Implements and compares four popular clustering algorithms:
    - K-Means
    - Hierarchical Clustering
    - DBSCAN
    - Gaussian Mixture Models (GMM)
- **Model Evaluation**: Uses the elbow method and silhouette scores to determine the optimal number of clusters and evaluate performance.
- **Comprehensive Visualization**: Generates plots that show the clusters, compare algorithm performance, and illustrate the choice of the optimal `k`.
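
The features above can be sketched end to end with scikit-learn. This is a minimal sketch, not the notebook's actual code: the segment centers, the spread, and hyperparameters such as `eps=0.3` and `n_clusters=3` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical segment centers as (age, income); not taken from the notebook.
centers = np.array([[25, 30_000], [40, 70_000], [60, 110_000]], dtype=float)
spread = np.array([3.0, 5_000.0])  # per-feature standard deviation (assumed)
X = np.vstack([c + rng.normal(size=(100, 2)) * spread for c in centers])

# Age and income live on very different scales, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: K-Means inertia for a range of k values.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled).inertia_
    for k in range(1, 7)
}

# The four algorithms, with illustrative (assumed) hyperparameters.
models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "GMM": GaussianMixture(n_components=3, random_state=42),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X_scaled)
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    if len(set(labels[mask])) >= 2:
        scores[name] = silhouette_score(X_scaled[mask], labels[mask])
    else:
        # The silhouette score is undefined for fewer than two clusters.
        scores[name] = float("nan")

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
for name, score in scores.items():
    print(f"{name}: silhouette={score:.3f}")
```

Plotting `inertias` against `k` gives the elbow curve, and the point where the drop flattens suggests a reasonable number of clusters. DBSCAN noise points are excluded before scoring, so its silhouette score is not strictly comparable with the other three.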

## How to Use

1.  **Clone the repository:**
    ```bash
    git clone https://github.com/GruheshKurra/ClusteringAlgorithms.git
    ```
2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```
3.  **Run the notebook:**
    Open and run the `implementation.ipynb` notebook in a Jupyter environment to see the full analysis.

## License

This project is licensed under the MIT License.