# Clustering Algorithms for Customer Segmentation

This repository implements four clustering algorithms for customer segmentation on a synthetic dataset: K-Means, Hierarchical Clustering, DBSCAN, and Gaussian Mixture Models (GMM). Each algorithm is used to identify distinct customer groups based on age and income.

## Project Structure

- `implementation.ipynb`: The main Jupyter notebook containing the entire analysis, from data generation to model evaluation and visualization.
- `data/`: Contains the synthetic `customer_data.csv` file.
- `models/`: Stores the trained clustering models and the data scaler.
- `results/`: Includes the algorithm comparison, detailed analysis, and experiment summary.
- `visualizations/`: Contains the output plots, such as the elbow method analysis and cluster comparisons.
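
To take a quick look at the dataset listed above without opening the notebook, something like the following may help. It is a minimal sketch using only the standard library; the helper name `summarize_csv` is hypothetical, and the code assumes that `customer_data.csv` has a header row and that the script is run from the repository root. No column names are assumed.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Iterating consumes the data rows; the header is read on first access.
        row_count = sum(1 for _ in reader)
        return reader.fieldnames, row_count


# Path taken from the project layout above (assumes the repository root as cwd).
csv_path = Path("data") / "customer_data.csv"
if csv_path.exists():
    columns, n_rows = summarize_csv(csv_path)
    print(f"{n_rows} rows, columns: {columns}")
else:
    print(f"{csv_path} not found; run this from the repository root")
```
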

## Features

- **Data Generation**: Generates a synthetic customer dataset with a clear cluster structure for model training and evaluation.
- **Multiple Algorithms**: Implements and compares four clustering algorithms:
  - K-Means
  - Hierarchical Clustering
  - DBSCAN
  - Gaussian Mixture Models (GMM)
- **Model Evaluation**: Uses the elbow method and silhouette scores to choose the number of clusters and evaluate clustering quality.
- **Visualization**: Generates plots that show the clusters, compare algorithm performance, and support the choice of `k`.
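
The data generation and evaluation steps above can be sketched without any third-party dependencies. The following is a minimal illustration, not the notebook's code: the group centers, spreads, seeds, and all function names (`make_customers`, `standardize`, `kmeans`, `silhouette`) are assumptions chosen for the example. It generates age/income points, standardizes them, runs a plain K-Means for several values of `k`, and prints the inertia (the quantity plotted in an elbow analysis) alongside the mean silhouette score.

```python
import math
import random


def make_customers(n_per_group=50, seed=0):
    """Generate synthetic (age, income) points around three hypothetical centers."""
    rng = random.Random(seed)
    # Hypothetical (mean age, mean income) pairs; not taken from the repository.
    centers = [(25, 30_000), (40, 70_000), (60, 45_000)]
    return [
        (rng.gauss(age_mu, 3), rng.gauss(income_mu, 5_000))
        for age_mu, income_mu in centers
        for _ in range(n_per_group)
    ]


def standardize(points):
    """Scale each feature to zero mean and unit (population) variance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initialization; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; O(n^2), fine for a few hundred points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


X = standardize(make_customers())
for k in range(2, 7):
    labels, inertia = kmeans(X, k)
    print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette(X, labels):.3f}")
```

On an elbow plot, the `k` at which inertia stops dropping sharply is the usual candidate, and a higher silhouette score indicates tighter, better-separated clusters. Standardizing first matters here because income values are orders of magnitude larger than ages and would otherwise dominate the Euclidean distance.
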

## How to Use

1. **Clone the repository:**

   ```bash
   git clone https://github.com/GruheshKurra/ClusteringAlgorithms.git
   ```

2. **Install dependencies:**

   ```bash
   # Run from inside the cloned directory
   cd ClusteringAlgorithms
   pip install -r requirements.txt
   ```

3. **Run the notebook:**

   Open and run the `implementation.ipynb` notebook in a Jupyter environment to see the full analysis.

## License

This project is licensed under the MIT License.