# Tennis Match Win Streak Analysis

This project analyzes historical tennis winning streaks to identify which one followed the most challenging path, using machine learning and deep learning techniques. It combines feature engineering, data preprocessing, neural networks, and ensemble models to estimate the level of competition faced during each streak.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
- [Model Architecture](#model-architecture)
- [Hyperparameter Tuning](#hyperparameter-tuning)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

## Features

- Advanced feature engineering, including rank and Elo rating differences.
- Data preprocessing with error handling and missing value management.
- PyTorch Lightning framework for building and training neural networks.
- Hyperparameter optimization using Optuna.
- Ensemble methods for improved accuracy.
- Winning streak analysis using clustering techniques.

## Installation

To get started, you need Python 3.x installed. Then install the required packages with pip:

    pip install -r requirements.txt

## Usage
Place your match data CSV files (named `PlayerMatches2.csv` through `PlayerMatches15.csv`) in the project directory.
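
Before running the script, it can help to confirm that every expected file is in place. The following is only a minimal sketch: the helper names `expected_match_files` and `missing_match_files` are hypothetical and not part of the project, and it assumes the numbering from 2 to 15 is inclusive with no gaps.

```python
from pathlib import Path


def expected_match_files(first=2, last=15):
    """Return PlayerMatches<first>.csv .. PlayerMatches<last>.csv (inclusive)."""
    return [f"PlayerMatches{i}.csv" for i in range(first, last + 1)]


def missing_match_files(directory="."):
    """List the expected CSV files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in expected_match_files() if not (base / name).is_file()]
```

Calling `missing_match_files()` from the project directory returns an empty list when all files are present.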

Run the script to load the data, preprocess it, and train the models:

    python main.py

The script will save the best model and configuration for later analysis.

## Data
The project uses historical player match data in CSV format. Each file should contain the following columns:

- `date`
- `tournament`
- `winner_name`
- `winner_rank`
- `winner_eloRating`
- `loser_name`
- `loser_rank`
- `loser_eloRating`

Optional columns for enhanced feature engineering can also be included.
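
As an illustration of how the required columns can feed the rank and Elo difference features, here is a minimal sketch using only the standard library. The function name `load_matches`, the derived keys `rank_diff` and `elo_diff`, and the winner-minus-loser sign convention are assumptions made for this example; the project's own preprocessing may differ.

```python
import csv

REQUIRED_COLUMNS = [
    "date", "tournament",
    "winner_name", "winner_rank", "winner_eloRating",
    "loser_name", "loser_rank", "loser_eloRating",
]


def _to_float(value):
    """Parse a numeric field; return None for blank or malformed values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _diff(a, b):
    """Return a - b, or None when either side is missing."""
    return None if a is None or b is None else a - b


def load_matches(file_obj):
    """Read match rows and attach rank_diff and elo_diff (winner minus loser)."""
    reader = csv.DictReader(file_obj)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        row["rank_diff"] = _diff(_to_float(row["winner_rank"]),
                                 _to_float(row["loser_rank"]))
        row["elo_diff"] = _diff(_to_float(row["winner_eloRating"]),
                                _to_float(row["loser_eloRating"]))
        rows.append(row)
    return rows
```

A file can be passed in with `with open("PlayerMatches2.csv", newline="") as f: rows = load_matches(f)`. Rows with a missing rank or rating keep `None` for the affected difference, so a later step can decide whether to drop or impute them.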

## Model Architecture
The project utilizes a custom neural network with:

- Categorical embeddings for player names and other categorical features.
- Fully connected layers to process both embedded and numerical input features.
- Dropout layers for regularization.
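
The three components above can be sketched in plain PyTorch as follows. This is not the project's actual network: the class name `MatchNet`, the layer widths, the ReLU activations, and the single regression output are assumptions chosen for illustration, and the Lightning training wrapper is omitted.

```python
import torch
from torch import nn


class MatchNet(nn.Module):
    """Embeds categorical inputs, concatenates numeric ones, applies an MLP."""

    def __init__(self, cardinalities, num_numeric, embed_dim=8,
                 hidden=(64, 32), dropout=0.2):
        super().__init__()
        # One embedding table per categorical feature (e.g. player name).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, embed_dim) for n in cardinalities]
        )
        layers = []
        in_features = embed_dim * len(cardinalities) + num_numeric
        for width in hidden:
            layers += [nn.Linear(in_features, width), nn.ReLU(), nn.Dropout(dropout)]
            in_features = width
        layers.append(nn.Linear(in_features, 1))  # single regression output
        self.mlp = nn.Sequential(*layers)

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_categorical) integer indices
        # x_num: (batch, num_numeric) floats
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [x_num], dim=1)
        return self.mlp(x).squeeze(-1)
```

For example, `MatchNet(cardinalities=[50, 10], num_numeric=2)` expects two categorical index columns and two numeric columns per match and returns one value per row.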

## Hyperparameter Tuning

Hyperparameter optimization is performed using Optuna, allowing for fine-tuning of:

- Embedding dimensions
- Hidden layer sizes
- Learning rates
- Dropout rates
- Batch sizes

## Results

The model's performance is evaluated using mean squared error (MSE) on the validation set. Ensemble models are also trained and compared for additional insights.
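
For reference, MSE is the average of the squared differences between targets and predictions. The sketch below computes it in plain Python and also shows one simple way to combine several models, by averaging their predictions. The averaging step is an assumption for illustration; this README does not specify how the ensemble combines its members.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_predictions(per_model_preds):
    """Element-wise mean across models; each inner list is one model's output."""
    return [sum(vals) / len(vals) for vals in zip(*per_model_preds)]
```

Comparing `mean_squared_error(y_val, preds)` for each individual model against the same metric for `average_predictions([...])` gives a like-for-like view of whether combining models helps on the validation set.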

## Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to submit a pull request or open an issue.

## License

This project is licensed under the MIT License. See the LICENSE file for more information.