# Tennis Match Win Streak Analysis

This project analyzes historical tennis winning streaks to determine which one followed the most challenging path, meaning which streak was built against the strongest competition. It uses machine learning and deep learning techniques. The model combines advanced feature engineering, data preprocessing, neural networks, and ensemble models to estimate the level of competition faced along each streak.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
- [Model Architecture](#model-architecture)
- [Hyperparameter Tuning](#hyperparameter-tuning)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

## Features

- Advanced feature engineering, including rank and Elo rating differences between players.
- Data preprocessing with error handling and missing-value management.
- PyTorch Lightning framework for building and training neural networks.
- Hyperparameter optimization with Optuna.
- Ensemble methods to improve accuracy.
- Winning-streak analysis using clustering techniques.

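A minimal sketch of the rank and Elo difference features, using only the standard library. The sign convention (winner minus loser), the `rank_diff` and `elo_diff` names, and returning `None` when either value is missing are assumptions for illustration, not necessarily what `main.py` does.

```python
from typing import Optional


def _to_float(value) -> Optional[float]:
    """Parse a CSV cell as a float; return None for blank or unparseable values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def add_difference_features(match: dict) -> dict:
    """Return a copy of a match row with hypothetical rank_diff and elo_diff columns.

    Assumed sign convention: both differences are winner minus loser, so a
    negative rank_diff means the winner held the better (lower) ranking and a
    positive elo_diff means the winner had the higher Elo rating.
    """
    row = dict(match)  # leave the caller's row untouched
    w_rank = _to_float(match.get("winner_rank"))
    l_rank = _to_float(match.get("loser_rank"))
    w_elo = _to_float(match.get("winner_eloRating"))
    l_elo = _to_float(match.get("loser_eloRating"))
    # Propagate missing values as None instead of guessing a fill value.
    row["rank_diff"] = w_rank - l_rank if w_rank is not None and l_rank is not None else None
    row["elo_diff"] = w_elo - l_elo if w_elo is not None and l_elo is not None else None
    return row
```

Returning `None` keeps missing inputs visible; a later preprocessing step can then decide whether to drop or impute them.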
## Installation

To get started with this project, you need Python 3.x installed. Then install the required packages with pip:

```bash
pip install -r requirements.txt
```

## Usage

Place your match data CSV files (named `PlayerMatches2.csv` to `PlayerMatches15.csv`) in the project directory.

Run the script to load the data, preprocess it, and train the models:

```bash
python main.py
```

The script saves the best model and its configuration for later analysis.

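The loading step can be sketched roughly as follows. This assumes all files sit in one directory and use the column names listed under Data. `load_matches` is a hypothetical helper, and its policy (skip missing files with a warning, reject files lacking required columns, drop rows with blank required values) is one possible way to handle errors and missing values, not necessarily the project's.

```python
import csv
import logging
from pathlib import Path

# Required columns as listed in the Data section.
REQUIRED_COLUMNS = [
    "date", "tournament",
    "winner_name", "winner_rank", "winner_eloRating",
    "loser_name", "loser_rank", "loser_eloRating",
]


def load_matches(data_dir=".", first=2, last=15):
    """Load PlayerMatches{first}.csv through PlayerMatches{last}.csv into one list of dict rows."""
    rows = []
    for i in range(first, last + 1):
        path = Path(data_dir) / f"PlayerMatches{i}.csv"
        if not path.exists():
            logging.warning("Skipping missing file: %s", path)
            continue
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Fail loudly if a file lacks any required column.
            missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"{path} is missing columns: {sorted(missing)}")
            for row in reader:
                # Drop rows with a blank (or absent) value in any required column.
                if all((row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
                    rows.append(row)
    return rows
```

Raising on missing columns while only warning on missing files reflects the idea that a malformed file is more likely a real error than a gap in the numbering.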
## Data

The project uses historical player match data in CSV format. Each file should contain the following columns:

- `date`
- `tournament`
- `winner_name`
- `winner_rank`
- `winner_eloRating`
- `loser_name`
- `loser_rank`
- `loser_eloRating`

Optional columns can also be included for enhanced feature engineering.

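Given these columns, winning streaks can be extracted by walking through the matches in date order. The sketch below is hypothetical. It scores each streak by the mean `loser_eloRating` of the opponents beaten, which is only one possible measure of how challenging a path was, and it omits the clustering step mentioned under Features. It assumes dates are ISO-formatted (`YYYY-MM-DD`) so that sorting the strings is chronological.

```python
from collections import defaultdict
from statistics import mean


def find_streaks(matches, min_length=2):
    """Return winning streaks as (player, length, mean_opponent_elo, start_date, end_date).

    Results are sorted by mean opponent Elo, highest first (a hypothetical
    difficulty score). Streaks shorter than min_length are ignored.
    """
    current = defaultdict(list)  # player -> [(date, beaten_opponent_elo), ...]
    streaks = []

    def close(player):
        # pop() does not trigger the defaultdict factory, so unknown players yield [].
        run = current.pop(player, [])
        if len(run) >= min_length:
            elos = [elo for _, elo in run]
            streaks.append((player, len(run), mean(elos), run[0][0], run[-1][0]))

    for m in sorted(matches, key=lambda r: r["date"]):
        current[m["winner_name"]].append((m["date"], float(m["loser_eloRating"])))
        close(m["loser_name"])  # a loss ends the loser's current streak
    for player in list(current):
        close(player)  # streaks still running at the end of the data
    return sorted(streaks, key=lambda s: s[2], reverse=True)
```

Python's sort is stable, so matches played on the same date keep their file order. If the data includes a round or match-order column, using it as a secondary sort key would make streak boundaries more reliable.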
## Model Architecture

The project uses a custom neural network with:

- Categorical embeddings for player names and other categorical features.
- Fully connected layers that process both the embedded and the numerical input features.
- Dropout layers for regularization.

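A plain PyTorch sketch of this architecture follows. The `MatchNet` name, the two hidden layers, the ReLU activations, and the default sizes are assumptions. Because the project trains with PyTorch Lightning, a module like this would typically be wrapped in a `LightningModule` in practice. Categorical inputs are expected as integer indices (for example, player names mapped to IDs beforehand).

```python
import torch
from torch import nn


class MatchNet(nn.Module):
    """Hypothetical network: embeddings for categorical inputs, then an MLP with dropout."""

    def __init__(self, cardinalities, num_numeric, embedding_dim=16, hidden_size=128, dropout=0.2):
        super().__init__()
        # One embedding table per categorical column (e.g. player name, tournament).
        self.embeddings = nn.ModuleList([nn.Embedding(n, embedding_dim) for n in cardinalities])
        in_features = embedding_dim * len(cardinalities) + num_numeric
        self.mlp = nn.Sequential(
            nn.Linear(in_features, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size // 2, 1),  # single regression output, trained with MSE
        )

    def forward(self, categorical, numeric):
        # categorical: LongTensor (batch, n_categorical); numeric: FloatTensor (batch, num_numeric)
        embedded = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [numeric], dim=1)
        return self.mlp(x).squeeze(-1)  # shape: (batch,)
```

Concatenating embeddings with the numeric features before the first linear layer lets the fully connected layers learn interactions between player identity and features such as Elo differences.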
## Hyperparameter Tuning

Hyperparameter optimization is performed with Optuna, which tunes:

- Embedding dimensions
- Hidden layer sizes
- Learning rates
- Dropout rates
- Batch sizes

## Results

Model performance is evaluated using mean squared error (MSE) on the validation set. Ensemble models are also trained, and their validation MSE is compared with that of the neural network.

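A minimal sketch of this evaluation computes validation MSE for each model and for a simple average of their predictions. The README does not say how the ensemble combines models, so unweighted averaging is an assumption here, and the model names used with it are hypothetical. `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean


def mse(predictions, targets):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    return fmean((p - t) ** 2 for p, t in zip(predictions, targets))


def average_ensemble(*prediction_lists):
    """Element-wise mean of several models' validation predictions."""
    return [fmean(preds) for preds in zip(*prediction_lists)]


def compare(models_predictions, targets):
    """Return validation MSE for each model plus the simple-average ensemble."""
    scores = {name: mse(preds, targets) for name, preds in models_predictions.items()}
    scores["ensemble_mean"] = mse(average_ensemble(*models_predictions.values()), targets)
    return scores
```

Averaging tends to help when the models' errors point in different directions; if one model dominates, a weighted average or stacking may compare better.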
## Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the LICENSE file for more information.