---
license: mit
language:
- en
tags:
- speaker
- speaker_Recognition
- gender
- voicebased
- ai
- ml
---
# AuthEcho_Project
This project contains pre-trained deep learning models that predict a **Speaker** and their **Gender** from audio.
The repository offers a **Speaker and Gender Prediction System** built with **TensorFlow**, **Librosa**, and **Gradio**. Given an audio file, the application predicts the top 3 speakers with their probabilities, determines the speaker's gender, and labels the speaker as unknown when the highest confidence falls below a threshold.
## Features
- Predicts the top 3 speakers from an audio file.
- Determines the gender of the speaker.
- Identifies unknown speakers with a confidence threshold.
- Provides a Gradio interface for easy testing.
## Getting Started
### Prerequisites
To run this application, you need:
- **Python**: Version 3.8 or higher
- Required Python libraries:
  - `tensorflow`
  - `numpy`
  - `librosa`
  - `gradio`
  - `scikit-learn`
Install the required libraries with:
```
pip install tensorflow numpy librosa gradio scikit-learn
```
### Installation
1. **Clone the Repository**:
```
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
```
2. **Add Pre-Trained Models and Label Encoders**:
Place the following files in the `models/` directory shown under Project Structure (or in the repository root, if your copy of `app.py` loads them from there):
- `lstm_speaker_model.h5`: Pre-trained speaker recognition model.
- `lstm_gender_model.h5`: Pre-trained gender prediction model.
- `lstm_speaker_label.pkl`: Label encoder for speaker classes.
- `lstm_gender_label.pkl`: Label encoder for gender classes.
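Before launching the app, it can help to confirm that all four files are in place. The following is a minimal sketch, not part of the repository: `missing_model_files` is a hypothetical helper, and the directory is left as a parameter so you can pass whichever location your copy of `app.py` loads from.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "lstm_speaker_model.h5",
    "lstm_gender_model.h5",
    "lstm_speaker_label.pkl",
    "lstm_gender_label.pkl",
]


def missing_model_files(directory="."):
    """Return the required file names that are not present in `directory`.

    Hypothetical helper for illustration; the directory is an assumption
    and should match wherever app.py expects the files.
    """
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_model_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files found.")
```

If the list is empty, every model and label encoder was found in the given directory.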
### Usage
Run the application using:
```
python app.py
```
### Gradio Interface
The Gradio interface allows you to:
- **Upload** an audio file or **record** audio directly.
- Predict the **top 3 speakers** and their probabilities.
- Determine the **gender** of the speaker.
- Detect and classify **unknown speakers** using confidence thresholds.
## Project Structure
```
.
├── app.py                          # Main application file
├── models/lstm_speaker_model.h5    # Pre-trained speaker model (to be added)
├── models/lstm_gender_model.h5     # Pre-trained gender model (to be added)
├── models/lstm_speaker_label.pkl   # Speaker label encoder (to be added)
├── models/lstm_gender_label.pkl    # Gender label encoder (to be added)
├── requirements.txt                # Python dependencies
└── README.md                       # Project documentation
```
## Example Output
### Top 3 Predicted Speakers:
```
The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%
The predicted gender is: Male
```
### Unknown Speaker:
```
The top 3 predicted speakers are:
Unknown: 45.23%
The predicted gender is: Unknown
```
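Output of this shape could be produced from a vector of class probabilities roughly as follows. This is a sketch under stated assumptions rather than the code in `app.py`: `format_prediction` is a hypothetical function, and the threshold of `0.5` is an illustrative value, since the actual threshold is not stated here.

```python
import numpy as np


def format_prediction(probs, speaker_names, gender, threshold=0.5, top_k=3):
    """Build the report text from speaker probabilities.

    Hypothetical helper: `threshold=0.5` is an assumed value, not the
    one used by the application.
    """
    probs = np.asarray(probs, dtype=float)
    # Indices of the top_k highest probabilities, best first.
    order = np.argsort(probs)[::-1][:top_k]
    lines = [f"The top {top_k} predicted speakers are:"]
    best = probs[order[0]]
    if best < threshold:
        # Top confidence is too low: report the speaker as unknown.
        lines.append(f"Unknown: {best * 100:.2f}%")
        lines.append("The predicted gender is: Unknown")
    else:
        for i in order:
            lines.append(f"{speaker_names[i]}: {probs[i] * 100:.2f}%")
        lines.append(f"The predicted gender is: {gender}")
    return "\n".join(lines)


names = ["Speaker 1", "Speaker 2", "Speaker 3"]
known = format_prediction([0.8523, 0.1012, 0.0465], names, "Male")
unknown = format_prediction([0.4523, 0.3000, 0.2477], names, "Male")
print(known)
print(unknown)
```

With the assumed threshold, the first call lists all three speakers and the second collapses to a single `Unknown` line, matching the two examples above.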
## How It Works
1. **Feature Extraction**:
- Extracts **MFCCs**, **chroma features**, and **spectral contrast** from the input audio file using `librosa`.
2. **Speaker and Gender Models**:
- **Speaker Model**: A pre-trained LSTM model classifies the speaker based on extracted features.
- **Gender Model**: A separate LSTM model determines the gender.
3. **Unknown Detection**:
- If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."
## Roadmap
- Add support for real-time audio predictions.
- Improve unknown speaker detection using open-set recognition techniques.
- Expand the dataset for more robust gender classification.
## Contributing
Contributions are welcome! To contribute:
1. Fork the repository.
2. Create a feature branch (`git checkout -b feature-branch-name`).
3. Commit your changes (`git commit -m "Add new feature"`).
4. Push to the branch (`git push origin feature-branch-name`).
5. Open a Pull Request.
## License
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **TensorFlow**: For building the deep learning models.
- **Librosa**: For audio processing and feature extraction.
- **Gradio**: For creating the user interface.