---
license: mit
language:
- en
tags:
- speaker
- speaker-recognition
- gender
- voice-based
- ai
- ml
---
# AuthEcho_Project
This project provides pre-trained deep learning models that predict a **speaker's identity** and **gender** from audio.
The repository offers a **Speaker and Gender Prediction System** built with **TensorFlow**, **Librosa**, and **Gradio**. The application predicts the top 3 most likely speakers (with their probabilities) from an audio file, determines the speaker's gender, and flags unknown speakers using a confidence threshold.
## Features
- Predicts the top 3 speakers from an audio file.
- Determines the gender of the speaker.
- Identifies unknown speakers with a confidence threshold.
- Provides a Gradio interface for easy testing.
## Getting Started
### Prerequisites
To run this application, you need:
- **Python**: Version 3.8 or higher
- Required Python libraries:
- `tensorflow`
- `numpy`
- `librosa`
- `gradio`
- `scikit-learn`
Install the required libraries with:
```
pip install tensorflow numpy librosa gradio scikit-learn
```
### Installation
1. **Clone the Repository**:
```
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
```
2. **Add Pre-Trained Models and Label Encoders**:
Place the following files in the `models/` directory:
- `lstm_speaker_model.h5`: Pre-trained speaker recognition model.
- `lstm_gender_model.h5`: Pre-trained gender prediction model.
- `lstm_speaker_label.pkl`: Label encoder for speaker classes.
- `lstm_gender_label.pkl`: Label encoder for gender classes.
### Usage
Run the application using:
```
python app.py
```
### Gradio Interface
The Gradio interface allows you to:
- **Upload** an audio file or **record** audio directly.
- Predict the **top 3 speakers** and their probabilities.
- Determine the **gender** of the speaker.
- Detect and classify **unknown speakers** using confidence thresholds.
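A minimal sketch of how such a Gradio wrapper might look. The `predict` function body here is a placeholder, not the repository's actual implementation, which would extract features and run the speaker and gender models:

```python
import gradio as gr

def predict(audio_path):
    # Placeholder: the real app extracts features from the audio file and
    # runs the speaker and gender models to build the prediction report.
    if audio_path is None:
        return "No audio provided."
    return f"Received audio file: {audio_path}"

# Accept an uploaded or recorded audio file; return a plain-text report.
demo = gr.Interface(
    fn=predict,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="AuthEcho: Speaker and Gender Prediction",
)

if __name__ == "__main__":
    demo.launch()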
## Project Structure
```
.
├── app.py                         # Main application file
├── models/lstm_speaker_model.h5   # Pre-trained speaker model (to be added)
├── models/lstm_gender_model.h5    # Pre-trained gender model (to be added)
├── models/lstm_speaker_label.pkl  # Speaker label encoder (to be added)
├── models/lstm_gender_label.pkl   # Gender label encoder (to be added)
├── requirements.txt               # Python dependencies
└── README.md                      # Project documentation
```
## Example Output
### Top 3 Predicted Speakers:
```
The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%
The predicted gender is: Male
```
### Unknown Speaker:
```
The top 3 predicted speakers are:
Unknown: 45.23%
The predicted gender is: Unknown
```
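A report like the one above can be produced from a model's output probability vector in a few lines of NumPy. The speaker labels below are placeholders:

```python
import numpy as np

def format_top3(probs, labels):
    """Build the top-3 speaker report from a probability vector."""
    probs = np.asarray(probs, dtype=float)
    top3 = np.argsort(probs)[::-1][:3]  # indices of the 3 highest probabilities
    lines = ["The top 3 predicted speakers are:"]
    for i in top3:
        lines.append(f"{labels[i]}: {probs[i] * 100:.2f}%")
    return "\n".join(lines)

labels = ["Speaker 1", "Speaker 2", "Speaker 3", "Speaker 4"]
print(format_top3([0.8523, 0.1012, 0.0465, 0.0], labels))
```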
## How It Works
1. **Feature Extraction**:
- Extracts **MFCCs**, **chroma features**, and **spectral contrast** from the input audio file using `librosa`.
2. **Speaker and Gender Models**:
- **Speaker Model**: A pre-trained LSTM model classifies the speaker based on extracted features.
- **Gender Model**: A separate LSTM model determines the gender.
3. **Unknown Detection**:
- If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."
## Roadmap
- Add support for real-time audio predictions.
- Improve unknown speaker detection using open-set recognition techniques.
- Expand the dataset for more robust gender classification.
## Contributing
Contributions are welcome! To contribute:
1. Fork the repository.
2. Create a feature branch (`git checkout -b feature-branch-name`).
3. Commit your changes (`git commit -m "Add new feature"`).
4. Push to the branch (`git push origin feature-branch-name`).
5. Open a Pull Request.
## License
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **TensorFlow**: For building the deep learning models.
- **Librosa**: For audio processing and feature extraction.
- **Gradio**: For creating the user interface.