|
--- |
|
license: mit |
|
language: |
|
- en |
|
tags: |
|
- speaker |
|
- speaker_Recognition |
|
- gender |
|
- voicebased |
|
- ai |
|
- ml |
|
--- |
|
|
|
|
|
# AuthEcho_Project |
|
|
|
This project contains well-trained deep learning models to predict the **Speaker** and their **Gender**. |
|
|
|
The repository offers a **Speaker and Gender Prediction System** built using **TensorFlow**, **Librosa**, and **Gradio**. The application predicts the top 3 speakers and their probabilities from an audio file, determines the speaker's gender, and classifies unknown speakers using a confidence threshold. |
|
|
|
## Features |
|
|
|
- Predicts the top 3 speakers from an audio file. |
|
- Determines the gender of the speaker. |
|
- Identifies unknown speakers with a confidence threshold. |
|
- Provides a Gradio interface for easy testing. |
|
|
|
## Getting Started |
|
|
|
### Prerequisites |
|
|
|
To run this application, you need: |
|
|
|
- **Python**: Version 3.8 or higher |
|
- Required Python libraries: |
|
- `tensorflow` |
|
- `numpy` |
|
- `librosa` |
|
- `gradio` |
|
- `scikit-learn` |
|
|
|
Install the required libraries with: |
|
|
|
``` |
|
pip install tensorflow numpy librosa gradio scikit-learn |
|
``` |
|
|
|
### Installation |
|
|
|
1. **Clone the Repository**: |
|
|
|
``` |
|
git clone https://github.com/your-username/speaker-gender-prediction.git |
|
cd speaker-gender-prediction |
|
``` |
|
|
|
2. **Add Pre-Trained Models and Label Encoders**: |
|
|
|
Place the following files in the repository's root directory: |
|
- `lstm_speaker_model.h5`: Pre-trained speaker recognition model. |
|
- `lstm_gender_model.h5`: Pre-trained gender prediction model. |
|
- `lstm_speaker_label.pkl`: Label encoder for speaker classes. |
|
- `lstm_gender_label.pkl`: Label encoder for gender classes. |
|
|
|
### Usage |
|
|
|
Run the application using: |
|
|
|
``` |
|
python app.py |
|
``` |
|
|
|
### Gradio Interface |
|
|
|
The Gradio interface allows you to: |
|
|
|
- **Upload** an audio file or **record** audio directly. |
|
- Predict the **top 3 speakers** and their probabilities. |
|
- Determine the **gender** of the speaker. |
|
- Detect and classify **unknown speakers** using confidence thresholds. |
|
|
|
## Project Structure |
|
|
|
``` |
|
. |
|
βββ app.py # Main application file |
|
βββ models/lstm_speaker_model.h5 # Pre-trained speaker model (to be added) |
|
βββ models/lstm_gender_model.h5 # Pre-trained gender model (to be added) |
|
βββ models/lstm_speaker_label.pkl # Speaker label encoder (to be added) |
|
βββ models/lstm_gender_label.pkl # Gender label encoder (to be added) |
|
βββ requirements.txt # Python dependencies |
|
βββ README.md # Project documentation |
|
``` |
|
|
|
## Example Output |
|
|
|
### Top 3 Predicted Speakers: |
|
|
|
``` |
|
The top 3 predicted speakers are: |
|
Speaker 1: 85.23% |
|
Speaker 2: 10.12% |
|
Speaker 3: 4.65% |
|
|
|
The predicted gender is: Male |
|
``` |
|
|
|
### Unknown Speaker: |
|
|
|
``` |
|
The top 3 predicted speakers are: |
|
Unknown: 45.23% |
|
|
|
The predicted gender is: Unknown |
|
``` |
|
|
|
## How It Works |
|
|
|
1. **Feature Extraction**: |
|
- Extracts **MFCCs**, **chroma features**, and **spectral contrast** from the input audio file using `librosa`. |
|
|
|
2. **Speaker and Gender Models**: |
|
- **Speaker Model**: A pre-trained LSTM model classifies the speaker based on extracted features. |
|
- **Gender Model**: A separate LSTM model determines the gender. |
|
|
|
3. **Unknown Detection**: |
|
- If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown." |
|
|
|
## Roadmap |
|
|
|
- Add support for real-time audio predictions. |
|
- Improve unknown speaker detection using open-set recognition techniques. |
|
- Expand the dataset for more robust gender classification. |
|
|
|
## Contributing |
|
|
|
Contributions are welcome! To contribute: |
|
|
|
1. Fork the repository. |
|
2. Create a feature branch (`git checkout -b feature-branch-name`). |
|
3. Commit your changes (`git commit -m "Add new feature"`). |
|
4. Push to the branch (`git push origin feature-branch-name`). |
|
5. Open a Pull Request. |
|
|
|
## License |
|
|
|
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
|
|
|
|
## Acknowledgments |
|
|
|
- **TensorFlow**: For building the deep learning models. |
|
- **Librosa**: For audio processing and feature extraction. |
|
- **Gradio**: For creating the user interface. |
|
|