File size: 4,070 Bytes
be27668
 
 
 
 
 
 
 
 
 
 
0f7a2ff
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
---
license: mit
language:
- en
tags:
- speaker
- speaker_Recognition
- gender
- voicebased
- ai
- ml
---


# AuthEcho_Project

This project contains well-trained deep learning models to predict the **Speaker** and their **Gender**.

The repository offers a **Speaker and Gender Prediction System** built using **TensorFlow**, **Librosa**, and **Gradio**. The application predicts the top 3 speakers and their probabilities from an audio file, determines the speaker's gender, and classifies unknown speakers using a confidence threshold.

## Features

- Predicts the top 3 speakers from an audio file.
- Determines the gender of the speaker.
- Identifies unknown speakers with a confidence threshold.
- Provides a Gradio interface for easy testing.

## Getting Started

### Prerequisites

To run this application, you need:

- **Python**: Version 3.8 or higher
- Required Python libraries:
  - `tensorflow`
  - `numpy`
  - `librosa`
  - `gradio`
  - `scikit-learn`

Install the required libraries with:

```
pip install tensorflow numpy librosa gradio scikit-learn
```

### Installation

1. **Clone the Repository**:

```
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
```

2. **Add Pre-Trained Models and Label Encoders**:

Place the following files in the repository's root directory:
- `lstm_speaker_model.h5`: Pre-trained speaker recognition model.
- `lstm_gender_model.h5`: Pre-trained gender prediction model.
- `lstm_speaker_label.pkl`: Label encoder for speaker classes.
- `lstm_gender_label.pkl`: Label encoder for gender classes.

### Usage

Run the application using:

```
python app.py
```

### Gradio Interface

The Gradio interface allows you to:

- **Upload** an audio file or **record** audio directly.
- Predict the **top 3 speakers** and their probabilities.
- Determine the **gender** of the speaker.
- Detect and classify **unknown speakers** using confidence thresholds.

## Project Structure

```
.
β”œβ”€β”€ app.py                # Main application file
β”œβ”€β”€ models/lstm_speaker_model.h5   # Pre-trained speaker model (to be added)
β”œβ”€β”€ models/lstm_gender_model.h5    # Pre-trained gender model (to be added)
β”œβ”€β”€ models/lstm_speaker_label.pkl  # Speaker label encoder (to be added)
β”œβ”€β”€ models/lstm_gender_label.pkl   # Gender label encoder (to be added)
β”œβ”€β”€ requirements.txt        # Python dependencies
└── README.md               # Project documentation
```

## Example Output

### Top 3 Predicted Speakers:

```
The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%

The predicted gender is: Male
```

### Unknown Speaker:

```
The top 3 predicted speakers are:
Unknown: 45.23%

The predicted gender is: Unknown
```

## How It Works

1. **Feature Extraction**:
   - Extracts **MFCCs**, **chroma features**, and **spectral contrast** from the input audio file using `librosa`.

2. **Speaker and Gender Models**:
   - **Speaker Model**: A pre-trained LSTM model classifies the speaker based on extracted features.
   - **Gender Model**: A separate LSTM model determines the gender.

3. **Unknown Detection**:
   - If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."

## Roadmap

- Add support for real-time audio predictions.
- Improve unknown speaker detection using open-set recognition techniques.
- Expand the dataset for more robust gender classification.

## Contributing

Contributions are welcome! To contribute:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature-branch-name`).
3. Commit your changes (`git commit -m "Add new feature"`).
4. Push to the branch (`git push origin feature-branch-name`).
5. Open a Pull Request.

## License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.




## Acknowledgments

- **TensorFlow**: For building the deep learning models.
- **Librosa**: For audio processing and feature extraction.
- **Gradio**: For creating the user interface.