Prince53 commited on
Commit
50c8eb2
·
verified ·
1 Parent(s): ef279f9

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +102 -0
README.md ADDED
@@ -0,0 +1,102 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - audio-classification
5
+ - deep-speech-detection
6
+ - tensorflow
7
+ - keras
8
+ ---
9
+ # Model Card for Deep Speech Detection
10
+
11
+ ## Model Description
12
+ This is a TensorFlow/Keras CNN model trained to detect deepfake or synthetic speech with >95% accuracy. It uses audio features (MFCCs, chroma, spectral centroid, etc.) extracted with `librosa`.
13
+
14
+ ## Intended Use
15
+ - Deepfake speech detection
16
+ - Audio authenticity verification
17
+
18
+ ## Dependencies
19
+ ```bash
20
+ pip install tensorflow==2.10.0 librosa==0.10.1 joblib==1.3.2 numpy==1.22.4 pandas==1.5.3 scikit-learn==1.2.2
21
+ ```
22
+
23
+ ## Usage
24
+ ```python
25
+ import tensorflow as tf
26
+ import librosa
27
+ import joblib
28
+ import numpy as np
29
+ import pandas as pd
30
+ from huggingface_hub import hf_hub_download, HfApi
31
+ import os
32
+
33
+ # Download model and files
34
+ repo_name = "Prince53/deep-speech-detection"
35
+ model_dir = "downloaded_model"
36
+ scaler_path = hf_hub_download(repo_name, "scaler.pkl", local_dir=model_dir)
37
+ label_encoder_path = hf_hub_download(repo_name, "label_encoder.pkl", local_dir=model_dir)
38
+ api = HfApi()
39
+ api.snapshot_download(repo_name, local_dir=model_dir, allow_patterns="saved_model/*")
40
+
41
+ # Load model and preprocessing objects
42
+ model = tf.keras.models.load_model(os.path.join(model_dir, "saved_model"))
43
+ scaler = joblib.load(scaler_path)
44
+ label_encoder = joblib.load(label_encoder_path)
45
+
46
+ # Feature extraction function
47
+ def segment_and_extract_features(audio, sr=16000):
48
+ segment_samples = int(2.0 * sr)
49
+ step_samples = int(0.25 * sr)
50
+ segments = [audio[i:i+segment_samples] for i in range(0, len(audio) - segment_samples + 1, step_samples)]
51
+ features = []
52
+ for segment in segments:
53
+ if len(segment) < segment_samples:
54
+ continue
55
+ mfccs = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13)
56
+ chroma = librosa.feature.chroma_stft(y=segment, sr=sr)
57
+ spectral_centroid = librosa.feature.spectral_centroid(y=segment, sr=sr)
58
+ spectral_bandwidth = librosa.feature.spectral_bandwidth(y=segment, sr=sr)
59
+ rolloff = librosa.feature.spectral_rolloff(y=segment, sr=sr)
60
+ zero_crossing_rate = librosa.feature.zero_crossing_rate(y=segment)
61
+ feature_dict = {
62
+ 'mfcc_mean': np.mean(mfccs, axis=1),
63
+ 'mfcc_std': np.std(mfccs, axis=1),
64
+ 'chroma': np.mean(chroma, axis=1),
65
+ 'spectral_centroid': np.mean(spectral_centroid),
66
+ 'spectral_bandwidth': np.mean(spectral_bandwidth),
67
+ 'rolloff': np.mean(rolloff),
68
+ 'zero_crossing_rate': np.mean(zero_crossing_rate)
69
+ }
70
+ features.append(feature_dict)
71
+ return features
72
+
73
+ # Classify audio
74
+ audio, sr = librosa.load("path/to/audio.wav", sr=16000)
75
+ segments = segment_and_extract_features(audio, sr)
76
+ segment_features = pd.concat([
77
+ pd.DataFrame([seg['mfcc_mean'] for seg in segments]),
78
+ pd.DataFrame([seg['mfcc_std'] for seg in segments]),
79
+ pd.DataFrame([seg['chroma'] for seg in segments]),
80
+ pd.DataFrame([[seg['spectral_centroid'], seg['spectral_bandwidth'], seg['rolloff'], seg['zero_crossing_rate']] for seg in segments])
81
+ ], axis=1)
82
+ segment_features = scaler.transform(segment_features)
83
+ segment_features = segment_features.reshape(segment_features.shape[0], segment_features.shape[1], 1)
84
+ predictions = model.predict(segment_features)
85
+ segment_labels = np.argmax(predictions, axis=1)
86
+ confidence_scores = np.mean(predictions, axis=0)
87
+ final_label = label_encoder.inverse_transform([np.argmax(np.bincount(segment_labels))])[0]
88
+ print(f"Confidence Scores: Real={confidence_scores[0]:.4f}, Fake={confidence_scores[1]:.4f}")
89
+ print(f"Classification: {final_label} ({0 if final_label == 'Real' else 1})")
90
+ ```
91
+
92
+ ## Limitations
93
+ - Requires mono audio at 16kHz sampling rate.
94
+ - May struggle with low-quality audio or unseen domains.
95
+ - Trained on the Comb4 dataset.
96
+
97
+ ## Training Data
98
+ - Dataset: Comb4 (custom dataset with real and fake audio)
99
+ - Size: [Update with number of samples]
100
+
101
+ ## Evaluation
102
+ - Test Accuracy: [Update with >95%]