forwarder1121 committed
Commit 77489cf · verified · 1 Parent(s): 3091409

Create README.md

Files changed (1)
  1. README.md +117 -0
README.md ADDED
@@ -0,0 +1,117 @@
# **AST Fine-Tuned Model for Emotion Classification**

This is a fine-tuned Audio Spectrogram Transformer (AST) model for classifying emotions in speech audio. It was fine-tuned on the **CREMA-D dataset**, which covers six emotion categories, starting from **MIT's pre-trained AST model**.

---

## **Model Details**
- **Base Model**: `MIT/ast-finetuned-audioset-10-10-0.4593`
- **Fine-Tuned Dataset**: CREMA-D
- **Architecture**: Audio Spectrogram Transformer (AST)
- **Model Type**: Single-label classification
- **Input Features**: Log-Mel Spectrograms (128 mel bins)
- **Output Classes**:
  - **ANG**: Anger
  - **DIS**: Disgust
  - **FEA**: Fear
  - **HAP**: Happiness
  - **NEU**: Neutral
  - **SAD**: Sadness
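
The class codes above match the emotion field that CREMA-D embeds in its file names. The sketch below shows one way to recover a label from a file name. It assumes the `ActorID_SentenceID_Emotion_Level.wav` naming scheme, and the integer index order is an assumption for illustration only; the authoritative mapping is the `id2label` entry in the model's config.

```python
from pathlib import Path

# The six class codes and names listed above.
EMOTIONS = {
    "ANG": "Anger",
    "DIS": "Disgust",
    "FEA": "Fear",
    "HAP": "Happiness",
    "NEU": "Neutral",
    "SAD": "Sadness",
}

# Assumed index order (alphabetical by code); check model.config.id2label for the real one.
LABEL2ID = {code: i for i, code in enumerate(EMOTIONS)}
ID2LABEL = {i: code for code, i in LABEL2ID.items()}


def emotion_from_filename(path: str) -> str:
    """Return the emotion code from a CREMA-D style file name.

    Assumes names of the form ActorID_SentenceID_Emotion_Level.wav.
    """
    parts = Path(path).stem.split("_")
    if len(parts) != 4 or parts[2] not in EMOTIONS:
        raise ValueError(f"Unexpected CREMA-D file name: {path}")
    return parts[2]


code = emotion_from_filename("AudioWAV/1001_DFA_ANG_XX.wav")
print(code, EMOTIONS[code], LABEL2ID[code])
```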

---

## **Model Configuration**
- **Hidden Size**: 768
- **Number of Attention Heads**: 12
- **Number of Hidden Layers**: 12
- **Patch Size**: 16
- **Maximum Length**: 1024
- **Dropout Probability**: 0.0
- **Activation Function**: GELU (Gaussian Error Linear Unit)
- **Optimizer**: Adam
- **Learning Rate**: 1e-4
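
To make these numbers concrete, the following sketch works out how many patches the transformer sees for one spectrogram and the per-head dimension. The frequency and time strides of 10 are an assumption taken from the `10-10` in the base model's name; they are not stated in the configuration above. The two extra tokens are the classification and distillation tokens that AST prepends to the patch sequence.

```python
def ast_sequence_length(num_mel_bins=128, max_length=1024, patch_size=16,
                        frequency_stride=10, time_stride=10):
    """Count overlapping patches along each axis, then add 2 special tokens."""
    # Patches are square windows sliding over the spectrogram without padding.
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    time_patches = (max_length - patch_size) // time_stride + 1
    num_patches = freq_patches * time_patches
    return freq_patches, time_patches, num_patches + 2


freq_patches, time_patches, seq_len = ast_sequence_length()
head_dim = 768 // 12  # hidden size split evenly across attention heads

print(freq_patches, time_patches, seq_len, head_dim)  # 12 101 1214 64
```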

---

## **Training Details**
- **Dataset**: CREMA-D (Emotion-Labeled Speech Data)
- **Data Augmentation**:
  - Noise injection
  - Time shifting
  - Speed perturbation
- **Fine-Tuning Epochs**: 5
- **Batch Size**: 16
- **Learning Rate Scheduler**: Linear decay
- **Best Validation Accuracy**: 60.71%
- **Best Checkpoint**: `./results/checkpoint-1119`
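
The three augmentations listed above can be sketched on a raw waveform as follows. This is a dependency-free illustration, not the training code used for this model: the function names, the noise level, the shift amount, and the speed rate are all hypothetical, and a real pipeline would more likely use NumPy or torchaudio.

```python
import math
import random


def add_noise(wave, noise_std=0.005, rng=random):
    """Noise injection: add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def time_shift(wave, shift):
    """Time shifting: rotate the signal by `shift` samples, wrapping around."""
    if not wave:
        return []
    shift %= len(wave)
    return wave[-shift:] + wave[:-shift] if shift else list(wave)


def change_speed(wave, rate):
    """Speed perturbation via linear interpolation; rate > 1 shortens the clip."""
    if not wave:
        return []
    new_len = max(1, int(len(wave) / rate))
    out = []
    for i in range(new_len):
        pos = min(i * rate, len(wave) - 1)  # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append(wave[lo] * (1 - frac) + wave[hi] * frac)
    return out


# One second of a 220 Hz tone at 16 kHz as stand-in audio.
wave = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
augmented = change_speed(time_shift(add_noise(wave), shift=1600), rate=1.1)
print(len(wave), len(augmented))
```

Note that this simple speed change also shifts pitch, which may or may not be desirable for emotion recognition.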

---

## **How to Use**

### **Load the Model**
```python
import librosa
import torch
from transformers import AutoModelForAudioClassification, AutoProcessor

# Load the model and processor
model = AutoModelForAudioClassification.from_pretrained("forwarder1121/ast-finetuned-model")
processor = AutoProcessor.from_pretrained("forwarder1121/ast-finetuned-model")

# Load the audio as a 16 kHz mono waveform (the processor expects samples, not a file path)
waveform, _ = librosa.load("path_to_audio.wav", sr=16000, mono=True)

# Convert the waveform to log-mel spectrogram features
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

# Make predictions without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

# id2label is keyed by integers once the config is loaded
print(f"Predicted emotion: {model.config.id2label[predicted_class]}")
```
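
If per-class probabilities are more useful than a single label, the logits can be passed through a softmax. The sketch below uses made-up logits and an assumed label order so that it runs without downloading the model; with the real model, `outputs.logits[0].tolist()` and `model.config.id2label` would take their place.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label order and logits, for illustration only.
id2label = {0: "ANG", 1: "DIS", 2: "FEA", 3: "HAP", 4: "NEU", 5: "SAD"}
logits = [0.2, -1.0, 0.1, 2.3, 0.5, -0.4]

probs = softmax(logits)

# Rank class indices from most to least likely and show the top three.
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
for i in ranked[:3]:
    print(f"{id2label[i]}: {probs[i]:.3f}")
```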

---

## **Metrics**

### **Validation Results**
- **Best Validation Accuracy**: 60.71%
- **Validation Loss**: 1.1126

### **Evaluation Details**
- **Eval Dataset**: CREMA-D test split
- **Batch Size**: 16
- **Number of Steps**: 94
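
As a rough sanity check on these figures, the sketch below derives the range of evaluation set sizes consistent with 94 steps at batch size 16, and the accuracy and loss that uniform guessing over six classes would give. It assumes evaluation ran on a single device with a possibly partial final batch, and that the reported loss is a natural-log cross-entropy; neither is stated above.

```python
import math

batch_size = 16
eval_steps = 94
num_classes = 6

# 94 batches of 16 cover at most 94 * 16 clips; the last batch may be partial.
max_examples = eval_steps * batch_size
min_examples = (eval_steps - 1) * batch_size + 1

# Baselines for a classifier that guesses uniformly among the six classes.
chance_accuracy = 1 / num_classes
chance_loss = math.log(num_classes)

print(f"evaluation clips: {min_examples} to {max_examples}")
print(f"chance accuracy: {chance_accuracy:.2%}")
print(f"chance cross-entropy: {chance_loss:.4f}")
```

Under those assumptions, the reported 60.71% accuracy and 1.1126 loss are both clearly better than uniform guessing.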

---

## **Limitations**
- The model was trained only on CREMA-D, so it may not generalize well to datasets with different accents, speech styles, or languages.
- The best validation accuracy is 60.71%, which leaves considerable room for improvement before real-world deployment.

---

## **Acknowledgments**
This work is based on the **Audio Spectrogram Transformer (AST)** model by MIT, fine-tuned for emotion classification. Special thanks to the developers of Hugging Face and the CREMA-D dataset contributors.

---

## **License**
The model is shared under the MIT License. Refer to the licensing details in the repository.

---

## **Citation**
If you use this model in your work, please cite:
```
@misc{ast-finetuned-model,
  author = {forwarder1121},
  title = {Fine-Tuned Audio Spectrogram Transformer for Emotion Classification},
  year = {2024},
  url = {https://huggingface.co/forwarder1121/ast-finetuned-model},
}
```

---

## **Contact**
For questions, reach out to `[email protected]`.