---
language:
- en
datasets:
- CREMA-D
library_name: transformers
tags:
- emotion-classification
- audio-classification
- audio-spectrogram
- transformer
- fine-tuned
license: apache-2.0
pipeline_tag: audio-classification
base_model: "MIT/ast-finetuned-audioset-10-10-0.4593"
metrics:
- accuracy
- f1
task_categories:
- audio-classification
---
# **AST Fine-Tuned Model for Emotion Classification**
This is an Audio Spectrogram Transformer (AST) fine-tuned to classify emotions in speech audio. It was trained on the **CREMA-D dataset** to distinguish six emotion categories, starting from **MIT's pre-trained AST checkpoint** (`MIT/ast-finetuned-audioset-10-10-0.4593`).
---
## **Model Details**
- **Base Model**: `MIT/ast-finetuned-audioset-10-10-0.4593`
- **Fine-Tuned Dataset**: CREMA-D
- **Architecture**: Audio Spectrogram Transformer (AST)
- **Model Type**: Single-label classification
- **Input Features**: Log-Mel Spectrograms (128 mel bins)
- **Output Classes**:
  - **ANG**: Anger
  - **DIS**: Disgust
  - **FEA**: Fear
  - **HAP**: Happiness
  - **NEU**: Neutral
  - **SAD**: Sadness
---
## **Model Configuration**
- **Hidden Size**: 768
- **Number of Attention Heads**: 12
- **Number of Hidden Layers**: 12
- **Patch Size**: 16
- **Maximum Length**: 1024
- **Dropout Probability**: 0.0
- **Activation Function**: GELU (Gaussian Error Linear Unit)
- **Optimizer**: Adam
- **Learning Rate**: 1e-4
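
The configuration values above determine how many spectrogram patches the transformer sees per clip. The following is a small worked calculation, not taken from the training code. It assumes the patches are extracted with a stride of 10 on both the frequency and time axes, which is what the `10-10` in the base model name suggests; the helper `num_patches` is a hypothetical name used only for illustration. Under that assumption, a 128 x 1024 log-mel input yields 12 x 101 = 1212 patches.

```python
def num_patches(size: int, patch: int, stride: int) -> int:
    """Number of patch positions along one axis for a strided, unpadded convolution."""
    return (size - patch) // stride + 1


# Values taken from the configuration listed above
NUM_MEL_BINS = 128
MAX_LENGTH = 1024
PATCH_SIZE = 16

# Assumption: stride 10 on both axes, inferred from "10-10" in the base model name
FREQ_STRIDE = 10
TIME_STRIDE = 10

freq_patches = num_patches(NUM_MEL_BINS, PATCH_SIZE, FREQ_STRIDE)
time_patches = num_patches(MAX_LENGTH, PATCH_SIZE, TIME_STRIDE)
total_patches = freq_patches * time_patches

print(freq_patches, time_patches, total_patches)  # 12 101 1212
```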
---
## **Training Details**
- **Dataset**: CREMA-D (Emotion-Labeled Speech Data)
- **Data Augmentation**:
  - Noise injection
  - Time shifting
  - Speed perturbation
- **Fine-Tuning Epochs**: 5
- **Batch Size**: 16
- **Learning Rate Scheduler**: Linear decay
- **Best Validation Accuracy**: 60.71%
- **Best Checkpoint**: `./results/checkpoint-1119`
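
The three augmentations listed above can be sketched on a raw waveform with NumPy. This is an illustrative sketch only: the exact implementation and parameters used during fine-tuning are not documented here, so the function names (`add_noise`, `time_shift`, `speed_perturb`) and the values (20 dB SNR, a 0.1 s shift, a 1.1x rate) are assumptions. The speed perturbation shown resamples by linear interpolation, which also shifts pitch.

```python
import numpy as np


def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise at the requested signal-to-noise ratio (in dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def time_shift(wave: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(wave, shift)


def speed_perturb(wave: np.ndarray, rate: float) -> np.ndarray:
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the clip."""
    new_len = int(round(len(wave) / rate))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, new_len)
    return np.interp(new_idx, old_idx, wave)


rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
wave = 0.5 * np.sin(2 * np.pi * 220 * t)  # one second of a 220 Hz tone as stand-in audio

noisy = add_noise(wave, snr_db=20.0, rng=rng)   # illustrative SNR
shifted = time_shift(wave, shift=1600)          # shift by 0.1 s
faster = speed_perturb(wave, rate=1.1)          # 10% faster

print(noisy.shape, shifted.shape, faster.shape)
```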
---
## **How to Use**
### **Load the Model**
```python
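import librosa  # assumption: any loader that returns a 16 kHz mono waveform works
import torch
from transformers import AutoModelForAudioClassification, AutoProcessor
# Load the model and processor
model = AutoModelForAudioClassification.from_pretrained("forwarder1121/ast-finetuned-model")
processor = AutoProcessor.from_pretrained("forwarder1121/ast-finetuned-model")
# Load the audio as a 16 kHz mono waveform (the processor expects samples, not a file path)
waveform, _ = librosa.load("path_to_audio.wav", sr=16000, mono=True)
# Convert the waveform to log-mel spectrogram features
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
# Make predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
# id2label is keyed by integer class index once the config is loaded
print(f"Predicted emotion: {model.config.id2label[predicted_class]}")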
```
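
Given the modest validation accuracy, it can be more useful to look at the full probability distribution than at the top class alone. The sketch below applies a softmax to a vector of six logits and ranks the emotions. The logits are made-up values, and the label order in `LABELS` is an assumption; check `model.config.id2label` for the actual index-to-label mapping before relying on it.

```python
import numpy as np

# Assumed index order; the real mapping lives in model.config.id2label
LABELS = ["ANG", "DIS", "FEA", "HAP", "NEU", "SAD"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def rank_emotions(logits):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1]
    return [(LABELS[i], float(probs[i])) for i in order]


example_logits = [0.2, -1.0, 0.1, 2.3, 0.5, -0.4]  # made-up values for illustration
ranking = rank_emotions(example_logits)
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```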
---
## **Metrics**
### **Validation Results**
- **Best Validation Accuracy**: 60.71%
- **Validation Loss**: 1.1126
### **Evaluation Details**
- **Eval Dataset**: CREMA-D test split
- **Batch Size**: 16
- **Number of Steps**: 94
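
The card metadata lists both accuracy and F1 as metrics, while only accuracy is reported above. As a minimal sketch of how the two could be computed from predicted and true class indices, the snippet below implements accuracy and macro-averaged F1 with NumPy. The label arrays are toy data, not model outputs, and macro averaging is an assumption since the averaging mode is not stated.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred, num_classes: int) -> float:
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1_scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))


# Toy labels for six classes (indices 0..5), not real evaluation data
y_true = [0, 1, 2, 3, 4, 5, 0, 1]
y_pred = [0, 1, 2, 3, 4, 4, 1, 1]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, num_classes=6)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```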
---
## **Limitations**
- The model was trained on CREMA-D, which has a specific set of speech data. It may not generalize well to datasets with different accents, speech styles, or languages.
- Validation accuracy is 60.71% across six classes, so misclassifications are frequent. Evaluate the model on your own data before relying on it in real-world deployment.
---
## **Acknowledgments**
This work is based on the **Audio Spectrogram Transformer (AST)** model by MIT, fine-tuned for emotion classification. Special thanks to the developers of Hugging Face and the CREMA-D dataset contributors.
---
## **License**
The model is shared under the Apache 2.0 license, as declared in the model card metadata (`license: apache-2.0`). Refer to the licensing details in the repository.
---
## **Citation**
If you use this model in your work, please cite:
```
@misc{ast-finetuned-model,
author = {forwarder1121},
title = {Fine-Tuned Audio Spectrogram Transformer for Emotion Classification},
year = {2024},
url = {https://huggingface.co/forwarder1121/ast-finetuned-model},
}
```
---
## **Contact**
For questions, reach out to `[email protected]`.