---
license: mit
---
|
# Audio Feature Extraction Models |
|
|
|
This repository contains pre-trained models for audio feature extraction, specifically: |
|
|
|
- **Tempo Estimation:** Predicts the tempo of an audio track in Beats Per Minute (BPM).

- **Key Detection:** Classifies the musical key of an audio track into relative key classes.
|
|
|
## Model Details |
|
|
|
### Tempo Model |
|
- **Model Type:** Custom CNN architecture for tempo classification. |
|
- **Input:** Audio segments converted to Mel spectrograms followed by autocorrelation (a preprocessing sketch follows below).
|
- **Output:** Predicted Beats Per Minute (BPM) in the range [85, 170].
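
As an illustration of this input pipeline, the sketch below computes a Mel spectrogram and its per-band autocorrelation with torchaudio. The sample rate, FFT size, hop length, and number of Mel bins are assumptions and should be matched to the checkpoint's training configuration.

```python
import torch
import torchaudio

def tempo_features(path: str, sample_rate: int = 22050, n_mels: int = 128) -> torch.Tensor:
    """Mel spectrogram followed by per-band autocorrelation (assumed parameters)."""
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)                # mixdown to mono
    if sr != sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)

    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=2048, hop_length=512, n_mels=n_mels
    )(waveform)                                                  # (1, n_mels, frames)

    # Autocorrelation along the time axis, computed per Mel band via FFT.
    frames = mel.shape[-1]
    spectrum = torch.fft.rfft(mel, n=2 * frames, dim=-1)
    autocorr = torch.fft.irfft(spectrum * spectrum.conj(), dim=-1)[..., :frames]
    return autocorr
```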
|
|
|
### Key Detection Models |
|
- **Key Class Model:** Classifies into 12 relative key classes. |
|
- **Key Quality Model:** Determines if the key is Major or Minor. |
|
- **Input:** Audio segments converted to Mel spectrograms. |
|
- **Output:**

  - Key Class: One of 12 key signatures.

  - Key Quality: Binary classification (0 for Major, 1 for Minor); see the sketch below for turning the two predictions into a key name.
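
The two outputs can be combined into a readable key name. The sketch below assumes the 12 classes index key signatures in circle-of-fifths order starting from C major / A minor; the actual class ordering may differ, so prefer any `id2label` mapping shipped with the checkpoints.

```python
# Assumed circle-of-fifths ordering of the 12 key-signature classes.
MAJOR_TONICS = ["C", "G", "D", "A", "E", "B", "F#", "Db", "Ab", "Eb", "Bb", "F"]
MINOR_TONICS = ["A", "E", "B", "F#", "C#", "G#", "D#", "Bb", "F", "C", "G", "D"]

def key_label(key_class_id: int, key_quality_id: int) -> str:
    """Combine key-class (signature) and key-quality (0 = Major, 1 = Minor) predictions."""
    if key_quality_id == 0:
        return f"{MAJOR_TONICS[key_class_id]} major"
    return f"{MINOR_TONICS[key_class_id]} minor"

print(key_label(0, 1))  # class 0 with minor quality -> "A minor"
```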
|
|
|
## Usage |
|
|
|
### Prerequisites |
|
- Python 3.7+ |
|
- PyTorch |
|
- torchaudio |
|
- transformers |
|
|
|
### Loading Models |
|
|
|
To use these models with Hugging Face's transformers library: |
|
|
|
```python |
|
from transformers import AutoModelForAudioClassification
|
|
|
# Load Tempo Model |
|
tempo_model = AutoModelForAudioClassification.from_pretrained("your_username/tempo_model") |
|
|
|
# Load Key Models |
|
key_class_model = AutoModelForAudioClassification.from_pretrained("your_username/key_class_model") |
|
key_quality_model = AutoModelForAudioClassification.from_pretrained("your_username/key_quality_model")
```
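
### Running Inference

A hedged end-to-end sketch of running one of the key models is shown below. It assumes the checkpoints load through `AutoModelForAudioClassification`, that the model's forward pass accepts a batched Mel-spectrogram tensor, and uses torchaudio's default spectrogram settings; adjust all of these to match the actual checkpoint configuration.

```python
import torch
import torchaudio
from transformers import AutoModelForAudioClassification

# Assumed repository name and input format; adjust to the actual checkpoint.
model = AutoModelForAudioClassification.from_pretrained("your_username/key_class_model")
model.eval()

waveform, sr = torchaudio.load("track.wav")            # path to any audio file
waveform = waveform.mean(dim=0, keepdim=True)          # mixdown to mono
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr)(waveform)

with torch.no_grad():
    outputs = model(mel.unsqueeze(0))                  # add a batch dimension
logits = outputs.logits if hasattr(outputs, "logits") else outputs
print("Predicted key class id:", logits.argmax(dim=-1).item())
```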