- README.md +26 -90
- requirements.txt +7 -10
- src/classifier.py +0 -148
- src/lyric_generator.py +0 -144
README.md
CHANGED
```diff
@@ -1,109 +1,45 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: Music Classification with MIT AST
+emoji: 🎵
+colorFrom: blue
+colorTo: purple
 sdk: gradio
-sdk_version:
+sdk_version: 4.12.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: create this first space for getting familiar with space
 ---
 
-
+# Music Classification with MIT's AST Model 🎵
 
-
-
-A powerful web application that combines music genre classification with AI-powered lyric generation. This tool can analyze both audio files and text lyrics to determine the genre, then generate new lyrics in that style or transform existing lyrics into different genres.
+This Hugging Face Space demonstrates audio classification using MIT's Audio Spectrogram Transformer (AST) model. The model can identify various types of music, instruments, and sounds in audio files.
 
 ## Features
 
--
-
-
--
-  - Accurate genre detection using state-of-the-art models
-  - Supports multiple popular music genres
--- **Lyric Generation**:
-  - Genre-aware lyric generation
-  - Theme-based content creation
-  - Multiple generation options (temperature, length, versions)
--- **Style Transfer**:
-  - Transform existing lyrics into different genres
-  - Preserve core message while adapting style
-
-## Installation
-
-1. Clone the repository:
-   ```bash
-   git clone [your-repo-url]
-   cd music-genre-classifier-lyric-stylist
-   ```
-
-2. Create a virtual environment (recommended):
-   ```bash
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
-   ```
-
-3. Install dependencies:
-   ```bash
-   pip install -r requirements.txt
-   ```
-
-## Usage
-
-1. Start the application:
-   ```bash
-   python app.py
-   ```
-
-2. Open your web browser and navigate to the provided URL (typically http://localhost:7860)
-
-3. Choose your input method:
-   - Upload an audio file (supports .mp3, .wav, .ogg, .flac)
-   - Enter lyrics text
-
-4. Adjust generation parameters:
-   - Temperature (controls randomness)
-   - Maximum length
-   - Number of versions
-
-5. Click "Detect Genre & Generate Lyrics" or use the "Style Transfer" tab for existing lyrics
-
-## Models Used
-
-- **Genre Classification**:
-  - Audio: `mit/ast-finetuned-audioset-10-10-0.4593` (MIT's Audio Spectrogram Transformer)
-  - Text: `facebook/bart-large-mnli` (Zero-shot classification)
-- **Lyric Generation**: `gpt2-medium`
+- Simple, user-friendly interface
+- Support for multiple audio formats (WAV, MP3, OGG, FLAC)
+- Top-5 predictions with confidence scores
+- Real-time processing
 
-##
+## How to Use
 
-
-
-
-- Hip Hop
-- Country
-- Jazz
-- Classical
-- Electronic
-- Blues
-- Reggae
-- Metal
+1. Click the "Upload Music File" button or drag and drop an audio file
+2. Wait a few seconds for the model to process the audio
+3. View the classification results with confidence scores
 
-##
+## Model Details
 
-
+This app uses the `MIT/ast-finetuned-audioset-10-10-0.4593` model, which is trained on AudioSet and can recognize a wide variety of sounds and music styles. The model converts audio into spectrograms and uses a transformer architecture to classify the audio content.
 
-##
+## Technical Notes
 
-
+- The model processes audio at 16kHz
+- Results show top 5 predictions with confidence scores
+- Processing is done on Hugging Face's infrastructure
+- No local installation required
 
-##
+## Credits
 
-- MIT
--
--
-- The open-source community for various audio processing libraries
+- Model: [MIT AST](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)
+- Interface: Gradio
+- Deployment: Hugging Face Spaces
```
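The new README promises top-5 predictions with confidence scores, but `app.py` itself is not part of this diff. As a minimal sketch, assuming `app.py` passes the output of a transformers `audio-classification` pipeline (a list of `{"label": ..., "score": ...}` dicts, the same shape the deleted `classify_audio` consumed) to a Gradio `Label` component, which accepts a label-to-confidence dict, the conversion could look like this. `to_label_dict` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def to_label_dict(predictions: List[Dict[str, Any]], k: int = 5) -> Dict[str, float]:
    """Keep the k highest-scoring predictions as a {label: score} mapping.

    Hypothetical helper: the output shape is what a Gradio Label component
    accepts, but app.py itself is not shown in this diff.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return {str(p["label"]): float(p["score"]) for p in ranked[:k]}


if __name__ == "__main__":
    # Made-up scores for illustration only; not real model output.
    sample = [
        {"label": "Music", "score": 0.62},
        {"label": "Jazz", "score": 0.21},
        {"label": "Piano", "score": 0.07},
        {"label": "Blues", "score": 0.04},
        {"label": "Saxophone", "score": 0.03},
        {"label": "Speech", "score": 0.01},
    ]
    print(to_label_dict(sample))
```

In a real app the list would come from a call such as `classifier(audio_path, top_k=5)`; the pipeline accepts `top_k`, as the deleted code's `top_k=3` call shows. Sorting inside the helper keeps it correct even if the input order is not guaranteed.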
requirements.txt
CHANGED
```diff
@@ -1,10 +1,7 @@
-gradio
-transformers
-torch
-torchaudio
-numpy
-
-
-librosa==0.10.1
-python-dotenv==1.0.0
-accelerate
+gradio>=4.12.0
+transformers>=4.36.2
+torch>=2.1.2
+torchaudio>=2.1.2
+numpy>=1.26.2
+accelerate>=0.25.0
+librosa>=0.10.1
```
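The new `requirements.txt` switches from unpinned names to `>=` lower bounds. A rough, stdlib-only sketch for checking an environment against those bounds follows; `parse_min_pins`, `version_key` and `check` are hypothetical helpers, and the numeric comparison is deliberately naive (it ignores pre-release tags), so the `packaging` library's `Version` class is the safer choice in practice.

```python
import re
from importlib import metadata
from typing import Dict, List, Tuple

# Matches lines such as "torch>=2.1.2"; any other line is ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*>=\s*([0-9][0-9A-Za-z.+-]*)\s*$")


def parse_min_pins(text: str) -> Dict[str, str]:
    """Map package name -> minimum version for every 'name>=version' line."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def version_key(version: str) -> Tuple[int, ...]:
    """Naive key: '2.1.2+cu121' -> (2, 1, 2); trailing zeros dropped so '2.1' == '2.1.0'."""
    parts: List[int] = []
    for piece in version.split("+", 1)[0].split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break  # stop at non-numeric segments such as 'dev0'
        parts.append(int(digits.group()))
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check(pins: Dict[str, str]) -> List[str]:
    """Return one message per package that is missing or older than its pin."""
    problems: List[str] = []
    for name, minimum in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >={minimum})")
            continue
        if version_key(installed) < version_key(minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems


REQUIREMENTS = """\
gradio>=4.12.0
transformers>=4.36.2
torch>=2.1.2
torchaudio>=2.1.2
numpy>=1.26.2
accelerate>=0.25.0
librosa>=0.10.1
"""

if __name__ == "__main__":
    issues = check(parse_min_pins(REQUIREMENTS))
    print("\n".join(issues) if issues else "All minimum versions satisfied.")
```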
src/classifier.py
DELETED
@@ -1,148 +0,0 @@

```python
import torch
import torchaudio
import librosa
import numpy as np
from transformers import pipeline
from typing import Union, Tuple, List

class MusicGenreClassifier:
    def __init__(self):
        try:
            # Initialize both audio and text classification pipelines with auto device mapping
            self.text_classifier = pipeline(
                "zero-shot-classification",
                model="facebook/bart-large-mnli",
                device="cpu"
            )

            # For audio classification, we'll use MIT's music classification model
            self.audio_classifier = pipeline(
                "audio-classification",
                model="mit/ast-finetuned-audioset-10-10-0.4593",
                device="cpu"
            )
        except Exception as e:
            print(f"Warning: GPU initialization failed, falling back to CPU. Error: {str(e)}")
            # Fall back to CPU if GPU initialization fails
            self.text_classifier = pipeline(
                "zero-shot-classification",
                model="facebook/bart-large-mnli",
                device="cpu"
            )

            self.audio_classifier = pipeline(
                "audio-classification",
                model="mit/ast-finetuned-audioset-10-10-0.4593",
                device="cpu"
            )

        # Define standard genres for classification
        self.genres = [
            "rock", "pop", "hip hop", "country", "jazz",
            "classical", "electronic", "blues", "reggae", "metal"
        ]

        # Mapping from model output labels to our standard genres
        self.label_mapping = {
            "Music": "pop",  # Default mapping
            "Rock music": "rock",
            "Pop music": "pop",
            "Hip hop music": "hip hop",
            "Country": "country",
            "Jazz": "jazz",
            "Classical music": "classical",
            "Electronic music": "electronic",
            "Blues": "blues",
            "Reggae": "reggae",
            "Heavy metal": "metal"
        }

    def process_audio(self, audio_path: str) -> torch.Tensor:
        """Process audio file to match model requirements."""
        try:
            # Load audio using librosa (handles more formats)
            waveform, sample_rate = librosa.load(audio_path, sr=16000)
            # Convert to torch tensor and ensure proper shape
            waveform = torch.from_numpy(waveform).float()
            if len(waveform.shape) == 1:
                waveform = waveform.unsqueeze(0)
            return waveform
        except Exception as e:
            raise ValueError(f"Error processing audio file: {str(e)}")

    def map_label_to_genre(self, label: str) -> str:
        """Map model output label to standard genre."""
        return self.label_mapping.get(label, "pop")  # Default to pop if unknown

    def classify_audio(self, audio_path: str) -> Tuple[str, float]:
        """Classify genre from audio file."""
        try:
            waveform = self.process_audio(audio_path)
            predictions = self.audio_classifier(waveform, top_k=3)

            # Process predictions
            if isinstance(predictions, list):
                predictions = predictions[0]

            # Find the highest scoring music-related prediction
            music_preds = [
                (self.map_label_to_genre(p['label']), p['score'])
                for p in predictions
                if p['label'] in self.label_mapping
            ]

            if not music_preds:
                # If no music genres found, return default
                return "pop", 0.5

            # Get the highest scoring genre
            genre, score = max(music_preds, key=lambda x: x[1])
            return genre, score

        except Exception as e:
            raise ValueError(f"Audio classification failed: {str(e)}")

    def classify_text(self, lyrics: str) -> Tuple[str, float]:
        """Classify genre from lyrics text."""
        try:
            # Prepare the hypothesis template for zero-shot classification
            hypothesis_template = "This text contains {} music lyrics."

            result = self.text_classifier(
                lyrics,
                candidate_labels=self.genres,
                hypothesis_template=hypothesis_template
            )

            return result['labels'][0], result['scores'][0]
        except Exception as e:
            raise ValueError(f"Text classification failed: {str(e)}")

    def predict(self, input_data: str, input_type: str = None) -> dict:
        """
        Main prediction method that handles both audio and text inputs.

        Args:
            input_data: Path to audio file or lyrics text
            input_type: Optional, 'audio' or 'text'. If None, will try to auto-detect

        Returns:
            dict containing predicted genre and confidence score
        """
        # Try to auto-detect input type if not specified
        if input_type is None:
            input_type = 'audio' if input_data.lower().endswith(('.mp3', '.wav', '.ogg', '.flac')) else 'text'

        try:
            if input_type == 'audio':
                genre, confidence = self.classify_audio(input_data)
            else:
                genre, confidence = self.classify_text(input_data)

            return {
                'genre': genre,
                'confidence': float(confidence),
                'input_type': input_type
            }
        except Exception as e:
            raise ValueError(f"Prediction failed: {str(e)}")
```
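In the deleted `classify_audio`, a transformers `audio-classification` pipeline called on a single clip returns a flat list of `{"label", "score"}` dicts, so `predictions = predictions[0]` reduces it to one dict. Assuming the pipeline call itself succeeds, the comprehension then iterates over that dict's keys (plain strings) and `p['label']` raises a `TypeError`, which surfaces as "Audio classification failed". (The `unsqueeze(0)` in `process_audio` may be a separate problem, since the pipeline is documented for single-channel, 1-D input.) A sketch of the intended selection step, reusing the deleted `label_mapping` and its `("pop", 0.5)` fallback, with `pick_genre` as a hypothetical name:

```python
from typing import Any, Dict, List, Optional, Tuple

# Same table as the deleted self.label_mapping (AudioSet label -> genre).
LABEL_MAPPING: Dict[str, str] = {
    "Music": "pop",  # generic label, kept as the deleted code's default
    "Rock music": "rock",
    "Pop music": "pop",
    "Hip hop music": "hip hop",
    "Country": "country",
    "Jazz": "jazz",
    "Classical music": "classical",
    "Electronic music": "electronic",
    "Blues": "blues",
    "Reggae": "reggae",
    "Heavy metal": "metal",
}


def pick_genre(
    predictions: List[Dict[str, Any]],
    fallback: Tuple[str, float] = ("pop", 0.5),
) -> Tuple[str, float]:
    """Highest-scoring mapped genre among one clip's pipeline predictions.

    `predictions` is the flat list of {"label": str, "score": float} dicts;
    it is iterated directly instead of being indexed with [0].
    """
    best: Optional[Tuple[str, float]] = None
    for p in predictions:
        genre = LABEL_MAPPING.get(str(p["label"]))
        score = float(p["score"])
        if genre is not None and (best is None or score > best[1]):
            best = (genre, score)
    return best if best is not None else fallback


if __name__ == "__main__":
    # Made-up scores for illustration; not real model output.
    sample = [
        {"label": "Music", "score": 0.71},
        {"label": "Jazz", "score": 0.18},
        {"label": "Saxophone", "score": 0.05},
    ]
    print(pick_genre(sample))
```

With only `top_k=3` of AudioSet's 527 labels requested, a specific genre label may not appear at all, in which case a generic "Music" entry maps to "pop"; requesting more labels gives the mapping more to work with.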
src/lyric_generator.py
DELETED
@@ -1,144 +0,0 @@

```python
from transformers import pipeline
import torch
from typing import Dict, List, Optional

class LyricGenerator:
    def __init__(self, model_name: str = "gpt2-medium"):
        """
        Initialize the lyric generator with a specified language model.

        Args:
            model_name: The name of the pre-trained model to use
        """
        try:
            # Try to use CUDA if available
            if torch.cuda.is_available():
                device = "cuda"
            else:
                device = "cpu"

            self.generator = pipeline(
                "text-generation",
                model=model_name,
                device_map="auto"  # Let transformers handle device mapping
            )
        except Exception as e:
            print(f"Warning: GPU initialization failed, falling back to CPU. Error: {str(e)}")
            self.generator = pipeline(
                "text-generation",
                model=model_name,
                device="cpu"
            )

        # Genre-specific prompts to guide generation
        self.genre_prompts = {
            "rock": "Write energetic rock lyrics about",
            "pop": "Create catchy pop lyrics about",
            "hip hop": "Write hip hop verses about",
            "country": "Write country music lyrics about",
            "jazz": "Compose smooth jazz lyrics about",
            "classical": "Write classical music lyrics about",
            "electronic": "Create electronic dance music lyrics about",
            "blues": "Write soulful blues lyrics about",
            "reggae": "Write laid-back reggae lyrics about",
            "metal": "Write intense metal lyrics about"
        }

    def generate_lyrics(
        self,
        genre: str,
        theme: str,
        max_length: int = 200,
        num_return_sequences: int = 1,
        temperature: float = 0.9,
        top_p: float = 0.9,
        top_k: int = 50
    ) -> List[str]:
        """
        Generate lyrics based on genre and theme.

        Args:
            genre: The music genre to generate lyrics for
            theme: The theme or topic for the lyrics
            max_length: Maximum length of generated text
            num_return_sequences: Number of different lyrics to generate
            temperature: Controls randomness (higher = more random)
            top_p: Nucleus sampling parameter
            top_k: Top-k sampling parameter

        Returns:
            List of generated lyrics
        """
        try:
            # Get genre-specific prompt or use default
            genre = genre.lower()
            base_prompt = self.genre_prompts.get(
                genre,
                "Write song lyrics about"
            )

            # Construct full prompt
            prompt = f"{base_prompt} {theme}:\n\n"

            # Generate lyrics
            outputs = self.generator(
                prompt,
                max_length=max_length,
                num_return_sequences=num_return_sequences,
                temperature=temperature,
                top_p=top_p,
                top_k=top_k,
                do_sample=True,
                pad_token_id=50256  # GPT-2's pad token ID
            )

            # Process and clean up the generated texts
            generated_lyrics = []
            for output in outputs:
                # Remove the prompt from the generated text
                lyrics = output['generated_text'][len(prompt):].strip()
                # Basic cleanup
                lyrics = lyrics.replace('<|endoftext|>', '').strip()
                generated_lyrics.append(lyrics)

            return generated_lyrics

        except Exception as e:
            raise ValueError(f"Lyric generation failed: {str(e)}")

    def style_transfer(
        self,
        original_lyrics: str,
        target_genre: str,
        temperature: float = 0.9
    ) -> str:
        """
        Attempt to transfer the style of existing lyrics to a target genre.

        Args:
            original_lyrics: The original lyrics to restyle
            target_genre: The target genre for the style transfer
            temperature: Controls randomness of generation

        Returns:
            Restyled lyrics in the target genre
        """
        try:
            prompt = f"Rewrite these lyrics in {target_genre} style:\n\n{original_lyrics}\n\nNew version:\n"

            output = self.generator(
                prompt,
                max_length=len(prompt) + 200,
                temperature=temperature,
                top_p=0.9,
                do_sample=True,
                num_return_sequences=1
            )[0]

            # Extract the new version only
            generated_text = output['generated_text']
            new_lyrics = generated_text.split("New version:\n")[-1].strip()
            return new_lyrics

        except Exception as e:
            raise ValueError(f"Style transfer failed: {str(e)}")
```
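In the deleted `style_transfer`, `max_length=len(prompt) + 200` adds a character count to a limit that transformers measures in tokens (prompt plus continuation), so the continuation budget grows with the prompt's length instead of staying near 200 tokens. Below is a sketch of the same prompt-and-extract logic using `max_new_tokens`, a generation argument that bounds only the newly generated tokens. To keep it runnable without downloading GPT-2, the generator is passed in as any callable with the text-generation pipeline's call shape; `restyle` and `fake_generator` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Anything called like a transformers text-generation pipeline:
# generator(prompt, **generation_kwargs) -> [{"generated_text": str}, ...]
Generator = Callable[..., List[Dict[str, Any]]]

MARKER = "New version:\n"


def restyle(
    generator: Generator,
    original_lyrics: str,
    target_genre: str,
    temperature: float = 0.9,
    new_tokens: int = 200,
) -> str:
    """Hypothetical variant of the deleted style_transfer with a token-based limit."""
    prompt = f"Rewrite these lyrics in {target_genre} style:\n\n{original_lyrics}\n\n{MARKER}"
    output = generator(
        prompt,
        max_new_tokens=new_tokens,  # bounds generated tokens only, not prompt + output
        temperature=temperature,
        top_p=0.9,
        do_sample=True,
        num_return_sequences=1,
    )[0]
    text = output["generated_text"]
    # By default the pipeline echoes the prompt, so slice it off; fall back to the
    # deleted code's split on the marker if the prompt was not echoed.
    if text.startswith(prompt):
        return text[len(prompt):].strip()
    return text.split(MARKER)[-1].strip()


def fake_generator(prompt: str, **kwargs: Any) -> List[Dict[str, Any]]:
    """Offline stand-in for pipeline("text-generation", model="gpt2-medium")."""
    return [{"generated_text": prompt + "Yeah, the city lights are calling me\n"}]


if __name__ == "__main__":
    print(restyle(fake_generator, "Walking home alone tonight", "blues"))
```

Slicing off the echoed prompt, rather than always splitting on "New version:", avoids losing text when the original lyrics or the model output happen to contain that marker. With the real model, `pipeline("text-generation", model="gpt2-medium")` can be passed as `generator`; GPT-2's context window is 1024 tokens, so very long lyrics plus 200 new tokens can still exceed it, and input truncation is left out of this sketch.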