jacob-c committed
Commit 167211e · Parent: 8dee368
Files changed (4)
  1. README.md +26 -90
  2. requirements.txt +7 -10
  3. src/classifier.py +0 -148
  4. src/lyric_generator.py +0 -144
README.md CHANGED
@@ -1,109 +1,45 @@
  ---
- title: Fyp Start Space
- emoji: 🏆
- colorFrom: purple
- colorTo: green
  sdk: gradio
- sdk_version: 5.5.0
  app_file: app.py
  pinned: false
  license: mit
- short_description: create this first space for getting familiar with space
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

- # Music Genre Classifier + Lyric Stylist 🎵
-
- A powerful web application that combines music genre classification with AI-powered lyric generation. This tool can analyze both audio files and text lyrics to determine the genre, then generate new lyrics in that style or transform existing lyrics into different genres.

  ## Features

- - **Dual Input Support**:
-   - Audio file analysis for genre detection
-   - Text-based lyrics analysis
- - **Genre Classification**:
-   - Accurate genre detection using state-of-the-art models
-   - Supports multiple popular music genres
- - **Lyric Generation**:
-   - Genre-aware lyric generation
-   - Theme-based content creation
-   - Multiple generation options (temperature, length, versions)
- - **Style Transfer**:
-   - Transform existing lyrics into different genres
-   - Preserve core message while adapting style
-
- ## Installation
-
- 1. Clone the repository:
- ```bash
- git clone [your-repo-url]
- cd music-genre-classifier-lyric-stylist
- ```
-
- 2. Create a virtual environment (recommended):
- ```bash
- python -m venv venv
- source venv/bin/activate # On Windows: venv\Scripts\activate
- ```
-
- 3. Install dependencies:
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Usage
-
- 1. Start the application:
- ```bash
- python app.py
- ```
-
- 2. Open your web browser and navigate to the provided URL (typically http://localhost:7860)
-
- 3. Choose your input method:
-    - Upload an audio file (supports .mp3, .wav, .ogg, .flac)
-    - Enter lyrics text
-
- 4. Adjust generation parameters:
-    - Temperature (controls randomness)
-    - Maximum length
-    - Number of versions
-
- 5. Click "Detect Genre & Generate Lyrics" or use the "Style Transfer" tab for existing lyrics
-
- ## Models Used
-
- - **Genre Classification**:
-   - Audio: `mit/ast-finetuned-audioset-10-10-0.4593` (MIT's Audio Spectrogram Transformer)
-   - Text: `facebook/bart-large-mnli` (Zero-shot classification)
- - **Lyric Generation**: `gpt2-medium`

- ## Supported Genres

- The system supports classification and generation for the following genres:
- - Rock
- - Pop
- - Hip Hop
- - Country
- - Jazz
- - Classical
- - Electronic
- - Blues
- - Reggae
- - Metal

- ## Contributing

- Contributions are welcome! Please feel free to submit a Pull Request.

- ## License

- This project is licensed under the MIT License - see the LICENSE file for details.

- ## Acknowledgments

- - MIT for the Audio Spectrogram Transformer model
- - Hugging Face for providing the pre-trained models
- - Gradio for the web interface framework
- - The open-source community for various audio processing libraries

  ---
+ title: Music Classification with MIT AST
+ emoji: 🎵
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.12.0
  app_file: app.py
  pinned: false
  license: mit
  ---

+ # Music Classification with MIT's AST Model 🎵

+ This Hugging Face Space demonstrates audio classification using MIT's Audio Spectrogram Transformer (AST) model. The model can identify various types of music, instruments, and sounds in audio files.

  ## Features

+ - Simple, user-friendly interface
+ - Support for multiple audio formats (WAV, MP3, OGG, FLAC)
+ - Top-5 predictions with confidence scores
+ - Real-time processing

+ ## How to Use

+ 1. Click the "Upload Music File" button or drag and drop an audio file
+ 2. Wait a few seconds for the model to process the audio
+ 3. View the classification results with confidence scores

+ ## Model Details

+ This app uses the `MIT/ast-finetuned-audioset-10-10-0.4593` model, which is trained on AudioSet and can recognize a wide variety of sounds and music styles. The model converts audio into spectrograms and uses a transformer architecture to classify the audio content.

+ ## Technical Notes

+ - The model processes audio at 16 kHz
+ - Results show the top 5 predictions with confidence scores
+ - Processing is done on Hugging Face's infrastructure
+ - No local installation required

+ ## Credits

+ - Model: [MIT AST](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)
+ - Interface: Gradio
+ - Deployment: Hugging Face Spaces
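
The "How to Use", "Model Details", and "Technical Notes" sections of the new README describe the classification flow (AST pipeline, 16 kHz input, top-5 labels with scores). As a rough illustration only, a minimal `app.py` along those lines might look like the sketch below. This is not the Space's actual code (`app.py` is not part of this commit); the Gradio component choices and function names are assumptions.

```python
# Hypothetical sketch of an app.py matching the README above — not the file shipped in this commit.
import gradio as gr
from transformers import pipeline

# Model id taken from the README's "Model Details" section.
classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)

def classify_audio(audio_path: str) -> dict:
    # The pipeline decodes the file, resamples it to the 16 kHz rate the AST model expects,
    # and returns the top-k labels with confidence scores.
    predictions = classifier(audio_path, top_k=5)
    # Map to {label: score} so gr.Label can render confidence bars.
    return {p["label"]: float(p["score"]) for p in predictions}

demo = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Music File"),
    outputs=gr.Label(num_top_classes=5, label="Top-5 Predictions"),
    title="Music Classification with MIT's AST Model 🎵",
)

if __name__ == "__main__":
    demo.launch()
```

Returning a `{label: score}` dictionary is what lets `gr.Label` display the confidence scores the README promises.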
 
requirements.txt CHANGED
@@ -1,10 +1,7 @@
- gradio==4.12.0
- transformers==4.36.2
- torch==2.1.2
- torchaudio==2.1.2
- numpy==1.26.2
- datasets==2.15.0
- soundfile==0.12.1
- librosa==0.10.1
- python-dotenv==1.0.0
- accelerate

+ gradio>=4.12.0
+ transformers>=4.36.2
+ torch>=2.1.2
+ torchaudio>=2.1.2
+ numpy>=1.26.2
+ accelerate>=0.25.0
+ librosa>=0.10.1
 
src/classifier.py DELETED
@@ -1,148 +0,0 @@
- import torch
- import torchaudio
- import librosa
- import numpy as np
- from transformers import pipeline
- from typing import Union, Tuple, List
-
- class MusicGenreClassifier:
-     def __init__(self):
-         try:
-             # Initialize both audio and text classification pipelines with auto device mapping
-             self.text_classifier = pipeline(
-                 "zero-shot-classification",
-                 model="facebook/bart-large-mnli",
-                 device="cpu"
-             )
-
-             # For audio classification, we'll use MIT's music classification model
-             self.audio_classifier = pipeline(
-                 "audio-classification",
-                 model="mit/ast-finetuned-audioset-10-10-0.4593",
-                 device="cpu"
-             )
-         except Exception as e:
-             print(f"Warning: GPU initialization failed, falling back to CPU. Error: {str(e)}")
-             # Fall back to CPU if GPU initialization fails
-             self.text_classifier = pipeline(
-                 "zero-shot-classification",
-                 model="facebook/bart-large-mnli",
-                 device="cpu"
-             )
-
-             self.audio_classifier = pipeline(
-                 "audio-classification",
-                 model="mit/ast-finetuned-audioset-10-10-0.4593",
-                 device="cpu"
-             )
-
-         # Define standard genres for classification
-         self.genres = [
-             "rock", "pop", "hip hop", "country", "jazz",
-             "classical", "electronic", "blues", "reggae", "metal"
-         ]
-
-         # Mapping from model output labels to our standard genres
-         self.label_mapping = {
-             "Music": "pop",  # Default mapping
-             "Rock music": "rock",
-             "Pop music": "pop",
-             "Hip hop music": "hip hop",
-             "Country": "country",
-             "Jazz": "jazz",
-             "Classical music": "classical",
-             "Electronic music": "electronic",
-             "Blues": "blues",
-             "Reggae": "reggae",
-             "Heavy metal": "metal"
-         }
-
-     def process_audio(self, audio_path: str) -> torch.Tensor:
-         """Process audio file to match model requirements."""
-         try:
-             # Load audio using librosa (handles more formats)
-             waveform, sample_rate = librosa.load(audio_path, sr=16000)
-             # Convert to torch tensor and ensure proper shape
-             waveform = torch.from_numpy(waveform).float()
-             if len(waveform.shape) == 1:
-                 waveform = waveform.unsqueeze(0)
-             return waveform
-         except Exception as e:
-             raise ValueError(f"Error processing audio file: {str(e)}")
-
-     def map_label_to_genre(self, label: str) -> str:
-         """Map model output label to standard genre."""
-         return self.label_mapping.get(label, "pop")  # Default to pop if unknown
-
-     def classify_audio(self, audio_path: str) -> Tuple[str, float]:
-         """Classify genre from audio file."""
-         try:
-             waveform = self.process_audio(audio_path)
-             predictions = self.audio_classifier(waveform, top_k=3)
-
-             # Process predictions
-             if isinstance(predictions, list):
-                 predictions = predictions[0]
-
-             # Find the highest scoring music-related prediction
-             music_preds = [
-                 (self.map_label_to_genre(p['label']), p['score'])
-                 for p in predictions
-                 if p['label'] in self.label_mapping
-             ]
-
-             if not music_preds:
-                 # If no music genres found, return default
-                 return "pop", 0.5
-
-             # Get the highest scoring genre
-             genre, score = max(music_preds, key=lambda x: x[1])
-             return genre, score
-
-         except Exception as e:
-             raise ValueError(f"Audio classification failed: {str(e)}")
-
-     def classify_text(self, lyrics: str) -> Tuple[str, float]:
-         """Classify genre from lyrics text."""
-         try:
-             # Prepare the hypothesis template for zero-shot classification
-             hypothesis_template = "This text contains {} music lyrics."
-
-             result = self.text_classifier(
-                 lyrics,
-                 candidate_labels=self.genres,
-                 hypothesis_template=hypothesis_template
-             )
-
-             return result['labels'][0], result['scores'][0]
-         except Exception as e:
-             raise ValueError(f"Text classification failed: {str(e)}")
-
-     def predict(self, input_data: str, input_type: str = None) -> dict:
-         """
-         Main prediction method that handles both audio and text inputs.
-
-         Args:
-             input_data: Path to audio file or lyrics text
-             input_type: Optional, 'audio' or 'text'. If None, will try to auto-detect
-
-         Returns:
-             dict containing predicted genre and confidence score
-         """
-         # Try to auto-detect input type if not specified
-         if input_type is None:
-             input_type = 'audio' if input_data.lower().endswith(('.mp3', '.wav', '.ogg', '.flac')) else 'text'
-
-         try:
-             if input_type == 'audio':
-                 genre, confidence = self.classify_audio(input_data)
-             else:
-                 genre, confidence = self.classify_text(input_data)
-
-             return {
-                 'genre': genre,
-                 'confidence': float(confidence),
-                 'input_type': input_type
-             }
-         except Exception as e:
-             raise ValueError(f"Prediction failed: {str(e)}")
 
src/lyric_generator.py DELETED
@@ -1,144 +0,0 @@
- from transformers import pipeline
- import torch
- from typing import Dict, List, Optional
-
- class LyricGenerator:
-     def __init__(self, model_name: str = "gpt2-medium"):
-         """
-         Initialize the lyric generator with a specified language model.
-
-         Args:
-             model_name: The name of the pre-trained model to use
-         """
-         try:
-             # Try to use CUDA if available
-             if torch.cuda.is_available():
-                 device = "cuda"
-             else:
-                 device = "cpu"
-
-             self.generator = pipeline(
-                 "text-generation",
-                 model=model_name,
-                 device_map="auto"  # Let transformers handle device mapping
-             )
-         except Exception as e:
-             print(f"Warning: GPU initialization failed, falling back to CPU. Error: {str(e)}")
-             self.generator = pipeline(
-                 "text-generation",
-                 model=model_name,
-                 device="cpu"
-             )
-
-         # Genre-specific prompts to guide generation
-         self.genre_prompts = {
-             "rock": "Write energetic rock lyrics about",
-             "pop": "Create catchy pop lyrics about",
-             "hip hop": "Write hip hop verses about",
-             "country": "Write country music lyrics about",
-             "jazz": "Compose smooth jazz lyrics about",
-             "classical": "Write classical music lyrics about",
-             "electronic": "Create electronic dance music lyrics about",
-             "blues": "Write soulful blues lyrics about",
-             "reggae": "Write laid-back reggae lyrics about",
-             "metal": "Write intense metal lyrics about"
-         }
-
-     def generate_lyrics(
-         self,
-         genre: str,
-         theme: str,
-         max_length: int = 200,
-         num_return_sequences: int = 1,
-         temperature: float = 0.9,
-         top_p: float = 0.9,
-         top_k: int = 50
-     ) -> List[str]:
-         """
-         Generate lyrics based on genre and theme.
-
-         Args:
-             genre: The music genre to generate lyrics for
-             theme: The theme or topic for the lyrics
-             max_length: Maximum length of generated text
-             num_return_sequences: Number of different lyrics to generate
-             temperature: Controls randomness (higher = more random)
-             top_p: Nucleus sampling parameter
-             top_k: Top-k sampling parameter
-
-         Returns:
-             List of generated lyrics
-         """
-         try:
-             # Get genre-specific prompt or use default
-             genre = genre.lower()
-             base_prompt = self.genre_prompts.get(
-                 genre,
-                 "Write song lyrics about"
-             )
-
-             # Construct full prompt
-             prompt = f"{base_prompt} {theme}:\n\n"
-
-             # Generate lyrics
-             outputs = self.generator(
-                 prompt,
-                 max_length=max_length,
-                 num_return_sequences=num_return_sequences,
-                 temperature=temperature,
-                 top_p=top_p,
-                 top_k=top_k,
-                 do_sample=True,
-                 pad_token_id=50256  # GPT-2's pad token ID
-             )
-
-             # Process and clean up the generated texts
-             generated_lyrics = []
-             for output in outputs:
-                 # Remove the prompt from the generated text
-                 lyrics = output['generated_text'][len(prompt):].strip()
-                 # Basic cleanup
-                 lyrics = lyrics.replace('<|endoftext|>', '').strip()
-                 generated_lyrics.append(lyrics)
-
-             return generated_lyrics
-
-         except Exception as e:
-             raise ValueError(f"Lyric generation failed: {str(e)}")
-
-     def style_transfer(
-         self,
-         original_lyrics: str,
-         target_genre: str,
-         temperature: float = 0.9
-     ) -> str:
-         """
-         Attempt to transfer the style of existing lyrics to a target genre.
-
-         Args:
-             original_lyrics: The original lyrics to restyle
-             target_genre: The target genre for the style transfer
-             temperature: Controls randomness of generation
-
-         Returns:
-             Restyled lyrics in the target genre
-         """
-         try:
-             prompt = f"Rewrite these lyrics in {target_genre} style:\n\n{original_lyrics}\n\nNew version:\n"
-
-             output = self.generator(
-                 prompt,
-                 max_length=len(prompt) + 200,
-                 temperature=temperature,
-                 top_p=0.9,
-                 do_sample=True,
-                 num_return_sequences=1
-             )[0]
-
-             # Extract the new version only
-             generated_text = output['generated_text']
-             new_lyrics = generated_text.split("New version:\n")[-1].strip()
-             return new_lyrics
-
-         except Exception as e:
-             raise ValueError(f"Style transfer failed: {str(e)}")