sander-wood committed · Commit 1c16446 · verified · 1 Parent(s): b27ec7f

Update README.md

Files changed (1)
  1. README.md +195 -87
@@ -145,54 +145,218 @@ CLaMP 3 unifies diverse music data and text into a shared representation space,
145
 
146
  For examples demonstrating these capabilities, visit [CLaMP 3 Homepage](https://sanderwood.github.io/clamp3/).
147
 
148
- ## **Repository Structure**
149
- **[code/](https://github.com/sanderwood/clamp3/tree/main/code)** → Training & feature extraction scripts.
150
- **[classification/](https://github.com/sanderwood/clamp3/tree/main/classification)** → Linear classification training and prediction.
151
- **[preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing)** → Convert data into Interleaved ABC, MTF, or MERT-extracted features.
152
- **[retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval)** → Semantic search, retrieval evaluation, and similarity calculations.
153
 
154
- > **Note:** Ensure the model weights are placed in the `code/` folder, and verify the configuration hyperparameters before use.
155
-
156
- ## **Getting Started**
157
- ### **Environment Setup**
158
- To set up the environment for CLaMP 3, run:
159
  ```bash
160
- conda env create -f environment.yml
161
  conda activate clamp3
 
 
162
  ```
163
 
164
- ### **Data Preparation**
165
- #### **1. Convert Music Data to Compatible Formats**
166
- Before using CLaMP 3, preprocess **MusicXML files** into **Interleaved ABC**, **MIDI files** into **MTF**, and **audio files** into **MERT-extracted features**.
167
 
168
- > **Note:** Each script requires a manual edit of the `input_dir` variable at the top of the file before running, except for the MERT extraction script (`extract_mert.py`), which takes command-line arguments for input and output paths.
169
 
170
- ##### **1.1 Convert MusicXML to Interleaved ABC Notation**
171
 
172
- CLaMP 3 requires **Interleaved ABC notation** for sheet music. To achieve this, first, convert **MusicXML** (`.mxl`, `.xml`, `.musicxml`) to **standard ABC** using [`batch_xml2abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py):
173
 
174
  ```bash
175
- python batch_xml2abc.py
176
  ```
177
- - **Input:** `.mxl`, `.xml`, `.musicxml`
178
- - **Output:** `.abc` (Standard ABC)
179
-
180
- Next, process the standard ABC files into **Interleaved ABC notation** using [`batch_interleaved_abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py):
181
 
182
  ```bash
183
- python batch_interleaved_abc.py
184
  ```
185
- - **Input:** `.abc` (Standard ABC)
186
- - **Output:** `.abc` *(Interleaved ABC for CLaMP 3)*
187
 
188
- ##### **1.2 Convert MIDI to MTF Format**
189
- CLaMP 3 processes performance signals in **MIDI Text Format (MTF)**. Convert **MIDI files** (`.mid`, `.midi`) into **MTF format** using [`batch_midi2mtf.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py):
190
 
191
  ```bash
192
- python batch_midi2mtf.py
193
  ```
194
- - **Input:** `.mid`, `.midi`
195
- - **Output:** `.mtf` *(MTF for CLaMP 3)*
 
196
 
197
  ##### **1.3 Extract Audio Features using MERT**
198
  For audio processing, CLaMP 3 uses **MERT-extracted features** instead of raw waveforms. Extract MERT-based features from raw audio (`.mp3`, `.wav`) using [`extract_mert.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/audio/extract_mert.py):
@@ -238,7 +402,7 @@ By default, CLaMP 3 is configured for the **SAAS version** (optimized for audio)
238
  After training (or using pre-trained weights), extract features using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py):
239
 
240
  ```bash
241
- accelerate launch extract_clamp3.py --epoch <epoch> <input_dir> <output_dir> [--get_global]
242
  ```
243
  - **`--epoch <epoch>`:** (Optional) Specify the checkpoint epoch.
244
  - **`<input_dir>`:** Directory containing the input files.
@@ -249,62 +413,6 @@ All extracted features are stored as `.npy` files.
249
 
250
  > **Note**: For retrieval, `--get_global` must be used. Without it, CLaMP 3 will not work correctly for retrieval tasks. Omit `--get_global` only if you are performing downstream fine-tuning or need raw feature extraction for custom tasks.
251
 
252
- ### **Retrieval and Classification**
253
- #### **1. Semantic Search**
254
-
255
- To perform semantic search with CLaMP 3, you first need to extract the features for both your **query** and **reference** data using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py). The query is usually a text description, and the reference folder contains a large set of music data, such as audio or sheet music.
256
-
257
- After extracting the features, you can perform the semantic search using the [`semantic_search.py`](https://github.com/sanderwood/clamp3/blob/main/retrieval/semantic_search.py) script. This search can be used for various tasks.
258
-
259
- ```bash
260
- python semantic_search.py <query_file> <reference_folder> [--top_k TOP_K]
261
- ```
262
- - **`<query_file>`**: Path to the query feature (e.g., `ballad.npy`).
263
- - **`<reference_folder>`**: Folder containing reference features for comparison.
264
- - **`--top_k`**: *(Optional)* Number of top similar items to display (default is 10).
265
-
266
- CLaMP 3's semantic search enables various retrieval and evaluation tasks by comparing features extracted from queries and reference data. Generally, the larger and more diverse the reference music dataset, the higher the likelihood of retrieving relevant and accurately matched music.
267
-
268
- ##### **1. Text-to-Music Retrieval**
269
- - **Query:** Text description of the desired music.
270
- - **Reference:** Music data (e.g., audio files).
271
- - **Output:** Retrieves music that best matches the semantic meaning of the text description.
272
-
273
- ##### **2. Image-to-Music Retrieval**
274
- - **Query:** Generate an image caption using models like [BLIP](https://huggingface.co/Salesforce/blip-image-captioning-base).
275
- - **Reference:** Music data (e.g., audio files).
276
- - **Output:** Finds music that semantically aligns with the image.
277
-
278
- ##### **3. Cross-Modal and Same-Modal Music Retrieval**
279
- - **Cross-Modal Retrieval:**
280
- - **Query:** Music data from one modality (e.g., audio).
281
- - **Reference:** Music data from another modality (e.g., MIDI, ABC notation).
282
- - **Output:** Finds semantically similar music across different representations.
283
-
284
- - **Same-Modal Retrieval (Semantic-Based Music Recommendation):**
285
- - **Query & Reference:** Both are from the same modality (e.g., audio-to-audio).
286
- - **Output:** Recommends similar music based on semantic meaning.
287
-
288
- ##### **4. Zero-Shot Music Classification**
289
- - **Query:** Music data.
290
- - **Reference:** Class descriptions (e.g., "It is classical," "It is folk").
291
- - **Output:** Assigns the most relevant class based on feature similarity.
292
-
293
- ##### **5. Music Semantic Similarity Evaluation**
294
- - **Query:** High-quality music or music generation prompt.
295
- - **Reference:** Generated music.
296
- - **Output:** Ranks generated music based on semantic similarity to the query. For large-scale evaluation between generated music and reference music, it is recommended to use [`clamp3_score.py`](https://github.com/sanderwood/clamp3/blob/main/retrieval/clamp3_score.py).
297
-
298
- #### **2. Classification**
299
- Train a linear classifier using **[`train_cls.py`](https://github.com/sanderwood/clamp3/tree/main/classification/train_cls.py)**:
300
- ```bash
301
- python train_cls.py --train_folder <path> --eval_folder <path> [--num_epochs <int>] [--learning_rate <float>] [--balanced_training]
302
- ```
303
- Run inference with **[`inference_cls.py`](https://github.com/sanderwood/clamp3/tree/main/classification/inference_cls.py)**:
304
- ```bash
305
- python inference_cls.py <weights_path> <feature_folder> <output_file>
306
- ```
307
-
308
  ## **Citation**
309
  If you find CLaMP 3 useful in your work, please consider citing our paper:
310
 
 
145
 
146
  For examples demonstrating these capabilities, visit [CLaMP 3 Homepage](https://sanderwood.github.io/clamp3/).
147
 
148
+ ## **Quick Start Guide**
149
+ For users who want to get started quickly without delving into the details, follow these steps:
 
 
 
150
 
151
+ ### **Install Environment**
 
 
 
 
152
  ```bash
153
+ conda create -n clamp3 python=3.10.16 -y
154
  conda activate clamp3
155
+ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
156
+ pip install -r requirements.txt
157
  ```
158
 
159
+ ### **Overview of `clamp3_*.py` Scripts**
160
+ CLaMP 3 provides the `clamp3_*.py` script series for **streamlined data preprocessing, feature extraction, retrieval, similarity computation, and evaluation**. These scripts offer an easy-to-use solution for processing different modalities with minimal configuration.
161
+
162
+ **Common Features of `clamp3_*.py` Scripts:**
163
+ - **End-to-End Processing**: Each script handles the entire pipeline in a single command.
164
+ - **Automatic Modality Detection**:
165
+ Simply specify the file path, and the script will automatically detect the modality (e.g., **audio**, **performance signals**, **sheet music**, **images**, or **text**) and extract the relevant features. Supported formats include:
166
+ - **Audio**: `.mp3`, `.wav`
167
+ - **Performance Signals**: `.mid`, `.midi`
168
+ - **Sheet Music**: `.mxl`, `.musicxml`, `.xml`
169
+ - **Images**: `.png`, `.jpg`
170
+ - **Text**: `.txt`
171
+ - **First-Time Model Download**:
172
+ - The necessary model weights for **[CLaMP 3 (SAAS)](https://huggingface.co/sander-wood/clamp3/blob/main/weights_clamp3_saas_h_size_768_t_model_FacebookAI_xlm-roberta-base_t_length_128_a_size_768_a_layers_12_a_length_128_s_size_768_s_layers_12_p_size_64_p_length_512.pth)**, **[MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M)**, and any other required models will be automatically downloaded if needed.
173
+ - Once downloaded, models are cached and will not be re-downloaded in future runs.
174
+
175
+ - **Feature Management**:
176
+ - Extracted features are saved in `inference/` and **won't be overwritten** to avoid redundant computations.
177
+ - **To run retrieval on a new dataset**, manually delete the corresponding folder inside `inference/` (e.g., `inference/audio_features/`). Otherwise, previously extracted features will be reused.
178
+ - Temporary files are stored in `temp/` and **are cleaned up after each run**.
179
 
180
+ > **Note**: All files within a folder must belong to the same modality; the script will process them based on the first detected format.
181
 
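The automatic modality detection described above can be pictured as a lookup on the file extension. The following is a minimal sketch, not the repository's implementation: the table is copied from the supported formats listed above, while the function names `detect_modality` and `detect_folder_modality`, and the choice to scan files in sorted order, are assumptions made for illustration.

```python
from pathlib import Path

# Extension-to-modality table taken from the list of supported formats.
MODALITY_BY_EXTENSION = {
    ".mp3": "audio",
    ".wav": "audio",
    ".mid": "performance signals",
    ".midi": "performance signals",
    ".mxl": "sheet music",
    ".musicxml": "sheet music",
    ".xml": "sheet music",
    ".png": "images",
    ".jpg": "images",
    ".txt": "text",
}


def detect_modality(path):
    """Return the modality for a file path, or None if the extension is unsupported."""
    return MODALITY_BY_EXTENSION.get(Path(path).suffix.lower())


def detect_folder_modality(filenames):
    """Hypothetical helper: use the first recognized file to decide the folder's modality.

    Sorted order is an assumption; the real scripts may scan files differently.
    """
    for name in sorted(filenames):
        modality = detect_modality(name)
        if modality is not None:
            return modality
    return None


if __name__ == "__main__":
    print(detect_modality("track01.wav"))
    print(detect_folder_modality(["notes.md", "piece.musicxml", "piece2.mxl"]))
```

Because a folder is handled according to the first detected format, mixing modalities in one folder would cause the remaining files to be processed as the wrong type, which is why the note above asks for one modality per folder.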
182
+ #### **[`clamp3_search.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_search.py) - Running Retrieval Tasks**
183
 
184
+ This script performs semantic retrieval tasks, comparing a query file to reference files in `ref_dir`. Typically, the larger and more diverse the files in `ref_dir`, the better the chances of finding a semantically matching result.
185
 
186
  ```bash
187
+ python clamp3_search.py <query_file> <ref_dir> [--top_k TOP_K]
188
  ```
189
+
190
+ - **Text-to-Music Retrieval**:
191
+ Query is a `.txt` file, and `ref_dir` contains music files. Retrieves the music most semantically similar to the query text.
192
+
193
+ - **Image-to-Music Retrieval**:
194
+ Query is an image (`.png`, `.jpg`), and `ref_dir` contains music files. **BLIP** generates a caption for the image to find the most semantically matching music.
195
+
196
+ - **Music-to-Music Retrieval**:
197
+ Query is a music file, and `ref_dir` contains music files (same or different modality). Supports **cross-modal retrieval** (e.g., retrieving audio using sheet music).
198
+
199
+ - **Zero-Shot Classification**:
200
+ Query is a music file, and `ref_dir` contains **text-based class prototypes** (e.g., `"It is classical"`, `"It is jazz"`). The highest similarity match is the classification result.
201
+
202
+ - **Optional `--top_k` Parameter**:
203
+ You can specify the number of top results to retrieve using the `--top_k` argument. If not provided, the default value is 10.
204
+ Example:
205
+ ```bash
206
+ python clamp3_search.py <query_file> <ref_dir> --top_k 3
207
+ ```
208
+
209
+ **Example Output**:
210
+ ```
211
+ Top 3 results among 1000 candidates:
212
+ 4tDYMayp6Dk 0.7468
213
+ vGJTaP6anOU 0.7333
214
+ JkK8g6FMEXE 0.7054
215
+ ```
216
+
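Conceptually, all four retrieval tasks above reduce to ranking reference features by their similarity to one query feature. The sketch below illustrates that idea with plain Python lists. It assumes cosine similarity as the score, which the README does not state explicitly, and the names `cosine_similarity` and `top_k_matches` are hypothetical rather than taken from `clamp3_search.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_matches(query, references, top_k=10):
    """Rank reference vectors (name -> vector) by similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in references.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional features; real features are much higher-dimensional.
    query = [1.0, 0.0]
    references = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for name, score in top_k_matches(query, references, top_k=2):
        print(f"{name} {score:.4f}")
```

Zero-shot classification fits the same pattern: the query is a music feature, the references are features of the class-prototype texts, and the top-ranked reference is the predicted class.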
217
+ #### **[`clamp3_score.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_score.py) - Semantic Similarity Calculation**
218
+
219
+ This script compares files in a query directory to a reference directory. By default, it uses **group mode**, but you can switch to **pairwise mode** for paired data.
220
 
221
  ```bash
222
+ python clamp3_score.py <query_dir> <ref_dir> [--pairwise]
223
+ ```
224
+
225
+ - **Group Mode (default)**:
226
+ Compares all query files to all reference files and calculates the average similarity. **Use when you don't have paired data** or when dealing with large datasets.
227
+
228
+ **Example**:
229
+ To compare generated music to ground truth music files (no pairs available), use **group mode**.
230
+
231
+ ```bash
232
+ python clamp3_score.py query_dir ref_dir
233
+ ```
234
+
235
+ **Example Output (Group Mode)**:
236
+ ```
237
+ Total query features: 1000
238
+ Total reference features: 1000
239
+ Group similarity: 0.6711
240
+ ```
241
+
242
+ - **Pairwise Mode**:
243
+ Compares query files with their corresponding reference files based on the **same prefix** (the part of the filename before the first dot) and **identical folder structure**. **Use when you have paired data** and the dataset is of manageable size (e.g., thousands of pairs).
244
+
245
+ **Example**:
246
+ To evaluate a **text-to-music generation model**, where each prompt (e.g., `sample1.txt`) corresponds to one or more generated music files (e.g., `sample1.1.wav`, `sample1.2.wav`), use **pairwise mode**.
247
+
248
+ ```bash
249
+ python clamp3_score.py query_dir ref_dir --pairwise
250
+ ```
251
+
252
+ **Folder structure**:
253
+ ```
254
+ query_dir/
255
+ ├── en/
256
+ │   ├── sample1.wav
257
+ ├── zh/
258
+ │   ├── sample1.1.wav
259
+ │   ├── sample1.2.wav
260
+ │   ├── sample2.wav
261
+
262
+ ref_dir/
263
+ ├── en/
264
+ │   ├── sample1.txt
265
+ ├── zh/
266
+ │   ├── sample1.txt
267
+ │   ├── sample2.txt
268
+ ```
269
+
270
+ - Files with the **same prefix** (e.g., `query_dir/en/sample1.wav` and `ref_dir/en/sample1.txt`) are treated as pairs.
271
+ - Multiple query files (e.g., `query_dir/zh/sample1.1.wav`, `query_dir/zh/sample1.2.wav`) can correspond to one reference file (e.g., `ref_dir/zh/sample1.txt`).
272
+
273
+ **Example Output (Pairwise Mode)**:
274
+ ```
275
+ Total query features: 1000
276
+ Total reference features: 1000
277
+ Avg. pairwise similarity: 0.1639
278
+ ```
279
+
280
+ In **pairwise mode**, the script will additionally output a JSON Lines file (`inference/pairwise_similarities.jsonl`) with the similarity scores for each query-reference pair.
281
+ For example:
282
+ ```json
283
+ {"query": "txt_features/UzUybLGvBxE.npy", "reference": "mid_features/UzUybLGvBxE.npy", "similarity": 0.2289600819349289}
284
+ ```
285
+
286
+ > **Note**: The file paths in the output will retain the folder structure and file names, but the top-level folder names and file extensions will be replaced.
287
+
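The pairing rule for pairwise mode (same subfolder, same filename prefix before the first dot) can be sketched as follows. This is an illustration of the rule as described, not the code in `clamp3_score.py`; the helper names `pair_key` and `match_pairs` are hypothetical, and paths are given relative to `query_dir` and `ref_dir`.

```python
from pathlib import PurePosixPath


def pair_key(relative_path):
    """Build a pairing key from the subfolder and the filename prefix before the first dot."""
    path = PurePosixPath(relative_path)
    prefix = path.name.split(".", 1)[0]
    return str(path.parent / prefix)


def match_pairs(query_files, ref_files):
    """Return (query, reference) pairs that share a key; unmatched queries are skipped."""
    refs_by_key = {pair_key(ref): ref for ref in ref_files}
    pairs = []
    for query in sorted(query_files):
        ref = refs_by_key.get(pair_key(query))
        if ref is not None:
            pairs.append((query, ref))
    return pairs


if __name__ == "__main__":
    # Mirrors the folder structure shown above.
    queries = ["en/sample1.wav", "zh/sample1.1.wav", "zh/sample1.2.wav", "zh/sample2.wav"]
    refs = ["en/sample1.txt", "zh/sample1.txt", "zh/sample2.txt"]
    for query, ref in match_pairs(queries, refs):
        print(query, "<->", ref)
```

Keeping the subfolder in the key is what stops `en/sample1.wav` from being paired with `zh/sample1.txt`, even though both share the prefix `sample1`.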
288
+ #### **[`clamp3_eval.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_eval.py) - Evaluating Retrieval Performance**
289
+
290
+ This script evaluates **CLaMP 3's retrieval performance on a paired dataset**, measuring how accurately the system ranks the correct reference files for each query using metrics like **MRR** and **Hit@K**.
291
+
292
+ ```bash
293
+ python clamp3_eval.py <query_dir> <ref_dir>
294
+ ```
295
+
296
+ - **Matching Folder Structure & Filenames**:
297
+ Requires paired query and reference files, with identical folder structure and filenames between `query_dir` and `ref_dir`. This matches the requirements of **pairwise mode** in `clamp3_score.py`.
298
+
299
+ - **Evaluation Metrics**:
300
+ The script calculates the following retrieval metrics:
301
+ - **MRR (Mean Reciprocal Rank)**
302
+ - **Hit@1**, **Hit@10**, and **Hit@100**
303
+
304
+ **Example Output**:
305
  ```
306
+ Total query features: 1000
307
+ Total reference features: 1000
308
+ MRR: 0.3301
309
+ Hit@1: 0.251
310
+ Hit@10: 0.482
311
+ Hit@100: 0.796
312
+ ```
313
+
314
+ - **Additional Output**:
315
+ A JSON Lines file (`inference/retrieval_ranks.jsonl`) with query-reference ranks:
316
+ ```json
317
+ {"query": "txt_features/HQ9FaXu55l0.npy", "reference": "xml_features/HQ9FaXu55l0.npy", "rank": 6}
318
+ ```
319
+
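For readers unfamiliar with the metrics, MRR is the mean of `1 / rank` over all queries, and Hit@K is the fraction of queries whose correct reference is ranked at position K or better. The sketch below recomputes both from rank records shaped like the JSON Lines example above. It assumes ranks are 1-based, and the function names `ranks_from_jsonl` and `retrieval_metrics` are hypothetical, not part of `clamp3_eval.py`.

```python
import json


def ranks_from_jsonl(lines):
    """Extract the "rank" field from JSON Lines records, skipping blank lines."""
    return [json.loads(line)["rank"] for line in lines if line.strip()]


def retrieval_metrics(ranks, ks=(1, 10, 100)):
    """Compute MRR and Hit@K from 1-based ranks of the correct reference per query."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    total = len(ranks)
    metrics = {"MRR": sum(1.0 / rank for rank in ranks) / total}
    for k in ks:
        metrics[f"Hit@{k}"] = sum(1 for rank in ranks if rank <= k) / total
    return metrics


if __name__ == "__main__":
    # Made-up records in the same shape as inference/retrieval_ranks.jsonl.
    records = [
        '{"query": "txt_features/a.npy", "reference": "xml_features/a.npy", "rank": 1}',
        '{"query": "txt_features/b.npy", "reference": "xml_features/b.npy", "rank": 2}',
        '{"query": "txt_features/c.npy", "reference": "xml_features/c.npy", "rank": 20}',
    ]
    for name, value in retrieval_metrics(ranks_from_jsonl(records)).items():
        print(f"{name}: {value:.4f}")
```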
320
+ ## **Repository Structure**
321
+ - **[code/](https://github.com/sanderwood/clamp3/tree/main/code)** → Training & feature extraction scripts.
322
+ - **[classification/](https://github.com/sanderwood/clamp3/tree/main/classification)** → Linear classification training and prediction.
323
+ - **[inference/](https://github.com/sanderwood/clamp3/tree/main/inference)** → Semantic search, similarity calculations, and retrieval evaluation.
324
+ - **[preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing)** → Convert data into Interleaved ABC, MTF, or MERT-extracted features.
325
+
326
+ > **Note:** Ensure the model weights are placed in the `code/` folder, and verify the configuration hyperparameters before use.
327
+
328
+ ## **Key Script Overview**
329
+ ### **Data Preparation**
330
+ #### **1. Convert Music Data to Compatible Formats**
331
+ Before using CLaMP 3, preprocess **MusicXML files** into **Interleaved ABC**, **MIDI files** into **MTF**, and **audio files** into **MERT-extracted features**.
332
+
333
+ ##### **1.1 Convert MusicXML to Interleaved ABC Notation**
334
 
335
+ CLaMP 3 requires **Interleaved ABC notation** for sheet music. Follow these steps:
 
336
 
337
+ 1. Convert **MusicXML** (`.mxl`, `.xml`, `.musicxml`) to **standard ABC** using [`batch_xml2abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py):
338
+ ```bash
339
+ python batch_xml2abc.py <input_dir> <output_dir>
340
+ ```
341
+ - **Input:** Directory containing `.mxl`, `.xml`, `.musicxml` files
342
+ - **Output:** Directory where converted `.abc` (Standard ABC) files will be saved
343
+
344
+ 2. Convert **Standard ABC** into **Interleaved ABC** using [`batch_interleaved_abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py):
345
+ ```bash
346
+ python batch_interleaved_abc.py <input_dir> <output_dir>
347
+ ```
348
+ - **Input:** Directory containing `.abc` (Standard ABC) files
349
+ - **Output:** Directory where Interleaved ABC files will be saved *(for CLaMP 3 use)*
350
+
351
+ ##### **1.2 Convert MIDI to MTF Format**
352
+
353
+ CLaMP 3 processes performance signals in **MIDI Text Format (MTF)**. Convert **MIDI files** (`.mid`, `.midi`) into **MTF format** using [`batch_midi2mtf.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py):
354
  ```bash
355
+ python batch_midi2mtf.py <input_dir> <output_dir> --m3_compatible
356
  ```
357
+ - **Input:** Directory containing `.mid`, `.midi` files
358
+ - **Output:** Directory where `.mtf` files will be saved *(MTF format for CLaMP 3)*
359
+ - **Important:** The `--m3_compatible` flag **must be included** to ensure the output format is compatible with CLaMP 3. Without this flag, the extracted MTF files **will not work** correctly in the pipeline.
360
 
361
  ##### **1.3 Extract Audio Features using MERT**
362
  For audio processing, CLaMP 3 uses **MERT-extracted features** instead of raw waveforms. Extract MERT-based features from raw audio (`.mp3`, `.wav`) using [`extract_mert.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/audio/extract_mert.py):
 
402
  After training (or using pre-trained weights), extract features using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py):
403
 
404
  ```bash
405
+ accelerate launch extract_clamp3.py --epoch <epoch> <input_dir> <output_dir> --get_global
406
  ```
407
  - **`--epoch <epoch>`:** (Optional) Specify the checkpoint epoch.
408
  - **`<input_dir>`:** Directory containing the input files.
 
413
 
414
  > **Note**: For retrieval, `--get_global` must be used. Without it, CLaMP 3 will not work correctly for retrieval tasks. Omit `--get_global` only if you are performing downstream fine-tuning or need raw feature extraction for custom tasks.
415
 
416
  ## **Citation**
417
  If you find CLaMP 3 useful in your work, please consider citing our paper:
418