sander-wood/clamp3

Tags: Feature Extraction · music


sander-wood committed 28 days ago · Commit 1c16446 (verified) · 1 Parent(s): b27ec7f

Update README.md


Files changed (1)

README.md +195 -87

README.md CHANGED

@@ -145,54 +145,218 @@ CLaMP 3 unifies diverse music data and text into a shared representation space,
 For examples demonstrating these capabilities, visit [CLaMP 3 Homepage](https://sanderwood.github.io/clamp3/).
-## **Repository Structure**
-- **[code/](https://github.com/sanderwood/clamp3/tree/main/code)** → Training & feature extraction scripts.
-- **[classification/](https://github.com/sanderwood/clamp3/tree/main/classification)** → Linear classification training and prediction.
-- **[preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing)** → Convert data into Interleaved ABC, MTF, or MERT-extracted features.
-- **[retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval)** → Semantic search, retrieval evaluation, and similarity calculations.
-> **Note:** Ensure the model weights are placed in the `code/` folder, and verify the configuration hyperparameters before use.
-## **Getting Started**
-### **Environment Setup**
-To set up the environment for CLaMP 3, run:
 ```bash
-conda env create -f environment.yml
 conda activate clamp3
 ```
-### **Data Preparation**
-#### **1. Convert Music Data to Compatible Formats**
-Before using CLaMP 3, preprocess **MusicXML files** into **Interleaved ABC**, **MIDI files** into **MTF**, and **audio files** into **MERT-extracted features**.
-> **Note:** Each script requires a manual edit of the `input_dir` variable at the top of the file before running, except for the MERT extraction script (`extract_mert.py`), which takes command-line arguments for input and output paths.
-##### **1.1 Convert MusicXML to Interleaved ABC Notation**
-CLaMP 3 requires **Interleaved ABC notation** for sheet music. To achieve this, first, convert **MusicXML** (`.mxl`, `.xml`, `.musicxml`) to **standard ABC** using [`batch_xml2abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py):
 ```bash
-python batch_xml2abc.py
 ```
-- **Input:** `.mxl`, `.xml`, `.musicxml`
-- **Output:** `.abc` (Standard ABC)
-Next, process the standard ABC files into **Interleaved ABC notation** using [`batch_interleaved_abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py):
 ```bash
-python batch_interleaved_abc.py
 ```
-- **Input:** `.abc` (Standard ABC)
-- **Output:** `.abc` *(Interleaved ABC for CLaMP 3)*
-##### **1.2 Convert MIDI to MTF Format**
-CLaMP 3 processes performance signals in **MIDI Text Format (MTF)**. Convert **MIDI files** (`.mid`, `.midi`) into **MTF format** using [`batch_midi2mtf.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py):
 ```bash
-python batch_midi2mtf.py
 ```
-- **Input:** `.mid`, `.midi`
-- **Output:** `.mtf` *(MTF for CLaMP 3)*
 ##### **1.3 Extract Audio Features using MERT**
 For audio processing, CLaMP 3 uses **MERT-extracted features** instead of raw waveforms. Extract MERT-based features from raw audio (`.mp3`, `.wav`) using [`extract_mert.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/audio/extract_mert.py):
@@ -238,7 +402,7 @@ By default, CLaMP 3 is configured for the **SAAS version** (optimized for audio)
 After training (or using pre-trained weights), extract features using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py):
 ```bash
-accelerate launch extract_clamp3.py --epoch <epoch> <input_dir> <output_dir> [--get_global]
 ```
 - **`--epoch <epoch>`:** (Optional) Specify the checkpoint epoch.
 - **`<input_dir>`:** Directory containing the input files.
@@ -249,62 +413,6 @@ All extracted features are stored as `.npy` files.
 > **Note**: For retrieval, `--get_global` must be used. Without it, CLaMP 3 will not work correctly for retrieval tasks. You only omit `--get_global` if you are performing downstream fine-tuning or need raw feature extraction for custom tasks.
-### **Retrieval and Classification**
-#### **1. Semantic Search**
-To perform semantic search with CLaMP 3, you first need to extract the features for both your **query** and **reference** data using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py). The query is usually a text description, and the reference folder contains a large set of music data, such as audio or sheet music.
-After extracting the features, you can perform the semantic search using the [`semantic_search.py`](https://github.com/sanderwood/clamp3/blob/main/retrieval/semantic_search.py) script. This search can be used for various tasks.
-```bash
-python semantic_search.py <query_file> <reference_folder> [--top_k TOP_K]
-```
-- **`<query_file>`**: Path to the query feature (e.g., `ballad.npy`).
-- **`<reference_folder>`**: Folder containing reference features for comparison.
-- **`--top_k`**: *(Optional)* Number of top similar items to display (default is 10).
-CLaMP 3's semantic search enables various retrieval and evaluation tasks by comparing features extracted from queries and reference data. Generally, the larger and more diverse the reference music dataset, the higher the likelihood of retrieving relevant and accurately matched music.
-##### **1. Text-to-Music Retrieval**
-- **Query:** Text description of the desired music.
-- **Reference:** Music data (e.g., audio files).
-- **Output:** Retrieves music that best matches the semantic meaning of the text description.
-##### **2. Image-to-Music Retrieval**
-- **Query:** Generate an image caption using models like [BLIP](https://huggingface.co/Salesforce/blip-image-captioning-base).
-- **Reference:** Music data (e.g., audio files).
-- **Output:** Finds music that semantically aligns with the image.
-##### **3. Cross-Modal and Same-Modal Music Retrieval**
-- **Cross-Modal Retrieval:**
-  - **Query:** Music data from one modality (e.g., audio).
-  - **Reference:** Music data from another modality (e.g., MIDI, ABC notation).
-  - **Output:** Finds semantically similar music across different representations.
-- **Same-Modal Retrieval (Semantic-Based Music Recommendation):**
-  - **Query & Reference:** Both are from the same modality (e.g., audio-to-audio).
-  - **Output:** Recommends similar music based on semantic meaning.
-##### **4. Zero-Shot Music Classification**
-- **Query:** Music data.
-- **Reference:** Class descriptions (e.g., "It is classical," "It is folk").
-- **Output:** Assigns the most relevant class based on feature similarity.
-##### **5. Music Semantic Similarity Evaluation**
-- **Query:** High-quality music or music generation prompt.
-- **Reference:** Generated music.
-- **Output:** Ranks generated music based on semantic similarity to the query. For large-scale evaluation between generated music and reference music, it is recommended to use [`clamp3_score.py`](https://github.com/sanderwood/clamp3/blob/main/retrieval/clamp3_score.py).
-#### **2. Classification**
-Train a linear classifier using **[`train_cls.py`](https://github.com/sanderwood/clamp3/tree/main/classification/train_cls.py)**:
-```bash
-python train_cls.py --train_folder <path> --eval_folder <path> [--num_epochs <int>] [--learning_rate <float>] [--balanced_training]
-```
-Run inference with **[`inference_cls.py`](https://github.com/sanderwood/clamp3/tree/main/classification/inference_cls.py)**:
-```bash
-python inference_cls.py <weights_path> <feature_folder> <output_file>
-```
 ## **Citation**
 If you find CLaMP 3 useful in your work, please consider citing our paper:

 For examples demonstrating these capabilities, visit [CLaMP 3 Homepage](https://sanderwood.github.io/clamp3/).
+## **Quick Start Guide**
+For users who want to get started quickly without delving into the details, follow these steps:
+### **Install Environment**
 ```bash
+conda create -n clamp3 python=3.10.16 -y
 conda activate clamp3
+conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
+pip install -r requirements.txt
 ```
+### **Overview of `clamp3_*.py` Scripts**
+CLaMP 3 provides the `clamp3_*.py` script series for **streamlined data preprocessing, feature extraction, retrieval, similarity computation, and evaluation**. These scripts offer an easy-to-use solution for processing different modalities with minimal configuration.
+**Common Features of `clamp3_*.py` Scripts:**
+- **End-to-End Processing**: Each script handles the entire pipeline in a single command.
+- **Automatic Modality Detection**:
+  Simply specify the file path, and the script will automatically detect the modality (e.g., **audio**, **performance signals**, **sheet music**, **images**, or **text**) and extract the relevant features. Supported formats include:
+  - **Audio**: `.mp3`, `.wav`
+  - **Performance Signals**: `.mid`, `.midi`
+  - **Sheet Music**: `.mxl`, `.musicxml`, `.xml`
+  - **Images**: `.png`, `.jpg`
+  - **Text**: `.txt`
+- **First-Time Model Download**:
+  - The necessary model weights for **[CLaMP 3 (SAAS)](https://huggingface.co/sander-wood/clamp3/blob/main/weights_clamp3_saas_h_size_768_t_model_FacebookAI_xlm-roberta-base_t_length_128_a_size_768_a_layers_12_a_length_128_s_size_768_s_layers_12_p_size_64_p_length_512.pth)**, **[MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M)**, and any other required models will be automatically downloaded if needed.
+  - Once downloaded, models are cached and will not be re-downloaded in future runs.
+- **Feature Management**:
+  - Extracted features are saved in `inference/` and **won't be overwritten** to avoid redundant computations.
+  - **To run retrieval on a new dataset**, manually delete the corresponding folder inside `inference/` (e.g., `inference/audio_features/`). Otherwise, previously extracted features will be reused.
+  - Temporary files are stored in `temp/` and **are cleaned up after each run**.
+> **Note**: All files within a folder must belong to the same modality; the script will process them based on the first detected format.
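The folder-level modality rule in the note above can be sketched as follows. This is a hypothetical helper (`detect_modality`), not code from the repository: the extension table is copied from the supported formats listed above, and visiting files in sorted order is an assumption, since the real scripts may enumerate files differently.

```python
from pathlib import Path

# Extension-to-modality table taken from the supported formats listed above.
MODALITY_BY_EXT = {
    ".mp3": "audio", ".wav": "audio",
    ".mid": "performance", ".midi": "performance",
    ".mxl": "sheet_music", ".musicxml": "sheet_music", ".xml": "sheet_music",
    ".png": "image", ".jpg": "image",
    ".txt": "text",
}


def detect_modality(filenames):
    """Return the modality of the first file with a supported extension.

    Hypothetical helper: files are visited in sorted order (an assumption),
    and files with unsupported extensions (e.g., notes.md) are skipped.
    """
    for name in sorted(filenames):
        modality = MODALITY_BY_EXT.get(Path(name).suffix.lower())
        if modality is not None:
            return modality
    raise ValueError("no file with a supported extension was found")


# A mixed folder is classified by whichever supported file comes first,
# which is why every file in a folder should share one modality.
print(detect_modality(["b.wav", "a.mid", "notes.md"]))
```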
+#### **[`clamp3_search.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_search.py) - Running Retrieval Tasks**
+This script performs semantic retrieval, comparing a query file against the reference files in `ref_dir`. Typically, the larger and more diverse the set of files in `ref_dir`, the better the chance of finding a semantically matching result.
 ```bash
+python clamp3_search.py <query_file> <ref_dir> [--top_k TOP_K]
 ```
+- **Text-to-Music Retrieval**:
+  Query is a `.txt` file, and `ref_dir` contains music files. Retrieves the music most semantically similar to the query text.
+- **Image-to-Music Retrieval**:
+  Query is an image (`.png`, `.jpg`), and `ref_dir` contains music files. **BLIP** generates a caption for the image, and that caption is used to find the most semantically matching music.
+- **Music-to-Music Retrieval**:
+  Query is a music file, and `ref_dir` contains music files (same or different modality). Supports **cross-modal retrieval** (e.g., retrieving audio using sheet music).
+- **Zero-Shot Classification**:
+  Query is a music file, and `ref_dir` contains **text-based class prototypes** (e.g., `"It is classical"`, `"It is jazz"`). The highest similarity match is the classification result.
+- **Optional `--top_k` Parameter**:
+  You can specify the number of top results to retrieve using the `--top_k` argument. If not provided, the default value is 10.
+  Example:
+  ```bash
+  python clamp3_search.py <query_file> <ref_dir> --top_k 3
+  ```
+  **Example Output**:
+  ```
+  Top 3 results among 1000 candidates:
+  4tDYMayp6Dk 0.7468
+  vGJTaP6anOU 0.7333
+  JkK8g6FMEXE 0.7054
+  ```
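The ranking behind `clamp3_search.py` can be approximated by sorting references by cosine similarity to the query. This sketch is illustrative only: it assumes each file has already been reduced to one global feature vector (in practice these would be loaded from `.npy` files, e.g., with `numpy.load`), assumes cosine similarity as the score, and its function names are hypothetical. Zero-shot classification is the same call with text class prototypes as the references and `top_k=1`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query, references, top_k=10):
    """Return (id, score) pairs for the top_k most similar references.

    `references` maps an ID (e.g., a file stem) to its feature vector.
    """
    scored = [(ref_id, cosine(query, vec)) for ref_id, vec in references.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def format_results(results, num_candidates):
    """Render results in the same layout as the example output above."""
    lines = [f"Top {len(results)} results among {num_candidates} candidates:"]
    lines += [f"{ref_id} {score:.4f}" for ref_id, score in results]
    return "\n".join(lines)


refs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(format_results(search([1.0, 0.0], refs, top_k=2), len(refs)))

# Zero-shot classification: class prototypes act as the references.
prototypes = {"It is classical": [0.9, 0.1], "It is jazz": [0.1, 0.9]}
label, _ = search([0.8, 0.3], prototypes, top_k=1)[0]
print(label)
```

Sorting the full list is the simplest choice for a sketch; for very large reference sets, `heapq.nlargest` would avoid sorting every candidate.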
+#### **[`clamp3_score.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_score.py) - Semantic Similarity Calculation**
+This script compares files in a query directory to a reference directory. By default, it uses **group mode**, but you can switch to **pairwise mode** for paired data.
 ```bash
+python clamp3_score.py <query_dir> <ref_dir> [--pairwise]
+```
+- **Group Mode (default)**:
+  Compares all query files to all reference files and calculates the average similarity. **Use when you don't have paired data** or when dealing with large datasets.
+  **Example**:
+  To compare generated music to ground-truth music files (no pairs available), use **group mode**.
+  ```bash
+  python clamp3_score.py query_dir ref_dir
+  ```
+  **Example Output (Group Mode)**:
+  ```
+  Total query features: 1000
+  Total reference features: 1000
+  Group similarity: 0.6711
+  ```
+- **Pairwise Mode**:
+  Compares query files with their corresponding reference files based on the **same prefix** (the part of the filename before the first dot) and an **identical folder structure**. **Use when you have paired data** and the dataset is of manageable size (e.g., thousands of pairs).
+  **Example**:
+  To evaluate a **text-to-music generation model**, where each prompt (e.g., `sample1.txt`) corresponds to one or more generated music files (e.g., `sample1.1.wav`, `sample1.2.wav`), use **pairwise mode**.
+  ```bash
+  python clamp3_score.py query_dir ref_dir --pairwise
+  ```
+  **Folder structure**:
+  ```
+  query_dir/
+  ├── en/
+  │   ├── sample1.wav
+  ├── zh/
+  │   ├── sample1.1.wav
+  │   ├── sample1.2.wav
+  │   ├── sample2.wav
+  ref_dir/
+  ├── en/
+  │   ├── sample1.txt
+  ├── zh/
+  │   ├── sample1.txt
+  │   ├── sample2.txt
+  ```
+  - Files with the **same prefix** (e.g., `query_dir/en/sample1.wav` and `ref_dir/en/sample1.txt`) are treated as pairs.
+  - Multiple query files (e.g., `query_dir/zh/sample1.1.wav`, `query_dir/zh/sample1.2.wav`) can correspond to one reference file (e.g., `ref_dir/zh/sample1.txt`).
+  **Example Output (Pairwise Mode)**:
+  ```
+  Total query features: 1000
+  Total reference features: 1000
+  Avg. pairwise similarity: 0.1639
+  ```
+  In **pairwise mode**, the script will additionally output a JSON Lines file (`inference/pairwise_similarities.jsonl`) with the similarity scores for each query-reference pair.
+  For example:
+  ```json
+  {"query": "txt_features/UzUybLGvBxE.npy", "reference": "mid_features/UzUybLGvBxE.npy", "similarity": 0.2289600819349289}
+  ```
+  > **Note**: The file paths in the output retain the folder structure and file names, but the top-level folder name is replaced with a feature-folder name (e.g., `txt_features/`) and the file extension with `.npy`.
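The two scoring modes can be sketched as below under stated assumptions: group mode is read as the mean cosine similarity over every query-reference combination, pairwise mode matches files on relative folder plus the filename part before the first dot, and per-pair scores are simply averaged. The function names are hypothetical, and the actual `clamp3_score.py` may aggregate differently.

```python
import math
from pathlib import PurePosixPath


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def group_similarity(query_feats, ref_feats):
    """Group mode (assumed): mean similarity over all query-reference combinations."""
    total = sum(cosine(q, r) for q in query_feats for r in ref_feats)
    return total / (len(query_feats) * len(ref_feats))


def pair_key(rel_path):
    """Pairing key: relative folder plus the filename part before the first dot."""
    path = PurePosixPath(rel_path)
    return (str(path.parent), path.name.split(".")[0])


def pairwise_similarity(query_feats, ref_feats):
    """Pairwise mode (assumed): average over queries that have a matching reference.

    Both arguments map a path relative to query_dir / ref_dir to a feature vector.
    Returns the average and the per-query scores.
    """
    refs = {pair_key(path): vec for path, vec in ref_feats.items()}
    scores = {path: cosine(vec, refs[pair_key(path)])
              for path, vec in query_feats.items() if pair_key(path) in refs}
    if not scores:
        raise ValueError("no query file has a matching reference file")
    return sum(scores.values()) / len(scores), scores


# Toy vectors arranged like the example folder structure above.
queries = {"en/sample1.wav": [1.0, 0.0], "zh/sample1.1.wav": [0.0, 1.0],
           "zh/sample1.2.wav": [1.0, 1.0], "zh/sample2.wav": [1.0, 0.0]}
references = {"en/sample1.txt": [1.0, 0.0], "zh/sample1.txt": [0.0, 1.0],
              "zh/sample2.txt": [0.0, 1.0]}
avg, per_pair = pairwise_similarity(queries, references)
group = group_similarity(list(queries.values()), list(references.values()))
print(f"Avg. pairwise similarity: {avg:.4f}")
print(f"Group similarity: {group:.4f}")
```

Because group mode scores every combination, its cost grows with the product of the two directory sizes, which is one reason the README steers large unpaired datasets there only when no pairing is available.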
+#### **[`clamp3_eval.py`](https://github.com/sanderwood/clamp3/blob/main/clamp3_eval.py) - Evaluating Retrieval Performance**
+This script evaluates **CLaMP 3's retrieval performance on a paired dataset**, measuring how highly the system ranks the correct reference file for each query, using metrics such as **MRR** and **Hit@K**.
+```bash
+python clamp3_eval.py <query_dir> <ref_dir>
+```
+- **Matching Folder Structure & Filenames**:
+  Requires paired query and reference files, with identical folder structure and filenames between `query_dir` and `ref_dir`. This matches the requirements of **pairwise mode** in `clamp3_score.py`.
+- **Evaluation Metrics**:
+  The script calculates the following retrieval metrics:
+  - **MRR (Mean Reciprocal Rank)**
+  - **Hit@1**, **Hit@10**, and **Hit@100**
+**Example Output**:
 ```
+Total query features: 1000
+Total reference features: 1000
+MRR: 0.3301
+Hit@1: 0.251
+Hit@10: 0.482
+Hit@100: 0.796
+```
+- **Additional Output**:
+  A JSON Lines file (`inference/retrieval_ranks.jsonl`) with query-reference ranks:
+  ```json
+  {"query": "txt_features/HQ9FaXu55l0.npy", "reference": "xml_features/HQ9FaXu55l0.npy", "rank": 6}
+  ```
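MRR and Hit@K follow directly from the rank of the correct reference for each query, which is the quantity recorded in `retrieval_ranks.jsonl`. The sketch below assumes 1-based ranks, as in the example record above. `rank_of_correct` is a hypothetical helper showing one way to derive a rank from similarity scores (ties counted in the query's favor); it may not match how `clamp3_eval.py` breaks ties.

```python
import json


def rank_of_correct(similarities, correct_index):
    """1-based rank of the correct reference; tied scores do not push it down."""
    target = similarities[correct_index]
    return 1 + sum(score > target for score in similarities)


def retrieval_metrics(ranks, ks=(1, 10, 100)):
    """Mean Reciprocal Rank and Hit@K from 1-based ranks."""
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hit@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


def ranks_from_jsonl(lines):
    """Collect the `rank` field from retrieval_ranks.jsonl-style lines."""
    return [json.loads(line)["rank"] for line in lines if line.strip()]


# Illustrative records; the file paths are placeholders.
records = [
    '{"query": "txt_features/a.npy", "reference": "xml_features/a.npy", "rank": 1}',
    '{"query": "txt_features/b.npy", "reference": "xml_features/b.npy", "rank": 6}',
]
ranks = ranks_from_jsonl(records) + [rank_of_correct([0.2, 0.9, 0.5], 2)]
for name, value in retrieval_metrics(ranks).items():
    print(f"{name}: {value:.4f}")
```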
+## **Repository Structure**
+- **[code/](https://github.com/sanderwood/clamp3/tree/main/code)** → Training & feature extraction scripts.
+- **[classification/](https://github.com/sanderwood/clamp3/tree/main/classification)** → Linear classification training and prediction.
+- **[inference/](https://github.com/sanderwood/clamp3/tree/main/inference)** → Semantic search, similarity calculations, and retrieval evaluation.
+- **[preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing)** → Convert data into Interleaved ABC, MTF, or MERT-extracted features.
+> **Note:** Ensure the model weights are placed in the `code/` folder, and verify the configuration hyperparameters before use.
+## **Key Script Overview**
+### **Data Preparation**
+#### **1. Convert Music Data to Compatible Formats**
+Before using CLaMP 3, preprocess **MusicXML files** into **Interleaved ABC**, **MIDI files** into **MTF**, and **audio files** into **MERT-extracted features**.
+##### **1.1 Convert MusicXML to Interleaved ABC Notation**
+CLaMP 3 requires **Interleaved ABC notation** for sheet music. Follow these steps:
+1. Convert **MusicXML** (`.mxl`, `.xml`, `.musicxml`) to **standard ABC** using [`batch_xml2abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py):
+   ```bash
+   python batch_xml2abc.py <input_dir> <output_dir>
+   ```
+   - **Input:** Directory containing `.mxl`, `.xml`, `.musicxml` files
+   - **Output:** Directory where converted `.abc` (Standard ABC) files will be saved
+2. Convert **Standard ABC** into **Interleaved ABC** using [`batch_interleaved_abc.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py):
+   ```bash
+   python batch_interleaved_abc.py <input_dir> <output_dir>
+   ```
+   - **Input:** Directory containing `.abc` (Standard ABC) files
+   - **Output:** Directory where Interleaved ABC files will be saved *(for CLaMP 3 use)*
+##### **1.2 Convert MIDI to MTF Format**
+CLaMP 3 processes performance signals in **MIDI Text Format (MTF)**. Convert **MIDI files** (`.mid`, `.midi`) into **MTF format** using [`batch_midi2mtf.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py):
 ```bash
+python batch_midi2mtf.py <input_dir> <output_dir> --m3_compatible
 ```
+- **Input:** Directory containing `.mid`, `.midi` files
+- **Output:** Directory where `.mtf` files will be saved *(MTF format for CLaMP 3)*
+- **Important:** The `--m3_compatible` flag **must be included** to ensure the output format is compatible with CLaMP 3. Without this flag, the converted MTF files **will not work** correctly in the pipeline.
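The MusicXML and MIDI conversions above can be chained from one Python driver. This is a sketch, not part of the repository: it assumes it is launched from the repository root, runs each script from its own folder (matching the `python batch_*.py` invocations shown above), and the helper names and directory arguments are hypothetical. MERT feature extraction is omitted here.

```python
import os
import subprocess
import sys

# Script folders, taken from the repository links above (relative to the repo root).
ABC_SCRIPTS = os.path.join("preprocessing", "abc")
MIDI_SCRIPTS = os.path.join("preprocessing", "midi")


def preprocessing_steps(xml_dir, abc_dir, interleaved_dir, midi_dir, mtf_dir):
    """Return (argv, cwd) pairs for the MusicXML and MIDI conversions."""
    a = os.path.abspath  # absolute paths survive the change of working directory
    return [
        ([sys.executable, "batch_xml2abc.py", a(xml_dir), a(abc_dir)], ABC_SCRIPTS),
        ([sys.executable, "batch_interleaved_abc.py", a(abc_dir), a(interleaved_dir)],
         ABC_SCRIPTS),
        # --m3_compatible is required for CLaMP 3, as noted above.
        ([sys.executable, "batch_midi2mtf.py", a(midi_dir), a(mtf_dir), "--m3_compatible"],
         MIDI_SCRIPTS),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(preprocessing_steps(...))` from the repository root would run the three conversions in order; using `sys.executable` keeps every step inside the active `clamp3` environment.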
 ##### **1.3 Extract Audio Features using MERT**
 For audio processing, CLaMP 3 uses **MERT-extracted features** instead of raw waveforms. Extract MERT-based features from raw audio (`.mp3`, `.wav`) using [`extract_mert.py`](https://github.com/sanderwood/clamp3/blob/main/preprocessing/audio/extract_mert.py):
 After training (or using pre-trained weights), extract features using [`extract_clamp3.py`](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py):
 ```bash
+accelerate launch extract_clamp3.py --epoch <epoch> <input_dir> <output_dir> --get_global
 ```
 - **`--epoch <epoch>`:** (Optional) Specify the checkpoint epoch.
 - **`<input_dir>`:** Directory containing the input files.
 > **Note**: For retrieval, `--get_global` must be used. Without it, CLaMP 3 will not work correctly for retrieval tasks. You only omit `--get_global` if you are performing downstream fine-tuning or need raw feature extraction for custom tasks.
 ## **Citation**
 If you find CLaMP 3 useful in your work, please consider citing our paper: