musicaudiopretrain commited on
Commit
fbb7a33
·
1 Parent(s): 6ce0ed9

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +30 -1
README.md CHANGED
@@ -2,8 +2,37 @@
2
  license: mit
3
  inference: false
4
  ---
 
5
 
6
- # Introduction
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
 
8
  **Music2Vec** is accepted as 2-page abstract in Late Breaking Demos (LBD) at the ISMIR 2022.
9
  It is a completely unsupervised model trained on 1000 hour music audios.
 
2
  license: mit
3
  inference: false
4
  ---
5
+ # Introduction to our series work
6
 
7
+ The development log of our Music Audio Pre-training (m-a-p) model family:
8
+ - 17/03/2023: we release two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) , trained with new paradigm and dataset. They outperform the previous models and can better generalize to more tasks.
9
+ - 14/03/2023: we retrained the MERT-v0 model with open-source-only music dataset [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public)
10
+ - 29/12/2022: a music understanding model [MERT-v0](https://huggingface.co/m-a-p/MERT-v0) trained with **MLM** paradigm, which performs better at downstream tasks.
11
+ - 29/10/2022: a pre-trained MIR model [music2vec](https://huggingface.co/m-a-p/music2vec-v1) trained with **BYOL** paradigm.
12
+
13
+
14
+
15
+ Here is a table for quick model pick-up:
16
+
17
+ | Name | Pre-train Paradigm | Training Data (hour) | Pre-train Context (second) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
18
+ | ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
19
+ | [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) | MLM | 160K | 5 | 330M | 24-1024 | 75 Hz | 24K Hz | 17/03/2023 |
20
+ | [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) | MLM | 20K | 5 | 95M | 12-768 | 75 Hz | 24K Hz | 17/03/2023 |
21
+ | [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM | 900 | 5 | 95M | 12-768 | 50 Hz | 16K Hz | 14/03/2023 |
22
+ | [MERT-v0](https://huggingface.co/m-a-p/MERT-v0) | MLM | 1000 | 5 | 95 M | 12-768 | 50 Hz | 16K Hz | 29/12/2022 |
23
+ | [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1) | BYOL | 1000 | 30 | 95 M | 12-768 | 50 Hz | 16K Hz | 30/10/2022 |
24
+
25
+ ## Explanation
26
+
27
+ The m-a-p models share the similar model architecture and the most distinguished difference is the paradigm in used pre-training. Other than that, there are several nuance technical configuration needs to know before using:
28
+
29
+ - **Model Size**: the number of parameters that would be loaded to memory. Please select the appropriate size fitting your hardware.
30
+ - **Transformer Layer-Dimension**: The number of transformer layers and the corresponding feature dimensions can be outputted from our model. This is marked out because features extracted by **different layers could have various performance depending on tasks**.
31
+ - **Feature Rate**: Given a 1-second audio input, the number of features output by the model.
32
+ - **Sample Rate**: The frequency of audio that the model is trained with.
33
+
34
+
35
+ # Introduction to Music2Vec
36
 
37
  **Music2Vec** is accepted as 2-page abstract in Late Breaking Demos (LBD) at the ISMIR 2022.
38
  It is a completely unsupervised model trained on 1000 hour music audios.