---
- **Repository:** [LP-MusicCaps repository](https://github.com/seungheondoh/lp-music-caps)
- **Paper:** [ArXiv](https://arxiv.org/abs/2307.16372)

# :sound: LP-MusicCaps: LLM-Based Pseudo Music Captioning

[![Demo Video](https://i.imgur.com/cgi8NsD.jpg)](https://youtu.be/ezwYVaiC-AM)

This is an implementation of [LP-MusicCaps: LLM-Based Pseudo Music Captioning](#). The project generates captions for music in two stages. 1) Tag-to-Caption: using existing tags, we leverage OpenAI's GPT-3.5 Turbo API to generate high-quality, contextually relevant captions from music tags (a minimal sketch follows the citation below). 2) Audio-to-Caption: using music-audio and pseudo-caption pairs, we train a cross-modal encoder-decoder model for end-to-end music captioning.

> [**LP-MusicCaps: LLM-Based Pseudo Music Captioning**](#)
> SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam
> To appear at ISMIR 2023
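
A minimal sketch of the Tag-to-Caption stage, assuming the `openai` Python client (v1 API); the prompt, tags, and settings below are illustrative placeholders, not the prompts used in `lpmc/llm_captioning`.

```python
# Illustrative Tag-to-Caption sketch; see lpmc/llm_captioning for the
# actual prompts and pipeline used in this project.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

tags = ["acoustic guitar", "calm", "folk", "female vocals"]  # example tags

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Write a one-sentence music caption for a track "
                   f"with these tags: {', '.join(tags)}.",
    }],
)
print(response.choices[0].message.content)
```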
## TL;DR

<p align="center">
  <img src="https://i.imgur.com/2LC0nT1.png">
</p>

- **[1. Tag-to-Caption: LLM Captioning](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/llm_captioning)**: generate a caption from a given tag input.
- **[2. Pretrain Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning)**: generate a pseudo caption from given audio.
- **[3. Transfer Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning/transfer.py)**: generate a human-level caption from given audio.

## Open Source Material

- [pre-trained models](https://huggingface.co/seungheondoh/lp-music-caps)
- [music-pseudo caption dataset](https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD)
- [demo](https://huggingface.co/spaces/seungheondoh/LP-Music-Caps-demo)

All of the above are available online for future research. An example of the dataset is shown in this [notebook](https://github.com/seungheondoh/lp-music-caps/blob/main/notebook/Dataset.ipynb).
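
A minimal sketch, assuming the dataset loads with the Hugging Face `datasets` library and its default configuration; split and field names are not assumed here, only inspected (the notebook above shows the intended usage).

```python
# Illustrative sketch: load LP-MusicCaps-MSD from the Hugging Face Hub
# and inspect its splits and fields (see the notebook above for real usage).
from datasets import load_dataset

ds = load_dataset("seungheondoh/LP-MusicCaps-MSD")

print(ds)               # available splits and their sizes
split = next(iter(ds))  # first split name, whatever it is called
print(ds[split][0])     # one example, showing its field names
```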