Update README.md
README.md CHANGED

@@ -1,5 +1,5 @@
---
-title:
+title: Song Generation
emoji: 🎵
colorFrom: purple
colorTo: gray

@@ -8,56 +8,17 @@ app_port: 7860
---

Removed:

This repository is the official …

You can install the necessary dependencies using the `requirements.txt` file with Python 3.8.12:

```bash
pip install -r requirements.txt
```

Then install flash attention from a prebuilt wheel fetched with `wget`:

```bash
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl -P /home/
pip install /home/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
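
The wheel name above pins the build it expects (CPython 3.10, CUDA 12, torch 2.2), so it is worth confirming that everything imports cleanly before moving on. A minimal sanity check, assuming `python` points at the environment you installed into, might look like:

```bash
# Print the installed torch and flash-attn versions; an ImportError here means the wheel did not match the environment.
python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"
```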

## Start with Docker

```bash
docker pull juhayna/song-generation-levo:v0.1
docker run -it --gpus all --network=host juhayna/song-generation-levo:v0.1 /bin/bash
```
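
If the `ckpt` and `third_party` folders described in the Inference section are kept on the host rather than inside the image, they can be bind-mounted into the container. This is only a sketch: the in-container destination `/workspace/SongGeneration` is an assumption, not a path taken from the image.

```bash
# Same run command as above, plus bind mounts for host-side checkpoints; adjust the container path to wherever the code expects them.
docker run -it --gpus all --network=host \
  -v "$(pwd)/ckpt:/workspace/SongGeneration/ckpt" \
  -v "$(pwd)/third_party:/workspace/SongGeneration/third_party" \
  juhayna/song-generation-levo:v0.1 /bin/bash
```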

## Inference

Please note that both folders below, which are sourced from [here](https://huggingface.co/waytan22/SongGeneration), must be downloaded completely for the model to load correctly (one way to fetch them is sketched after this list):

- Save `ckpt` to the root directory
- Save `third_party` to the root directory
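
One way to fetch both folders is the Hugging Face CLI. This sketch assumes `huggingface_hub` is installed and that `ckpt` and `third_party` sit at the top level of the weight repository, so downloading into the project root reproduces the layout above:

```bash
# Download only the ckpt and third_party folders from the weight repository into the current directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download waytan22/SongGeneration --include "ckpt/*" "third_party/*" --local-dir .
```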

Then run inference with the following command:

```bash
sh generate.sh sample/lyric.jsonl sample/generate
```

- Input keys in `sample/lyric.jsonl` (an illustrative line is sketched after this list):
  - `idx`: name of the generated song file
  - `descriptions`: text description; can be None, or a specification of gender, timbre, genre, mood, instrument, and BPM
  - `prompt_audio_path`: reference audio path; can be None, or the path to a 10-second song excerpt
  - `gt_lyric`: lyrics, which must follow the format '\[Structure\] Text'; supported structures are listed in `conf/vocab.yaml`
- Outputs written to `sample/generate`:
  - `audio`: generated audio files
  - `jsonl`: output jsonl files
  - `token`: tokens corresponding to the generated audio files
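
For reference, a single `sample/lyric.jsonl` entry with the keys listed above might look like the line below. The field values, the section tags, and the use of JSON `null` for an empty `prompt_audio_path` are illustrative assumptions; check `conf/vocab.yaml` for the structure labels your checkpoint actually supports.

```bash
# Write one illustrative request to sample/lyric.jsonl (keys as documented above; values are made up).
cat > sample/lyric.jsonl <<'EOF'
{"idx": "demo_song", "descriptions": "female, pop, piano, happy, the bpm is 120", "prompt_audio_path": null, "gt_lyric": "[verse] City lights are fading slowly ; [chorus] Hold on to the night we know"}
EOF
```

Each line of the file is one independent generation request.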

Added:

<p align="center">
<a href="https://levo-demo.github.io/">Demo</a> | <a href="https://arxiv.org/abs/2506.07520">Paper</a> | <a href="https://github.com/tencent-ailab/songgeneration">Code</a>
</p>

This repository is the official weight repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment. It provides the SongGeneration model, inference scripts, and a checkpoint trained on the Million Song Dataset.

## Overview

We develop the SongGeneration model, an LM-based framework consisting of **LeLM** and a **music codec**. LeLM models two types of tokens in parallel: mixed tokens, which represent the combined audio of vocals and accompaniment to achieve vocal-instrument harmony, and dual-track tokens, which encode vocals and accompaniment separately for high-quality song generation. The music codec reconstructs the dual-track tokens into high-fidelity music audio. SongGeneration improves significantly over open-source music generation models and performs competitively with current state-of-the-art industry systems. For more details, please refer to our [paper](https://arxiv.org/abs/2506.07520).

<img src="https://github.com/tencent-ailab/songgeneration/blob/main/img/over.jpg?raw=true" alt="img" style="zoom:100%;" />

## Note