title: LeVo Song Generation
emoji: 🎵
colorFrom: purple
colorTo: gray
sdk: docker
app_port: 7860
SongGeneration:
This repository is the official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment. You can find our paper here. The demo page is available here.
In this repository, we provide the SongGeneration model, inference scripts, and a checkpoint trained on the Million Song Dataset. Specifically, we have released the model and inference code corresponding to the SFT + auto-DPO version.
Installation
Start from scratch
You can install the necessary dependencies from the requirements.txt file with Python 3.8.12:
pip install -r requirements.txt
Then download the prebuilt FlashAttention wheel with wget and install it:
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl -P /home/
pip install /home/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
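A wheel file name encodes the interpreter it was built for, and pip refuses a wheel whose tags do not match the running interpreter. The following is a minimal sketch of a pre-install check, assuming a CPython interpreter; the helper names (wheel_python_tag, interpreter_tag, wheel_matches_interpreter) are hypothetical and not part of this repository.

```python
import sys

# File name copied from the wget command above.
WHEEL = "flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"


def wheel_python_tag(wheel_name: str) -> str:
    """Return the Python tag (e.g. 'cp310') from a wheel file name."""
    # Wheel names end with {python tag}-{abi tag}-{platform tag}.whl
    stem = wheel_name[: -len(".whl")]
    python_tag, _abi_tag, _platform_tag = stem.rsplit("-", 3)[-3:]
    return python_tag


def interpreter_tag(version_info=sys.version_info) -> str:
    """Build a tag such as 'cp310' (assumption: the interpreter is CPython)."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_matches_interpreter(wheel_name: str, version_info=sys.version_info) -> bool:
    """True when the wheel's Python tag equals the interpreter's tag."""
    return wheel_python_tag(wheel_name) == interpreter_tag(version_info)


if __name__ == "__main__":
    print("wheel tag:      ", wheel_python_tag(WHEEL))
    print("interpreter tag:", interpreter_tag())
    print("compatible:     ", wheel_matches_interpreter(WHEEL))
```

The check only compares the Python tag; it does not verify the CUDA or PyTorch versions that also appear in the file name.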
Start with Docker
docker pull juhayna/song-generation-levo:v0.1
docker run -it --gpus all --network=host juhayna/song-generation-levo:v0.1 /bin/bash
Inference
Please note that both folders below, which are available here, must be downloaded completely for the model to load correctly:
- Save ckpt to the root directory
- Save third_party to the root directory
Then run inference with the following command:
sh generate.sh sample/lyric.jsonl sample/generate
Input keys in sample/lyric.jsonl:
- idx: name of the generated song file
- descriptions: text description; can be None, or can specify gender, timbre, genre, mood, instrument, and BPM
- prompt_audio_path: reference audio path; can be None or the path to a 10-second song audio clip
- gt_lyric: lyrics, which must follow the format '[Structure] Text'; supported structures can be found in conf/vocab.yaml
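To illustrate the four input keys, here is a minimal sketch that builds one JSONL line. Several details are assumptions rather than documented behavior: the helpers make_entry and to_jsonl are hypothetical, the tag [verse] is only a placeholder (check conf/vocab.yaml for the supported structures), the wording of the description string is invented for illustration, and None is written as JSON null.

```python
import json
import re

# Matches a leading '[Structure]' tag; the tag name itself is NOT checked
# against conf/vocab.yaml in this sketch.
LEADING_TAG = re.compile(r"^\[[^\[\]]+\]")


def make_entry(idx, gt_lyric, descriptions=None, prompt_audio_path=None):
    """Build one input record with the four keys described above."""
    if not LEADING_TAG.match(gt_lyric.strip()):
        raise ValueError("gt_lyric must start with a '[Structure]' tag")
    return {
        "idx": idx,
        "descriptions": descriptions,
        "prompt_audio_path": prompt_audio_path,
        "gt_lyric": gt_lyric,
    }


def to_jsonl(entries):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in entries)


if __name__ == "__main__":
    entry = make_entry(
        idx="demo_song",
        gt_lyric="[verse] Walking through the city lights",  # placeholder tag
        descriptions="female, pop, 120 BPM",  # illustrative wording only
    )
    print(to_jsonl([entry]), end="")
```

The output could be saved to a file and passed as the first argument to generate.sh in place of sample/lyric.jsonl.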
Outputs in the folder sample/generate:
- audio: generated audio files
- jsonl: output jsonls
- token: tokens corresponding to the generated audio files
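After a run, the outputs can be inspected with a short script. This is a sketch only: it assumes that audio, jsonl, and token are subfolders of the output directory, and collect_outputs is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Output names listed above (assumption: each is a subfolder).
OUTPUT_KINDS = ("audio", "jsonl", "token")


def collect_outputs(root):
    """Map each output kind to the sorted file names found in its subfolder.

    A missing subfolder yields an empty list instead of an error.
    """
    root = Path(root)
    found = {}
    for kind in OUTPUT_KINDS:
        folder = root / kind
        if folder.is_dir():
            found[kind] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[kind] = []
    return found


if __name__ == "__main__":
    for kind, names in collect_outputs("sample/generate").items():
        print(f"{kind}: {len(names)} file(s)")
```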
Note
Because the model was trained on data longer than 1 minute, if the given lyrics are too short, the model will automatically fill in additional lyrics to extend the duration.
License
The code and weights in this repository are released under the MIT license as found in the LICENSE file.