	---
	title: LeVo Song Generation
	emoji: 🎵
	colorFrom: purple
	colorTo: gray
	sdk: docker
	app_port: 7860
	---


	# SongGeneration

	This repository is the official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment. You can find our paper [here](https://arxiv.org/). The demo page is available [here](https://levo-demo.github.io/).

	In this repository, we provide the SongGeneration model, inference scripts, and a checkpoint trained on the Million Song Dataset. Specifically, we have released the model and inference code corresponding to the SFT + auto-DPO version.

	## Installation

	## Start from scratch
	You can install the necessary dependencies using the `requirements.txt` file with Python 3.8.12:

	```bash
	pip install -r requirements.txt
	```

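	Then download the prebuilt FlashAttention wheel with `wget` and install it. The wheel filename indicates a build for CPython 3.10 (`cp310`), CUDA 12, and PyTorch 2.2: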

	```bash
	wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl -P /home/
	pip install /home/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
	```
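
	Because the wheel filename encodes the interpreter it was built for, a quick check before `pip install` can catch a mismatch early. The sketch below is only an illustration: the helper names `wheel_python_tag` and `local_python_tag` are hypothetical, and it assumes that `python3` on the `PATH` is the interpreter you will install into.

	```bash
	# Hypothetical helper: extract the CPython tag (e.g. cp310) from a wheel filename.
	wheel_python_tag() {
		basename "$1" | sed -E 's/.*-(cp[0-9]+)-cp[0-9]+-.*/\1/'
	}

	# Hypothetical helper: tag of the interpreter that `python3` resolves to.
	local_python_tag() {
		python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
	}

	WHEEL=flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

	# Compare the two tags and report whether they agree.
	if [ "$(wheel_python_tag "$WHEEL")" = "$(local_python_tag)" ]; then
		echo "wheel matches local Python"
	else
		echo "wheel targets $(wheel_python_tag "$WHEEL"), local Python is $(local_python_tag)"
	fi
	```

	If the tags differ, a wheel matching your interpreter can be chosen from the same FlashAttention release page.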

	## Start with Docker
	```bash
	docker pull juhayna/song-generation-levo:v0.1
	docker run -it --gpus all --network=host juhayna/song-generation-levo:v0.1 /bin/bash
	```

	## Inference

	Both folders below must be downloaded in full from [here](https://huggingface.co/waytan22/SongGeneration) for the model to load correctly:

	- Save `ckpt` to the root directory
	- Save `third_party` to the root directory
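
	A small pre-flight check can confirm that both folders are in place before inference starts. This is a minimal sketch, not part of the repository: `check_assets` is a hypothetical helper, and it only verifies that each folder exists and is non-empty, not that the download is complete.

	```bash
	# Hypothetical pre-flight helper: succeed only if both asset folders
	# exist under the given root and contain at least one entry.
	check_assets() {
		root="${1:-.}"
		status=0
		for dir in ckpt third_party; do
			if [ -d "$root/$dir" ] && [ -n "$(ls -A "$root/$dir")" ]; then
				echo "ok: $root/$dir"
			else
				echo "missing or empty: $root/$dir" >&2
				status=1
			fi
		done
		return "$status"
	}

	# Run from the repository root before starting inference.
	check_assets . || echo "download ckpt and third_party before running generate.sh"
	```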

	Then run inference with the following command:

	```bash
	sh generate.sh sample/lyric.jsonl sample/generate
	```
	- Input keys in `sample/lyric.jsonl`:
	  - `idx`: name of the generated song file
	  - `descriptions`: a text description; either None or a specification of gender, timbre, genre, mood, instrument, and BPM
	  - `prompt_audio_path`: path to a reference audio clip; either None or the path to a 10-second song excerpt
	  - `gt_lyric`: lyrics, which must follow the format `[Structure] Text`; the supported structures are listed in `conf/vocab.yaml`
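
	As an illustration of the input keys above, the sketch below writes a single entry to a separate file and checks that it parses as JSON. Every value is a placeholder, the file name `sample/my_lyric.jsonl` is hypothetical, the `[verse]` and `[chorus]` tags are assumed names that should be checked against `conf/vocab.yaml`, and writing None as JSON `null` is an assumption about how the loader reads the file.

	```bash
	# Write one illustrative entry (placeholder values, hypothetical file name).
	mkdir -p sample
	printf '%s\n' '{"idx": "demo_song", "descriptions": "female, pop, sad, piano", "prompt_audio_path": null, "gt_lyric": "[verse] Placeholder verse line [chorus] Placeholder chorus line"}' > sample/my_lyric.jsonl

	# Confirm each non-empty line is valid JSON and carries the four documented keys.
	python3 -c 'import json, sys; keys = {"idx", "descriptions", "prompt_audio_path", "gt_lyric"}; rows = [json.loads(l) for l in open(sys.argv[1]) if l.strip()]; assert all(keys <= set(r) for r in rows); print(len(rows), "valid entries")' sample/my_lyric.jsonl
	```

	The resulting file could then be passed in place of `sample/lyric.jsonl`, for example `sh generate.sh sample/my_lyric.jsonl sample/generate`.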

	- Outputs in the `sample/generate` folder:
	  - `audio`: generated audio files
	  - `jsonl`: output JSONL files
	  - `token`: tokens corresponding to the generated audio files
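
	After a run, the output layout described above can be inspected with a short helper. This is a sketch under assumptions: `summarize_outputs` is a hypothetical function, and it only counts files in each subfolder without assuming any particular file extension.

	```bash
	# Hypothetical helper: count the files in each documented output subfolder.
	summarize_outputs() {
		out="${1:-sample/generate}"
		for sub in audio jsonl token; do
			if [ -d "$out/$sub" ]; then
				count="$(find "$out/$sub" -type f | wc -l | tr -d ' ')"
				echo "$sub: $count file(s)"
			else
				echo "$sub: not found"
			fi
		done
	}

	summarize_outputs sample/generate
	```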

	## Note

	Because the model was trained on songs longer than one minute, it will automatically fill in additional lyrics to extend the duration when the given lyrics are too short.

	## License

	The code and weights in this repository are released under the MIT license, as found in the [LICENSE](LICENSE) file.