	---
	title: SonicVerse
	emoji: 🖼
	colorFrom: purple
	colorTo: red
	sdk: gradio
	sdk_version: 5.25.2
	app_file: app.py
	pinned: false
	---

	# 🎼 SonicVerse

	An interactive demo for SonicVerse, a music captioning model. Upload an audio clip and the model generates a natural-language caption
	that combines a general description of the music with musical features such as key, instruments, genre, mood/theme, and vocal gender.

	The demo supports both short (10s) and long (up to 1 minute) audio inputs.

	---

	## 🚀 Demo

	Check out the live Space here:
	[![Hugging Face Space](https://img.shields.io/badge/HuggingFace-Space-blue?logo=huggingface)](https://huggingface.co/spaces/amaai-lab/SonicVerse)

	---

	## 🚀 Samples

	Short captions and long chained LLM-generated captions:
	➡️ [Samples page](https://amaai-lab.github.io/SonicVerse/)

	---

	## 📦 Features

	✅ Upload a 10-second music clip and get a caption.

	✅ Upload a longer music clip (up to 1 minute for a successful demo run) and get a detailed caption for the whole clip.

	✅ Captions include musical attributes (key, instruments, tempo, etc.).

	⚠️ You can upload audio of any length, but due to compute limitations on Hugging Face Spaces, we recommend uploading clips under 30 seconds unless you have a Hugging Face Pro account or run the app locally.
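This README does not say how long clips are segmented before captioning. As a rough illustration only, the sketch below assumes a long clip is treated as consecutive 10-second windows whose captions are later chained together. The function name `split_into_windows` and its parameters are hypothetical and are not part of the SonicVerse code.

```python
def split_into_windows(
    total_seconds: float, window_seconds: float = 10.0
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds that cover a clip in consecutive windows.

    Hypothetical helper for illustration; the real app may segment audio differently.
    """
    if total_seconds <= 0 or window_seconds <= 0:
        raise ValueError("total_seconds and window_seconds must be positive")

    windows = []
    start = 0.0
    while start < total_seconds:
        # The final window is shortened so it never runs past the end of the clip.
        end = min(start + window_seconds, total_seconds)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 25-second clip yields two full windows and one 5-second remainder.
    for start, end in split_into_windows(25.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

Under this assumption, a 1-minute clip would produce six windows, which gives a feel for why longer uploads need more compute on the hosted demo.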

	---

	## 🛠️ How to Run Locally

	```bash
	# Clone the repo
	git clone https://github.com/AMAAI-Lab/SonicVerse
	cd SonicVerse

	# Option A: install dependencies with pip
	pip install -r requirements.txt

	# Option B (instead of pip): create and activate the conda environment
	conda env create -f environment.yml
	conda activate sonicverse

	# Run the app
	python app.py
	```

	---

	## 💡 Usage

	To use the app:
	1. Select an audio clip to upload.
	2. Click the Generate button.
	3. See the model’s output below.
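Before uploading to the hosted demo, it can help to check a clip against the 30-second recommendation given under Features. The sketch below does this for uncompressed WAV files using only Python's standard `wave` module. The helper names and the `RECOMMENDED_MAX_SECONDS` constant are hypothetical and not part of the app, and other formats such as MP3 would need a different reader.

```python
import wave

# Taken from this README's recommendation for the hosted demo; not enforced by the app.
RECOMMENDED_MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_recommended_length(
    path: str, limit: float = RECOMMENDED_MAX_SECONDS
) -> bool:
    """Return True if the clip is shorter than the recommended limit."""
    return wav_duration_seconds(path) < limit
```

A typical use is to call `within_recommended_length("my_clip.wav")` (the file name is a placeholder) and trim the clip, or run the app locally, when it returns `False`.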

	---

	## 📜 Citation

	If you use SonicVerse in your work, please cite our paper:

	SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
	Anuradha Chopra, Abhinaba Roy, Dorien Herremans
	Accepted to AIMC 2025

	```bibtex
	@inproceedings{chopra2025sonicverse,
	  title={SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning},
	  author={Chopra, Anuradha and Roy, Abhinaba and Herremans, Dorien},
	  booktitle={Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025)},
	  year={2025},
	  address={Brussels, Belgium},
	  month={September},
	  url={https://arxiv.org/abs/2506.15154},
	}
	```

	Read the paper here: [arXiv:2506.15154](https://arxiv.org/abs/2506.15154)
	DOI: [10.48550/arXiv.2506.15154](https://doi.org/10.48550/arXiv.2506.15154)

	---

	## 🧹 Built With

	- [Hugging Face Spaces](https://huggingface.co/spaces)
	- [Gradio](https://gradio.app/)
	- [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
	- [MERT 95M](https://huggingface.co/m-a-p/MERT-v1-95M)
	---