---
title: SonicVerse
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
---
# 🎼 SonicVerse
An interactive demo for **SonicVerse**, a music captioning model. Users upload audio and get a natural-language caption
that combines a general description of the music with musical features such as key, instruments, genre, mood/theme, and vocal gender.
The demo supports both short (10-second) and long (up to 1 minute) audio inputs.
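
One way to picture how a long clip relates to the 10-second short input is to split it into consecutive windows. The sketch below is illustrative only: `plan_windows` is a hypothetical helper, and the 10-second window size is an assumption taken from the short-clip length above, not a description of SonicVerse's actual preprocessing.

```python
import math


def plan_windows(duration_s: float, window_s: float = 10.0) -> list[tuple[float, float]]:
    """Return consecutive (start, end) times in seconds that cover the clip.

    Hypothetical helper for illustration; the last window may be shorter
    than window_s when the duration is not an exact multiple.
    """
    if duration_s <= 0 or window_s <= 0:
        raise ValueError("duration_s and window_s must be positive")
    count = math.ceil(duration_s / window_s)
    return [
        (i * window_s, min((i + 1) * window_s, duration_s))
        for i in range(count)
    ]


# A 25-second clip yields two full windows and one 5-second remainder.
print(plan_windows(25.0))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```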
---
## 🚀 Demo
Check out the live Space here:
[SonicVerse on Hugging Face Spaces](https://huggingface.co/spaces/amaai-lab/SonicVerse)
---
## 🚀 Samples
Short captions and long chained LLM-generated captions:
➡️ [Samples page](https://amaai-lab.github.io/SonicVerse/)
---
## 📦 Features
✅ Upload a 10-second music clip and get a caption
✅ Upload a longer music clip (up to 1 minute for the demo to complete successfully) to get a long, detailed caption for the whole clip
✅ Captions include musical attributes (key, instruments, tempo, etc.)
⚠️ You can upload audio of any length, but due to compute limitations on Hugging Face Spaces, we recommend uploading clips under **30 seconds** unless you have a **Hugging Face Pro account** or run the app locally.
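
If a clip is longer than the recommended length, it can be checked and trimmed locally before uploading. This is a minimal sketch using only Python's standard `wave` module, so it handles uncompressed WAV files only; `wav_duration_seconds`, `trim_wav`, and `RECOMMENDED_MAX_S` are hypothetical names introduced here and are not part of the SonicVerse code. For example, `trim_wav("song.wav", "song_30s.wav")` would write at most the first 30 seconds of a (hypothetical) `song.wav`.

```python
import wave

# Assumption: mirrors the 30-second recommendation in the note above.
RECOMMENDED_MAX_S = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def trim_wav(src: str, dst: str, max_seconds: float = RECOMMENDED_MAX_S) -> float:
    """Copy at most max_seconds of audio from src to dst.

    Returns the duration of the written file in seconds.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        max_frames = int(max_seconds * reader.getframerate())
        frames = reader.readframes(min(reader.getnframes(), max_frames))
    with wave.open(dst, "wb") as writer:
        # The frame count in the header is corrected when the data is written.
        writer.setparams(params)
        writer.writeframes(frames)
    return wav_duration_seconds(dst)
```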
---
## 🛠️ How to Run Locally
```bash
# Clone the repo
git clone https://github.com/AMAAI-Lab/SonicVerse
cd SonicVerse
# Install dependencies
pip install -r requirements.txt
# Alternatively, set up a conda environment instead of using pip
conda env create -f environment.yml
conda activate sonicverse
# Run the app
python app.py
```
---
## 💡 Usage
To use the app:
1. Select an audio clip to upload.
2. Click the **Generate** button.
3. See the model’s output below.
---
## 📜 Citation
If you use SonicVerse in your work, please cite our paper:
**SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning**
Anuradha Chopra, Abhinaba Roy, Dorien Herremans
Accepted to AIMC 2025
```bibtex
@article{chopra2025sonicverse,
title={SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning},
author={Chopra, Anuradha and Roy, Abhinaba and Herremans, Dorien},
journal={Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025)},
year={2025},
address={Brussels, Belgium},
month={September},
url={https://arxiv.org/abs/2506.15154},
}
```
Read the paper here: [arXiv:2506.15154](https://arxiv.org/abs/2506.15154)
DOI: [10.48550/arXiv.2506.15154](https://doi.org/10.48550/arXiv.2506.15154)
---
## 🧹 Built With
- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Gradio](https://gradio.app/)
- [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [MERT 95M](https://huggingface.co/m-a-p/MERT-v1-95M)
---