---
title: DeepSound-V1
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
---
# DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos

## Highlight

DeepSound-V1 is a framework that enables audio generation from videos with initial step-by-step thinking, requiring no extra annotations, by building on the internal chain-of-thought (CoT) of a multi-modal large language model (MLLM).
## Installation

```bash
conda create -n deepsound-v1 python=3.10.16 -y
conda activate deepsound-v1

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu120
pip install flash-attn==2.5.8 --no-build-isolation
pip install -e .
pip install -r requirements.txt
```
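After installation, a quick sanity check can confirm that the core dependencies import and that CUDA is visible. This is a minimal sketch, not part of the project; the package names are taken from the install commands above:

```python
import importlib.util

# Packages the install commands above are expected to provide.
required = ["torch", "torchvision", "torchaudio"]

# find_spec returns None when a package is not importable.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import torch
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

If CUDA reports as unavailable, double-check that the installed wheel matches your local CUDA driver version.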
## Demo

### Pretrained models

See MODELS.md.

### Command-line interface

With `demo.py`:

```bash
python demo.py -i <video_path>
```

All training parameters are here.
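For processing many videos, the documented CLI can be driven from a small batch script. This is a hypothetical sketch: only the `-i` flag shown above is assumed, and `build_command`/`generate_all` are illustrative helpers, not part of the repository:

```python
# Hypothetical batch driver for the documented CLI (demo.py -i <video_path>).
# Only the -i flag from the README is assumed; output handling is left to demo.py.
import subprocess
from pathlib import Path


def build_command(video_path: Path) -> list[str]:
    """Assemble the demo.py invocation for one video."""
    return ["python", "demo.py", "-i", str(video_path)]


def generate_all(video_dir: str) -> None:
    """Run audio generation for every .mp4 under video_dir, failing fast on errors."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        subprocess.run(build_command(video), check=True)
```

`check=True` makes the loop stop on the first failing video rather than silently skipping it.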
## Evaluation

Refer to av-benchmark for benchmarking results. See EVAL.md.
## Citation

## Relevant Repositories

- av-benchmark for benchmarking results.

## Acknowledgement

Many thanks to: