---
license: apache-2.0
---

This repository contains the weights of [ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing](https://arxiv.org/abs/2506.21448).

Project page: https://thinksound-project.github.io/

If you find our work useful, please cite our paper: 

```bibtex
@misc{liu2025thinksoundchainofthoughtreasoningmultimodal,
    title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing}, 
    author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
    year={2025},
    eprint={2506.21448},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2506.21448},
}
```