arXiv:2411.04646

DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction

Published on Nov 7, 2024

Abstract

This paper introduces DanceFusion, a novel framework for reconstructing and generating dance movements synchronized to music, utilizing a Spatio-Temporal Skeleton Diffusion Transformer. The framework adeptly handles incomplete and noisy skeletal data common in short-form dance videos on social media platforms like TikTok. DanceFusion incorporates a hierarchical Transformer-based Variational Autoencoder (VAE) integrated with a diffusion model, significantly enhancing motion realism and accuracy. Our approach introduces sophisticated masking techniques and a unique iterative diffusion process that refines the motion sequences, ensuring high fidelity in both motion generation and synchronization with accompanying audio cues. Comprehensive evaluations demonstrate that DanceFusion surpasses existing methods, providing state-of-the-art performance in generating dynamic, realistic, and stylistically diverse dance motions. Potential applications of this framework extend to content creation, virtual reality, and interactive entertainment, promising substantial advancements in automated dance generation. Visit our project page at https://th-mlab.github.io/DanceFusion/.
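The abstract describes three ingredients: a Transformer backbone over skeleton sequences, masking to cope with missing or noisy joints, and diffusion conditioned on audio features. As a rough orientation only, the following is a minimal PyTorch sketch of what an audio-conditioned, mask-aware skeleton denoiser of this kind might look like. Every module choice, tensor shape, and hyperparameter below is an illustrative assumption, not the authors' implementation; see the project page for the actual method.

```python
# Hypothetical sketch of one denoising step in an audio-driven skeleton
# diffusion Transformer. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn

class SkeletonDiffusionSketch(nn.Module):
    def __init__(self, n_joints=24, joint_dim=3, audio_dim=128, d_model=256):
        super().__init__()
        pose_dim = n_joints * joint_dim
        self.pose_proj = nn.Linear(pose_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.time_embed = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_pose, audio_feat, t, visible_mask):
        # noisy_pose:   (B, T, pose_dim) noised skeleton sequence
        # audio_feat:   (B, T, audio_dim) per-frame audio features
        # t:            (B, 1) diffusion timestep, normalized to [0, 1]
        # visible_mask: (B, T, pose_dim), 1 = observed joint, 0 = missing
        x = self.pose_proj(noisy_pose * visible_mask)  # zero out missing joints
        x = x + self.audio_proj(audio_feat)            # audio conditioning
        x = x + self.time_embed(t).unsqueeze(1)        # broadcast timestep
        x = self.backbone(x)                           # spatio-temporal attention
        return self.head(x)                            # predicted clean poses

# Illustrative usage on random tensors: a 120-frame clip with roughly
# 20% of joint coordinates masked out as "missing".
model = SkeletonDiffusionSketch()
pose = torch.randn(1, 120, 24 * 3)
audio = torch.randn(1, 120, 128)
t = torch.full((1, 1), 0.5)
mask = torch.bernoulli(torch.full_like(pose, 0.8))
denoised = model(pose, audio, t, mask)  # (1, 120, 72)
```

At inference, a denoiser like this would be applied iteratively across the diffusion schedule, re-imposing any observed joints after each step so that generation doubles as reconstruction of incomplete sequences; the paper's exact sampling and masking schedule may differ from this simplification.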
