Xu Tan

xutan

tobyoup

xutan's activity

authored 6 papers 1 day ago

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

Paper • 2410.13267 • Published Oct 17, 2024 • 1

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

Paper • 2212.05005 • Published Dec 9, 2022

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Paper • 2303.17550 • Published Mar 30, 2023

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6, 2025 • 25

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Paper • 2406.04321 • Published Jun 6, 2024

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Paper • 2503.04606 • Published 7 days ago • 7

authored a paper 2 months ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 55

authored 9 papers 7 months ago

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior

Paper • 2106.06406 • Published Jun 11, 2021

MuPT: A Generative Symbolic Music Pretrained Transformer

Paper • 2404.06393 • Published Apr 9, 2024 • 16

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Paper • 2406.01375 • Published Jun 3, 2024

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Paper • 2405.15758 • Published May 24, 2024 • 1

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26, 2024 • 23

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Paper • 2406.14228 • Published Jun 20, 2024 • 1

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Paper • 2406.08096 • Published Jun 12, 2024

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26, 2024 • 44

PromptTTS: Controllable Text-to-Speech with Text Descriptions

Paper • 2211.12171 • Published Nov 22, 2022

upvoted a paper 8 months ago

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26, 2024 • 23

authored a paper 9 months ago

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8, 2024 • 19

authored 2 papers 11 months ago

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

Paper • 2403.00758 • Published Mar 1, 2024 • 2