Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 10 days ago • 34
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published 24 days ago • 18
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published about 1 month ago • 85
Putting the Object Back into Video Object Segmentation Paper • 2310.12982 • Published Oct 19, 2023