Sound Source Localization is All about Cross-Modal Alignment Paper • 2309.10724 • Published Sep 19, 2023
LaughTalk: Expressive 3D Talking Head Generation with Laughter Paper • 2311.00994 • Published Nov 2, 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models Paper • 2312.09818 • Published Dec 15, 2023
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering Paper • 2312.11360 • Published Dec 18, 2023
FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning Paper • 2108.06098 • Published Aug 13, 2021
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation Paper • 2307.14611 • Published Jul 27, 2023
Noise Map Guidance: Inversion with Spatial Context for Real Image Editing Paper • 2402.04625 • Published Feb 7, 2024
Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild Paper • 2403.14539 • Published Mar 21, 2024
Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers Paper • 2207.13820 • Published Jul 27, 2022
Scratching Visual Transformer's Back with Uniform Attention Paper • 2210.08457 • Published Oct 16, 2022
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset Paper • 2406.14272 • Published Jun 20, 2024
Contextually Customized Video Summaries via Natural Language Paper • 1702.01528 • Published Feb 6, 2017
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models Paper • 2407.13442 • Published Jul 18, 2024
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert Paper • 2407.01034 • Published Jul 1, 2024
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding Paper • 2411.19527 • Published Nov 29, 2024