DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published about 22 hours ago • 10
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors • Paper • 2410.16266 • Published Oct 21 • 4
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search • Paper • 2410.14649 • Published Oct 18 • 8
MiniPLM: Knowledge Distillation for Pre-Training Language Models • Paper • 2410.17215 • Published Oct 22 • 14
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs • Paper • 2410.16267 • Published Oct 21 • 17
Mitigating Object Hallucination via Concentric Causal Attention • Paper • 2410.15926 • Published Oct 21 • 16
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction • Paper • 2410.17247 • Published Oct 22 • 45
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes • Paper • 2410.17249 • Published Oct 22 • 41
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation • Paper • 2410.17250 • Published Oct 22 • 14
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes • Paper • 2410.16930 • Published Oct 22 • 6
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions • Paper • 2412.09596 • Published 13 days ago • 90