Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published 11 days ago • 34
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published May 27 • 14
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration Paper • 2505.20256 • Published May 26 • 17
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models Paper • 2505.18536 • Published May 24 • 19
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published Mar 11 • 27
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published Feb 24 • 14
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published Feb 21 • 20
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published Feb 24 • 80
Beyond Release: Access Considerations for Generative AI Systems Paper • 2502.16701 • Published Feb 23 • 16
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 70
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration Paper • 2502.17110 • Published Feb 24 • 13
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published Feb 24 • 31
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties Paper • 2502.16922 • Published Feb 24 • 8
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 26
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published Feb 22 • 18
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23 • 27