SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published 7 days ago • 24
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published 22 days ago • 11
MONSTER: Monash Scalable Time Series Evaluation Repository Paper • 2502.15122 • Published 26 days ago • 2
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published 25 days ago • 20
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 22 days ago • 73
Beyond Release: Access Considerations for Generative AI Systems Paper • 2502.16701 • Published 23 days ago • 12
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 27 days ago • 66
Forecasting Open-Weight AI Model Growth on Hugging Face Paper • 2502.15987 • Published 25 days ago • 10
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration Paper • 2502.17110 • Published 22 days ago • 11
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published 23 days ago • 27
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties Paper • 2502.16922 • Published 23 days ago • 7
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published 22 days ago • 24
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 25 days ago • 16
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published 23 days ago • 24
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality Paper • 2412.04062 • Published Dec 5, 2024 • 9
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12