Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models Paper • 2505.24164 • Published May 30
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions Paper • 2506.13691 • Published Jun 16 • 2
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 47
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 47
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 48
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 226
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published Jun 30 • 31
Towards Semantic Equivalence of Tokenization in Multimodal LLM Paper • 2406.05127 • Published Jun 7, 2024
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection Paper • 2505.18660 • Published May 24 • 1