CLS-RL: Image Classification with Rule-Based Reinforcement Learning • Paper • 2503.16188 • Published 7 days ago • 8 • 2
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction • Paper • 2503.16194 • Published 7 days ago • 6 • 2
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification • Paper • 2503.12505 • Published 11 days ago • 9 • 2
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models • Paper • 2503.12545 • Published 11 days ago • 5 • 2
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges • Paper • 2503.06553 • Published 18 days ago • 8 • 2
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy • Paper • 2503.06542 • Published 18 days ago • 8 • 2
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality • Paper • 2412.04062 • Published Dec 5, 2024 • 9 • 2
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation • Paper • 2411.18499 • Published Nov 27, 2024 • 18 • 2
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression • Paper • 2410.08584 • Published Oct 11, 2024 • 12 • 3
T3M: Text Guided 3D Human Motion Synthesis from Speech • Paper • 2408.12885 • Published Aug 23, 2024 • 13 • 2