Submitted by zhoutianyi 46 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing · 3 authors 2
Submitted by Lin-Chen 36 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning · 10 authors 2
Submitted by lzyhha 33 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning · 8 authors 2
Submitted by akhaliq 14 Scaling Laws for Native Multimodal Models · 6 authors 2
Submitted by salmannyu 14 MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations · 5 authors 2
Submitted by russwang 11 SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement · 9 authors 2
Submitted by Franck-Dernoncourt 7 Towards Visual Text Grounding of Multimodal Large Language Model · 9 authors 2
Submitted by jzr99 2 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction · 5 authors 2
Submitted by RishubhPar 2 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection · 5 authors 2
Submitted by RishubhPar 2 Compass Control: Multi Object Orientation Control for Text-to-Image Generation · 4 authors 2