Submitted by Elizaveta · 50 · When Less is Enough: Adaptive Token Reduction for Efficient Image Representation · 3 authors · 1
Submitted by VentureZJ · 43 · MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving · 9 authors · 1
Submitted by VentureZJ · 36 · MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization · 6 authors · 1
Submitted by IranQin · 30 · RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints · 8 authors · 1
Submitted by Epiphqny · 24 · Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation · 7 authors · 3
Submitted by ydeng9 · 16 · OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement · 6 authors · 1
Submitted by akhaliq · 13 · Modifying Large Language Model Post-Training for Diverse Creative Writing · 5 authors · 1
Submitted by akhaliq · 8 · TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting · 7 authors · 1
Submitted by Guan123 · 8 · ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering · 8 authors · 1
Submitted by JacobYuan · 8 · MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems · 8 authors · 2
Submitted by akhaliq · 6 · FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models · 7 authors · 2
Submitted by hitsmy · 6 · From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration · 4 authors · 1
Submitted by ChengmingX · 5 · When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO · 8 authors · 1
Submitted by ZhaochongAn · 5 · Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model · 7 authors · 1