VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning Paper • 2506.09049 • Published Jun 10 • 36
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4 • 43
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21 • 62
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published Dec 5, 2024 • 39
VLSBench: Unveiling Visual Leakage in Multimodal Safety Paper • 2411.19939 • Published Nov 29, 2024 • 10
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models Paper • 2402.05044 • Published Feb 7, 2024 • 2
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published May 2, 2024 • 29
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5, 2024 • 18