Submitted by lixiaochuan · 79 upvotes · DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation · 13 authors · 2 comments
Submitted by tellarin · 56 upvotes · Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills · 9 authors · 2 comments
Submitted by limuloo1999 · 37 upvotes · DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models · 4 authors · 3 comments
Submitted by yyyyyyjjjjzzz · 34 upvotes · SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? · 8 authors · 3 comments
Submitted by Orannue · 23 upvotes · Edit Transfer: Learning Image Editing via Vision In-Context Relations · 4 authors · 6 comments
Submitted by akhaliq · 23 upvotes · R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization · 7 authors · 2 comments
Submitted by ZyZcuhk · 21 upvotes · BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing · 9 authors · 1 comment
Submitted by jmhb · 19 upvotes · MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research · 23 authors · 1 comment
Submitted by Lingaaaaaaa · 16 upvotes · WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes · 8 authors · 1 comment
Submitted by ZhaofengWu · 15 upvotes · reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs · 6 authors · 2 comments
Submitted by lwpyh · 11 upvotes · V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning · 6 authors · 2 comments
Submitted by Luo-Yihong · 9 upvotes · Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation · 5 authors · 1 comment
Submitted by Buzz-lightyear · 8 upvotes · Long-Video Audio Synthesis with Multi-Agent Collaboration · 5 authors · 3 comments
Submitted by soarhigh · 5 upvotes · Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions · 7 authors · 1 comment
Submitted by k-nick · 5 upvotes · Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework · 8 authors · 2 comments
Submitted by FQiao · 3 upvotes · GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching · 4 authors · 3 comments
Submitted by JesseTNRoberts · 3 upvotes · Investigating Human-Aligned Large Language Model Uncertainty · 4 authors · 2 comments
Submitted by Sckathach · 3 upvotes · Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models · 3 authors · 2 comments
Submitted by zxbsmk · 2 upvotes · WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation · 12 authors · 2 comments