OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published 5 days ago • 20
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published 6 days ago • 34
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published 6 days ago • 61
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting Paper • 2503.08677 • Published 15 days ago • 27
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions Paper • 2503.03862 • Published 21 days ago • 1
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Paper • 2503.05639 • Published 19 days ago • 22
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 19 days ago • 108
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 20 days ago • 85
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 24 days ago • 55
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published 23 days ago • 25
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 23 days ago • 77
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published 26 days ago • 25
Language Models' Factuality Depends on the Language of Inquiry Paper • 2502.17955 • Published 29 days ago • 32
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 28 days ago • 20
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 30 days ago • 77