Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control Paper • 2503.14492 • Published 7 days ago • 16
GenDec: A robust generative Question-decomposition method for Multi-hop reasoning Paper • 2402.11166 • Published Feb 17, 2024 • 1
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 9 days ago • 60
Open-World Skill Discovery from Unsegmented Demonstrations Paper • 2503.10684 • Published 14 days ago • 5
API Agents vs. GUI Agents: Divergence and Convergence Paper • 2503.11069 • Published 11 days ago • 31
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 8 days ago • 20
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 9 days ago • 60
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 9 days ago • 60 • 2
Pixel-Level Reasoning Segmentation via Multi-turn Conversations Paper • 2502.09447 • Published Feb 13 • 1
"Principal Components" Enable A New Language of Images Paper • 2503.08685 • Published 14 days ago • 11
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 15 days ago • 95
SEA-VL: Multicultural VL Dataset for Southeast Asia Collection Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated 13 days ago • 15
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S Collection SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 15 days ago • 95