Article LeRobot goes to driving school: World’s largest open-source self-driving dataset 3 days ago • 38
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024 • 1
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451 • Published Jun 12, 2024 • 25
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 54
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 28
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 24 days ago • 93
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 203
Towards Retrieval Augmented Generation over Large Video Libraries Paper • 2406.14938 • Published Jun 21, 2024 • 21
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 111
Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • Jan 31 • 43