Running 532 Scaling test-time compute: Enhance math problem solving by scaling test-time compute
Running 543 Vision Arena (Testing VLMs side-by-side): Analyze images to detect and label objects