MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents Paper • 2403.02502 • Published Mar 4, 2024 • 3
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 12.4k