Aymeric Roucher's picture

Aymeric Roucher

m-ric

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

Organizations

Hugging Face's profile picture Orange's profile picture Atmos Bank's profile picture Hugging Test Lab's profile picture Tools's profile picture HuggingFaceM4's profile picture lecocqassociate's profile picture huggingPartyParis's profile picture Supreme's profile picture FactSet's profile picture Propulse Lab's profile picture Leaderboard Organization's profile picture FactSet's profile picture CGIAR's profile picture Aperture Laboratories's profile picture AI Energy Score's profile picture C&A's profile picture Social Post Explorers's profile picture Dev Mode Explorers's profile picture Agent Collab's profile picture SLLHF's profile picture Data Agents's profile picture Hugging Face Party @ PyTorch Conference's profile picture Hugging Face FineVideo's profile picture Nerdy Face's profile picture Hugging Face Science's profile picture Agents Leaderboard's profile picture smolagents's profile picture Hugging Face Agents Course's profile picture Open R1's profile picture SIMS's profile picture

Posts 95

view post
Post
754
Our new Agentic leaderboard is now live!💥

If you ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova , this ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

🏆 GPT-4.5 comes on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard also allows you to show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks: this shows the huge improvements brought by agentic setups. 💪

(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)

Articles 10

Article
30

Trace & Evaluate your Agent with Arize Phoenix