unit2_smolagents_quiz


The evaluator was a painful (and long) lesson in how poor LLM agents can be

by hardknee - opened Feb 27

Feb 27

Way, way too strict in requiring that answers match the reference verbatim. Incapable of recognising that code using different naming etc. is also a correct answer. Many examples, but:

#2 insists on creating a new visit_webpage tool when importing VisitWebPage is just as correct.
#5 rejects answer for referencing anthropic/claude-3.5-sonnet instead of anthropic/claude-3-sonnet.

Consistently inconsistent: previously accepted answers later rejected.
Reference files are incorrect: #3 only accepts an answer that is incorrect and inconsistent with the smolagents documentation. There is no smolagents import named E2BSandbox. It rejected the correct example given in both the smolagents and E2B documentation.
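To illustrate the point about naming, here is a minimal sketch of a comparison that ignores the names of variables a snippet assigns itself while still requiring the same calls and arguments. It is only an illustration, not a description of how the quiz evaluator works; `normalize` and `equivalent` are hypothetical helper names, and the snippets are parsed as text, never executed.

```python
import ast


def normalize(source: str) -> str:
    """Return an AST dump in which locally assigned names are replaced by placeholders."""
    tree = ast.parse(source)
    # Collect names the snippet assigns to; imported or external names stay untouched.
    local_names = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            local_names.setdefault(node.id, f"v{len(local_names)}")
    # Rewrite every use of those names to its placeholder.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in local_names:
            node.id = local_names[node.id]
    return ast.dump(tree)


def equivalent(answer: str, reference: str) -> bool:
    """True when the two snippets differ only in the names of their own variables."""
    return normalize(answer) == normalize(reference)


reference = "model = HfApiModel()\nagent = CodeAgent(tools=[], model=model)"
renamed = "llm = HfApiModel()\nmy_agent = CodeAgent(tools=[], model=llm)"
changed = "llm = HfApiModel()\nmy_agent = CodeAgent(tools=[], model=llm, executor_type='e2b')"

print(equivalent(renamed, reference))
print(equivalent(changed, reference))
```

With a check along these lines, the renamed answer would be accepted and the one with an extra argument would still be flagged. It would not help with cases like #2, where a different but equally valid approach is used.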

Aizdes

Feb 28


edited Feb 28

Same problem. I don't understand what E2BSandbox is. #3 seems unsolvable.

cristuf

Feb 28

Same here, the matching should allow for more flexibility.
There are errors given to the candidate like "too strict requirement for tooling" in question #4. I don't even understand...
The E2BSandbox question seems unsolvable from the HF blog or the smolagents source code, which makes it a bit hard to satisfy...

GatinhoEducado

Mar 1


edited Mar 2

The answer that I get heuristically is:
from smolagents import CodeAgent, E2BSandbox

agent = CodeAgent(
    tools=[],
    model=model,
    sandbox=E2BSandbox(),
    additional_authorized_imports=["numpy"],
)
This sandbox parameter isn't even mentioned in the documentation LOL.

connorads

Mar 2

I've also given up. The E2B sandbox seems to have a different API according to the docs, and the documented version doesn't pass the quiz.
https://huggingface.co/docs/smolagents/main/tutorials/secure_code_execution

from smolagents import HfApiModel, CodeAgent

agent = CodeAgent(model=HfApiModel(), tools=[], executor_type="e2b")

agent.run("Can you give me the 100th Fibonacci number?")
