Impossible to solve

#5
by MartinSeeler - opened

It seems like the AI-based evaluation needs to be less strict. It expects so many details to match an unknown reference implementation that it is practically impossible to submit a valid solution. Here is an example from my Task 2. It was marked wrong because of the tool order, a slightly different agent name, the fact that I used CodeAgent (which was part of the exercise LOL) instead of ToolCallingAgent, etc.

from smolagents import ToolCallingAgent, CodeAgent, DuckDuckGoSearchTool, HfApiModel, VisitWebpageTool

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

web_agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool(), VisitWebpageTool()],
    model=model,
    max_steps=10,
    name="web_agent",
    description="Browses the web to find information",
)

manager_agent = CodeAgent(
    model=model,
    managed_agents=[web_agent],
    planning_interval=5,
    max_steps=15,
)

❌ Incorrect! The student's solution contains a few discrepancies from the reference solution:

1. Import Mismatch: The student imports DuckDuckGoSearchTool and VisitWebpageTool, whereas the reference solution imports DuckDuckGoSearchTool and visit_webpage. The visit_webpage function should be correctly imported as shown in the reference solution.
2. Manager Agent Configuration: The student uses CodeAgent for the manager agent, whereas the reference solution explicitly states to use ToolCallingAgent. The ToolCallingAgent is likely the correct choice as per the example.
3. Description Clarity: In the student's solution, the web agent's description is slightly different ("Browses the web to find information") compared to the reference ("Runs web searches for you."). This is a minor point but should still be aligned for consistency and clarity.
4. Name Consistency: The student names the web agent "web_agent" instead of just "search" as in the reference. This discrepancy is minor but indicates less adherence to the provided naming conventions.
5. Planning Interval: The student introduces a planning interval of 5 for the manager agent, which is not mentioned in the reference solution. This is an extra detail not requested in the challenge.

While the student's solution generally aligns closely, these points of difference indicate it is not strictly functionally equivalent to the reference solution.
