{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Agent\n", "\n", "In this notebook, **we're going to build a simple agent using LangGraph**.\n", "\n", "This notebook is part of the Hugging Face Agents Course, a free course from beginner to expert, where you learn to build Agents.\n", "\n", "![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)\n", "\n", "As seen in Unit 1, an agent needs 3 steps, as introduced in [ReAct](https://react-lm.github.io/), a general agent architecture:\n", "\n", "* `act` - let the model call specific tools\n", "* `observe` - pass the tool output back to the model\n", "* `reason` - let the model reason about the tool output to decide what to do next (e.g., call another tool or just respond directly)\n", "\n", "![Agent](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Agent.png)" ], "id": "89791f21c171372a" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "%pip install -q -U langchain_openai langchain_core langgraph", "id": "bef6c5514bd263ce" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "import os\n", "\n", "# Please set your own key.\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-xxxxxx\"" ], "id": "61d0ed53b26fa5c6" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "import base64\n", "\n", "from langchain_core.messages import HumanMessage\n", "from langchain_openai import ChatOpenAI\n", "\n", "vision_llm = ChatOpenAI(model=\"gpt-4o\")\n", "\n", "\n", "def extract_text(img_path: str) -> str:\n", "    \"\"\"\n", "    Extract text from an image file using a multimodal model.\n", "\n", "    Args:\n", "        img_path: A local image file path (string).\n", "\n", "    Returns:\n", "        A single string containing the text extracted from the image.\n", "    \"\"\"\n", "    all_text = \"\"\n", "    try:\n", "        # Read the image and encode it as base64\n", "        with open(img_path, \"rb\") as image_file:\n", "            image_bytes = image_file.read()\n", "\n", "        image_base64 = base64.b64encode(image_bytes).decode(\"utf-8\")\n", "\n", "        # Prepare the prompt, including the base64 image data\n", "        message = [\n", "            HumanMessage(\n", "                content=[\n", "                    {\n", "                        \"type\": \"text\",\n", "                        \"text\": (\n", "                            \"Extract all the text from this image. \"\n", "                            \"Return only the extracted text, no explanations.\"\n", "                        ),\n", "                    },\n", "                    {\n", "                        \"type\": \"image_url\",\n", "                        \"image_url\": {\n", "                            \"url\": f\"data:image/png;base64,{image_base64}\"\n", "                        },\n", "                    },\n", "                ]\n", "            )\n", "        ]\n", "\n", "        # Call the vision-capable model\n", "        response = vision_llm.invoke(message)\n", "\n", "        # Append the extracted text\n", "        all_text += response.content + \"\\n\\n\"\n", "\n", "        return all_text.strip()\n", "    except Exception as e:\n", "        # You can choose whether to raise or just return an empty string / error message\n", "        error_msg = f\"Error extracting text: {str(e)}\"\n", "        print(error_msg)\n", "        return \"\"\n", "\n", "\n", "llm = ChatOpenAI(model=\"gpt-4o\")\n", "\n", "\n", "def divide(a: int, b: int) -> float:\n", "    \"\"\"Divide a by b.\"\"\"\n", "    return a / b\n", "\n", "\n", "tools = [\n", "    divide,\n", "    extract_text\n", "]\n", "llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)" ], "id": "a4a8bf0d5ac25a37" },
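{ "metadata": {}, "cell_type": "markdown", "source": "Before handing these tools to the agent, we can quickly sanity-check them on their own. The optional cell below only calls `divide`, since `extract_text` needs a local image file plus an OpenAI API key; this is a minimal sketch added for illustration, not part of the graph we build next.", "id": "b7d1f3a2c9e04a11" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# Optional sanity check: divide is a plain Python function, so we can call it\n", "# directly before binding it to the LLM. extract_text is skipped here because it\n", "# requires a local image file and an API call.\n", "print(divide(6790, 5))  # expected: 1358.0" ], "id": "b7d1f3a2c9e04a12" },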
{ "metadata": {}, "cell_type": "markdown", "source": "Let's define the agent's state, then write the `assistant` node that prompts the LLM with the overall desired agent behavior.", "id": "3e7c17a2e155014e" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "from typing import TypedDict, Annotated, Optional\n", "\n", "from langchain_core.messages import AnyMessage\n", "from langgraph.graph.message import add_messages\n", "\n", "\n", "class AgentState(TypedDict):\n", "    # The input document\n", "    input_file: Optional[str]  # Contains the file path of a PNG image\n", "    messages: Annotated[list[AnyMessage], add_messages]" ], "id": "f31250bc1f61da81" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "from langchain_core.messages import HumanMessage, SystemMessage\n", "\n", "\n", "def assistant(state: AgentState):\n", "    # Describe the available tools for the system message\n", "    textual_description_of_tool = \"\"\"\n", "extract_text(img_path: str) -> str:\n", "    Extract text from an image file using a multimodal model.\n", "\n", "    Args:\n", "        img_path: A local image file path (string).\n", "\n", "    Returns:\n", "        A single string containing the text extracted from the image.\n", "divide(a: int, b: int) -> float:\n", "    Divide a by b.\n", "\"\"\"\n", "    image = state[\"input_file\"]\n", "    sys_msg = SystemMessage(content=f\"You are a helpful agent that can analyse images and run computations using the provided tools:\\n{textual_description_of_tool}\\nYou have access to an optional image. Currently the loaded image is: {image}\")\n", "\n", "    return {\"messages\": [llm_with_tools.invoke([sys_msg] + state[\"messages\"])], \"input_file\": state[\"input_file\"]}" ], "id": "3c4a736f9e55afa9" },
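{ "metadata": {}, "cell_type": "markdown", "source": "Before wiring everything into a graph, you can optionally call the `assistant` node on its own to see the raw tool call it produces. This is a minimal sketch (it sends a real request to the OpenAI API, so it assumes your `OPENAI_API_KEY` is set); the agent loop itself is built in the next cells.", "id": "d4e5f60718293a4b" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# Optional: run the assistant node once, outside of any graph.\n", "# The returned AIMessage should contain a tool call to divide rather than a final answer.\n", "preview_state = {\"messages\": [HumanMessage(content=\"Divide 6790 by 5\")], \"input_file\": None}\n", "preview = assistant(preview_state)\n", "print(preview[\"messages\"][0].tool_calls)" ], "id": "d4e5f60718293a4c" },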
{ "metadata": {}, "cell_type": "markdown", "source": [ "We define a `tools` node with our list of tools.\n", "\n", "The `assistant` node is just our model with bound tools.\n", "\n", "We create a graph with `assistant` and `tools` nodes.\n", "\n", "We add a `tools_condition` edge, which routes to `END` or to `tools` based on whether the `assistant` calls a tool.\n", "\n", "Now, we add one new step:\n", "\n", "We connect the `tools` node *back* to the `assistant`, forming a loop.\n", "\n", "* After the `assistant` node executes, `tools_condition` checks if the model's output is a tool call.\n", "* If it is a tool call, the flow is directed to the `tools` node.\n", "* The `tools` node connects back to `assistant`.\n", "* This loop continues as long as the model decides to call tools.\n", "* If the model response is not a tool call, the flow is directed to `END`, terminating the process." ], "id": "6f1efedd943d8b1d" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "from langgraph.graph import START, StateGraph\n", "from langgraph.prebuilt import ToolNode, tools_condition\n", "from IPython.display import Image, display\n", "\n", "# Graph\n", "builder = StateGraph(AgentState)\n", "\n", "# Define nodes: these do the work\n", "builder.add_node(\"assistant\", assistant)\n", "builder.add_node(\"tools\", ToolNode(tools))\n", "\n", "# Define edges: these determine how the control flow moves\n", "builder.add_edge(START, \"assistant\")\n", "builder.add_conditional_edges(\n", "    \"assistant\",\n", "    # If the latest message (result) from assistant is a tool call -> tools_condition routes to tools\n", "    # If the latest message (result) from assistant is not a tool call -> tools_condition routes to END\n", "    tools_condition,\n", ")\n", "builder.add_edge(\"tools\", \"assistant\")\n", "react_graph = builder.compile()\n", "\n", "# Show\n", "display(Image(react_graph.get_graph(xray=True).draw_mermaid_png()))" ], "id": "e013061de784638a" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "messages = [HumanMessage(content=\"Divide 6790 by 5\")]\n", "\n", "messages = react_graph.invoke({\"messages\": messages, \"input_file\": None})" ], "id": "d3b0ba5be1a54aad" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "for m in messages['messages']:\n", "    m.pretty_print()" ], "id": "55eb0f1afd096731" },
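{ "metadata": {}, "cell_type": "markdown", "source": "If you want to watch the act / observe / reason loop unfold step by step, you can stream the graph instead of calling `invoke`. The sketch below uses LangGraph's `stream` with `stream_mode=\"values\"`, which yields the full state after each node; this cell is an optional illustration and makes another API call.", "id": "9a8b7c6d5e4f3210" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# Optional: stream the graph to see each step of the ReAct loop.\n", "# stream_mode=\"values\" yields the full state after every node, so we print the\n", "# latest message at each step (question, tool call, tool result, final answer).\n", "for step in react_graph.stream(\n", "    {\"messages\": [HumanMessage(content=\"Divide 6790 by 5\")], \"input_file\": None},\n", "    stream_mode=\"values\",\n", "):\n", "    step[\"messages\"][-1].pretty_print()" ], "id": "9a8b7c6d5e4f3211" },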
{ "metadata": {}, "cell_type": "markdown", "source": [ "## Training program\n", "Mr. Wayne left a note with his training program for the week. I came up with a recipe for dinner, left in a note.\n", "\n", "You can find the document [HERE](https://huggingface.co/datasets/agents-course/course-images/blob/main/en/unit2/LangGraph/Batman_training_and_meals.png), so download it and upload it to the local folder.\n", "\n", "![Training](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Batman_training_and_meals.png)" ], "id": "e0062c1b99cb4779" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "messages = [HumanMessage(content=\"According to the note provided by Mr. Wayne in the provided image, what's the list of items I should buy for the dinner menu?\")]\n", "\n", "messages = react_graph.invoke({\"messages\": messages, \"input_file\": \"Batman_training_and_meals.png\"})" ], "id": "2e166ebba82cfd2a" },
{ "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "for m in messages['messages']:\n", "    m.pretty_print()" ], "id": "5bfd67af70b7dcf3" } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }