CodeAct Agent Framework
This folder implements OpenHands's main agent, the CodeAct Agent. It is based on CodeAct (CodeAct, tweet), the idea of consolidating an LLM agent's actions into a unified code action space for both simplicity and performance.
Overview
The CodeAct agent operates through a function calling interface. At each turn, the agent can:
- Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
- CodeAct: Execute actions through a set of well-defined tools:
  - Execute Linux bash commands with execute_bash
  - Run Python code in an IPython environment with execute_ipython_cell
  - Interact with web browsers using browser and web_read
  - Edit files using str_replace_editor or edit_file
Built-in Tools
The agent provides several built-in tools:
1. execute_bash
- Execute any valid Linux bash command
- Handles long-running commands by running them in the background with output redirection
- Supports interactive processes with STDIN input and process interruption
- Handles command timeouts with automatic retry in background mode
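
The timeout behaviour described above can be pictured with a short sketch. This is not the OpenHands implementation: `run_bash`, its return dictionary, and the log-file naming are hypothetical and only illustrate "run in the foreground, then retry in the background with output redirected to a file".

```python
import subprocess
import tempfile


def run_bash(command: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: foreground run first, background retry on timeout."""
    try:
        done = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "mode": "foreground",
            "exit_code": done.returncode,
            "output": done.stdout + done.stderr,
        }
    except subprocess.TimeoutExpired:
        # The foreground attempt was killed; start the command again in the
        # background and redirect stdout and stderr to a log file.
        log = tempfile.NamedTemporaryFile(prefix="bash_", suffix=".log", delete=False)
        process = subprocess.Popen(
            ["bash", "-c", command], stdout=log, stderr=subprocess.STDOUT
        )
        log.close()  # the child keeps its own handle on the file
        return {"mode": "background", "process": process, "log_path": log.name}


result = run_bash("echo hello")
print(result["mode"], result["output"].strip())
```

Note that a retry of this kind starts the command from scratch, so it only suits commands that are safe to run twice.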
2. execute_ipython_cell
- Run Python code in an IPython environment
- Supports magic commands like %pip
- Variables are scoped to the IPython environment
- Requires defining variables and importing packages before use
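
To illustrate why variables and imports must be defined before use, here is a minimal stand-in for a stateful cell runner. It uses plain `exec` with a shared namespace instead of a real IPython kernel, and `CellRunner` is a hypothetical name, so treat it as a model of the scoping rule only.

```python
class CellRunner:
    """Hypothetical stand-in for an IPython session: one namespace shared by all cells."""

    def __init__(self) -> None:
        self.namespace = {}

    def run(self, code: str) -> str:
        try:
            # Every cell executes against the same namespace, so state persists.
            exec(code, self.namespace)
            return "ok"
        except Exception as exc:
            return f"{type(exc).__name__}: {exc}"


runner = CellRunner()
runner.run("x = 2")                 # defines x for later cells
runner.run("y = x + 1")             # works because x already exists
print(runner.run("math.sqrt(4)"))   # fails: math was never imported
print(runner.run("import math\nroot = math.sqrt(4)"))
```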
3. web_read and browser
- web_read: Read and convert webpage content to markdown
- browser: Interact with webpages through Python code
  - Supports common browser actions like navigation, clicking, form filling, scrolling
  - Handles file uploads and drag-and-drop operations
4. str_replace_editor
- View, create and edit files through string replacement
- Persistent state across command calls
- File viewing with line numbers
- String replacement with exact matching
- Undo functionality for edits
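
The editing model listed above (exact-match replacement, numbered views, undo) can be sketched with a toy in-memory editor. The class and method names are hypothetical, and the rule that the old string must match exactly once is an assumption made for this sketch rather than a documented guarantee.

```python
class StringReplaceEditor:
    """Toy in-memory editor; not the real str_replace_editor tool."""

    def __init__(self) -> None:
        self.files = {}    # path -> current text
        self.history = {}  # path -> stack of previous versions

    def create(self, path: str, text: str) -> None:
        self.files[path] = text
        self.history[path] = []

    def view(self, path: str) -> str:
        # Prefix each line with its 1-based line number.
        lines = self.files[path].splitlines()
        return "\n".join(f"{i:>4}  {line}" for i, line in enumerate(lines, start=1))

    def str_replace(self, path: str, old: str, new: str) -> None:
        text = self.files[path]
        count = text.count(old)
        if count != 1:
            # Assumption: refuse ambiguous or missing matches.
            raise ValueError(f"expected exactly one match, found {count}")
        self.history[path].append(text)
        self.files[path] = text.replace(old, new, 1)

    def undo(self, path: str) -> None:
        if not self.history[path]:
            raise ValueError("nothing to undo")
        self.files[path] = self.history[path].pop()


editor = StringReplaceEditor()
editor.create("demo.py", "x = 1\ny = 2\n")
editor.str_replace("demo.py", "x = 1", "x = 10")
print(editor.view("demo.py"))
editor.undo("demo.py")
```

Keeping a per-file stack of earlier versions is what makes state persist across calls and lets undo work without re-reading the file.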
5. edit_file (LLM-based)
- Edit files using LLM-based content generation
- Support for partial file edits with line ranges
- Handles large files by editing specific sections
- Append mode for adding content to files
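
As a rough picture of partial edits and append mode, the sketch below splices already-generated content into a line range. `apply_edit`, its parameters, and the convention that `-1` means "end of file" or "append" are assumptions for illustration; the LLM call that would produce `new_content` is left out.

```python
def apply_edit(original: str, new_content: str, start: int = 1, end: int = -1) -> str:
    """Hypothetical helper: replace lines start..end (1-indexed, inclusive).

    Assumed conventions: end == -1 means "through the last line",
    and start == -1 means "append to the end of the file".
    """
    lines = original.splitlines()
    new_lines = new_content.splitlines()
    if start == -1:
        # Append mode: keep everything and add the new lines at the end.
        return "\n".join(lines + new_lines) + "\n"
    last = len(lines) if end == -1 else end
    if not 1 <= start <= last <= len(lines):
        raise ValueError(f"invalid line range {start}..{end}")
    # Only the selected section is replaced; the rest of the file is untouched.
    return "\n".join(lines[: start - 1] + new_lines + lines[last:]) + "\n"


text = "a\nb\nc\nd\n"
print(apply_edit(text, "B\nC\n", start=2, end=3))
print(apply_edit(text, "e\n", start=-1))
```

Editing by range keeps the amount of text the model has to regenerate small, which is the point of the large-file handling mentioned above.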
Configuration
Tools can be enabled/disabled through configuration parameters:
- enable_browsing: Enable browser interaction tools
- enable_jupyter: Enable IPython code execution
- enable_llm_editor: Enable LLM-based file editing (falls back to string replacement editor if disabled)
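
One way to read these flags is as a filter over the tool list. The sketch below is illustrative only: `ToolConfig`, `select_tool_names`, the default values, and the assumption that execute_bash is always available are not taken from the OpenHands code.

```python
from dataclasses import dataclass


@dataclass
class ToolConfig:
    # Flag names follow this README; the defaults are assumptions.
    enable_browsing: bool = True
    enable_jupyter: bool = True
    enable_llm_editor: bool = False


def select_tool_names(config: ToolConfig) -> list:
    """Hypothetical helper returning the tool names a given config would expose."""
    names = ["execute_bash"]  # assumed to be always enabled
    if config.enable_browsing:
        names += ["web_read", "browser"]
    if config.enable_jupyter:
        names.append("execute_ipython_cell")
    # The LLM-based editor falls back to the string replacement editor.
    names.append("edit_file" if config.enable_llm_editor else "str_replace_editor")
    return names


print(select_tool_names(ToolConfig(enable_browsing=False)))
```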
Micro-agents
The agent includes specialized micro-agents for specific tasks:
- npm: Handles npm package installation with non-interactive shell workarounds
- github: Manages GitHub operations with API token support and PR creation guidelines
- flarglebargle: Easter egg response handler
Adding New Tools
The CodeAct agent uses a function calling interface based on litellm's ChatCompletionToolParam. To add a new tool:
- Define the tool in function_calling.py:
```python
from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

# Tool schema sent to the model: a name, a description, and JSON Schema parameters.
MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)
```
- Add the tool to get_tools() in function_calling.py
- Implement the corresponding action handler in the agent class
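
The last step, handling a call to the new tool, might look roughly like the sketch below. It assumes the OpenAI-style tool call shape (a function name plus a JSON-encoded argument string) and uses plain dictionaries; `handle_my_tool`, `HANDLERS`, and `dispatch_tool_call` are hypothetical names, not part of the agent class.

```python
import json


def handle_my_tool(param1: str, param2: int = 0) -> str:
    """Hypothetical handler for the my_tool example."""
    return f"my_tool got param1={param1!r}, param2={param2}"


# Map tool names to handlers so new tools only need a new entry here.
HANDLERS = {"my_tool": handle_my_tool}


def dispatch_tool_call(tool_call: dict) -> str:
    name = tool_call["function"]["name"]
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    try:
        # Arguments arrive as a JSON string and must be decoded first.
        arguments = json.loads(tool_call["function"]["arguments"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON arguments for {name}") from exc
    if "param1" not in arguments:
        raise ValueError("missing required parameter: param1")
    return HANDLERS[name](**arguments)


call = {"function": {"name": "my_tool", "arguments": '{"param1": "hello", "param2": 3}'}}
print(dispatch_tool_call(call))
```

Validating the tool name and required parameters before calling the handler turns malformed model output into a clear error instead of a crash deep inside the handler.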
Implementation Details
The agent is implemented in two main files:
- codeact_agent.py: Core agent implementation with:
  - Message history management
  - Tool execution handling
  - State management
  - Action/observation processing
- function_calling.py: Tool definitions and function calling interface with:
  - Tool parameter specifications
  - Tool descriptions and examples
  - Function calling response parsing