Spaces:
Build error
Build error
# CodeAct Agent Framework | |
This folder is an implementation of OpenHands's main agent, the CodeAct Agent. It is based on ([CodeAct](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)), an idea of consolidating LLM agents' **act**ions into a unified **code** action space for both *simplicity* and *performance*. | |
## Overview | |
The CodeAct agent operates through a function calling interface. At each turn, the agent can: | |
1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc. | |
2. **CodeAct**: Execute actions through a set of well-defined tools: | |
- Execute Linux `bash` commands with `execute_bash` | |
- Run Python code in an [IPython](https://ipython.org/) environment with `execute_ipython_cell` | |
- Interact with web browsers using `browser` and `fetch` | |
- Edit files using `str_replace_editor` or `edit_file` | |
 | |
## Built-in Tools | |
The agent provides several built-in tools: | |
### 1. `execute_bash` | |
- Execute any valid Linux bash command | |
- Handles long-running commands by running them in background with output redirection | |
- Supports interactive processes with STDIN input and process interruption | |
- Handles command timeouts with automatic retry in background mode | |
### 2. `execute_ipython_cell` | |
- Run Python code in an IPython environment | |
- Supports magic commands like `%pip` | |
- Variables are scoped to the IPython environment | |
- Requires defining variables and importing packages before use | |
### 3. `web_read` and `browser` | |
- `web_read`: Read and convert webpage content to markdown | |
- `browser`: Interact with webpages through Python code | |
- Supports common browser actions like navigation, clicking, form filling, scrolling | |
- Handles file uploads and drag-and-drop operations | |
### 4. `str_replace_editor` | |
- View, create and edit files through string replacement | |
- Persistent state across command calls | |
- File viewing with line numbers | |
- String replacement with exact matching | |
- Undo functionality for edits | |
### 5. `edit_file` (LLM-based) | |
- Edit files using LLM-based content generation | |
- Support for partial file edits with line ranges | |
- Handles large files by editing specific sections | |
- Append mode for adding content to files | |
## Configuration | |
Tools can be enabled/disabled through configuration parameters: | |
- `enable_browsing`: Enable browser interaction tools | |
- `enable_jupyter`: Enable IPython code execution | |
- `enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled) | |
## Micro-agents | |
The agent includes specialized micro-agents for specific tasks: | |
1. **npm**: Handles npm package installation with non-interactive shell workarounds | |
2. **github**: Manages GitHub operations with API token support and PR creation guidelines | |
3. **flarglebargle**: Easter egg response handler | |
## Adding New Tools | |
The CodeAct agent uses a function calling interface based on `litellm`'s `ChatCompletionToolParam`. To add a new tool: | |
1. Define the tool in `function_calling.py`: | |
```python | |
MyTool = ChatCompletionToolParam( | |
type='function', | |
function=ChatCompletionToolParamFunctionChunk( | |
name='my_tool', | |
description='Description of what the tool does and how to use it', | |
parameters={ | |
'type': 'object', | |
'properties': { | |
'param1': { | |
'type': 'string', | |
'description': 'Description of parameter 1', | |
}, | |
'param2': { | |
'type': 'integer', | |
'description': 'Description of parameter 2', | |
}, | |
}, | |
'required': ['param1'], # List required parameters here | |
}, | |
), | |
) | |
``` | |
2. Add the tool to `get_tools()` in `function_calling.py` | |
3. Implement the corresponding action handler in the agent class | |
## Implementation Details | |
The agent is implemented in two main files: | |
1. `codeact_agent.py`: Core agent implementation with: | |
- Message history management | |
- Tool execution handling | |
- State management | |
- Action/observation processing | |
2. `function_calling.py`: Tool definitions and function calling interface with: | |
- Tool parameter specifications | |
- Tool descriptions and examples | |
- Function calling response parsing | |