# AI Agent Architecture Plan
## 1. Frameworks and Tools
- **Python**: The primary programming language.
- **Manim**: For creating mathematical animations.
- **OpenAI API**: For generating Manim code from text prompts.
- **pydantic-ai**: For structured AI agent creation, function calling, and workflow management.
- **Gradio**: For creating a user interface to input prompts and display generated videos.
- **python-dotenv** (imported as `dotenv`): For loading environment variables from a `.env` file.
- **logging**: Python's standard-library module, used to record information and errors.
## 2. Agent Architecture
- **Input Handling**: Use Gradio to create a user interface where users can input text prompts.
- **Agent Structure**: Leverage pydantic-ai to define the agent's schema, capabilities, and functions.
- **System Prompts**: Use both static and dynamic system prompts to guide the agent's behavior.
  - Static prompts: Define the agent's role and general capabilities
  - Dynamic prompts: Adjust behavior based on complexity settings and current context
- **Keyword Identification**: Use pydantic-ai with OpenAI API to identify keywords and generate Manim code.
- **Scenario Creation**: Define structured schemas in pydantic-ai to guide the generation of scenarios.
- **Function Search**: Use pydantic-ai's function calling capabilities to organize and call Manim functions.
- **Code Generation and Testing**: The generated code will be tested by rendering the video using Manim.
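
The split between static and dynamic system prompts can be sketched roughly as follows. This is a minimal illustration using only the standard library; it does not call pydantic-ai, and the names `STATIC_PROMPT`, `COMPLEXITY_HINTS`, `RequestContext`, and `build_system_prompt`, as well as the three complexity levels, are assumptions made for the example rather than part of the plan.

```python
from dataclasses import dataclass

# Hypothetical static prompt: fixed text describing the agent's role.
STATIC_PROMPT = (
    "You are an assistant that writes Manim code for short "
    "mathematical animations."
)

# Hypothetical complexity levels and the guidance attached to each.
COMPLEXITY_HINTS = {
    "low": "Use at most three objects and one animation per step.",
    "medium": "Combine shapes, text, and simple transformations.",
    "high": "Multiple coordinated animations and camera moves are allowed.",
}


@dataclass
class RequestContext:
    """Per-request settings that drive the dynamic part of the prompt."""
    complexity: str = "medium"
    topic: str = ""


def dynamic_prompt(ctx: RequestContext) -> str:
    """Build the context-dependent prompt text for one request."""
    if ctx.complexity not in COMPLEXITY_HINTS:
        raise ValueError(f"unknown complexity: {ctx.complexity!r}")
    lines = [f"Complexity: {ctx.complexity}. {COMPLEXITY_HINTS[ctx.complexity]}"]
    if ctx.topic:
        lines.append(f"Current topic: {ctx.topic}.")
    return "\n".join(lines)


def build_system_prompt(ctx: RequestContext) -> str:
    """Concatenate the static prompt with the dynamic one."""
    return STATIC_PROMPT + "\n" + dynamic_prompt(ctx)
```

Keeping the dynamic part in its own function means the complexity setting chosen in the UI can change the prompt per request without touching the role description.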
## 3. Workflow
1. **User Input**: The user inputs a text prompt describing a mathematical or physics concept.
2. **Agent Processing**: The pydantic-ai agent processes the input through defined schemas and tools.
   - System prompts dynamically adjust based on user requirements
   - Tools are applied in sequence using the agent's capabilities
3. **Keyword Identification and Scenario Creation**: The agent uses OpenAI API to analyze the prompt and generate a structured scenario.
4. **Code Generation**: The agent transforms the structured scenario into Manim code using defined tools.
5. **Video Rendering**: The code is executed using Manim to render the video.
6. **Output**: The generated video is displayed to the user.
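
Steps 1 to 4 of the workflow above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the planned implementation: a rule-based stub (`extract_scenario`) stands in for the LLM call, a standard-library dataclass stands in for the pydantic scenario model, and the names `Scenario`, `MANIM_CLASSES`, `generate_code`, `run_pipeline`, and `GeneratedScene` are hypothetical. Rendering and display are left out so the sketch runs without Manim installed.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Structured description of an animation (stand-in for a pydantic model)."""
    title: str
    objects: list[str] = field(default_factory=list)


# Hypothetical mapping from scenario keywords to Manim class names.
MANIM_CLASSES = {"circle": "Circle", "square": "Square", "triangle": "Triangle"}


def extract_scenario(prompt: str) -> Scenario:
    """Stub for keyword identification; the real agent would call the LLM here."""
    found = [name for name in MANIM_CLASSES if name in prompt.lower()]
    return Scenario(title=prompt.strip(), objects=found)


def generate_code(scenario: Scenario) -> str:
    """Fill a fixed Manim scene template with the scenario's objects."""
    lines = [
        "from manim import *",
        "",
        "",
        "class GeneratedScene(Scene):",
        "    def construct(self):",
        f"        self.add(Text({scenario.title!r}).to_edge(UP))",
    ]
    for name in scenario.objects:
        lines.append(f"        self.play(Create({MANIM_CLASSES[name]}()))")
    lines.append("        self.wait()")
    return "\n".join(lines) + "\n"


def run_pipeline(prompt: str) -> str:
    """Prompt -> scenario -> Manim source, with a syntax check before rendering."""
    scenario = extract_scenario(prompt)
    code = generate_code(scenario)
    ast.parse(code)  # raises SyntaxError if the generated source is malformed
    return code
```

For example, `run_pipeline("Show a circle and a square")` returns source for a scene that creates a `Circle` and a `Square`; that string is what the rendering step would receive.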
## 4. Detailed Steps
1. **Setup Environment**:
   - Ensure all required packages are installed (`gradio`, `openai`, `pydantic-ai`, `python-dotenv`, `manim`, etc.).
   - Set up environment variables in a `.env` file (e.g., `TOGETHER_API_KEY`).
   - Configure pydantic-ai with appropriate model settings.
2. **Create Agent Structure**:
   - Define pydantic models for input prompts, scenario descriptions, and animation parameters.
   - Create static and dynamic system prompts to guide agent behavior:
     - Static: Define the agent's role and general capabilities
     - Dynamic: Adjust behavior based on request complexity and context
   - Create tool functions for scenario extraction, code generation, and rendering.
   - Configure the agent with appropriate tools and models.
3. **Create User Interface**:
   - Use Gradio to create a web interface for inputting prompts and displaying results.
   - Add complexity selection controls to customize animation generation.
   - Connect the UI to the pydantic-ai agent.
4. **Generate Manim Code**:
   - Implement functions using pydantic-ai tools to transform user prompts into structured scenarios.
   - Convert structured scenarios into Manim code templates.
   - Fill templates with specifics from the scenario.
5. **Render Video**:
   - Implement a function to render the generated Manim code into a video.
   - Add error handling and validation using pydantic models.
6. **Display Results**:
   - Display the generated video and code in the Gradio interface.
   - Provide feedback and explanations based on the agent's processing steps.
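
The rendering step (step 5) with basic validation and error handling might look roughly like the sketch below. It assumes the Manim Community command-line tool is on `PATH` and accepts `manim render -ql --media_dir <dir> <file> <SceneName>`; the function names, the `generated_scene.py` file name, and the 120-second timeout are illustrative choices. Validation here uses the standard-library `ast` module rather than pydantic models, so the checks can run without any third-party package.

```python
import ast
import subprocess
from pathlib import Path


def find_scene_names(code: str) -> list[str]:
    """Return top-level classes whose base-class name ends in 'Scene'."""
    tree = ast.parse(code)  # raises SyntaxError on malformed code
    names = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
            if any(base.endswith("Scene") for base in bases):
                names.append(node.name)
    return names


def build_command(script: Path, scene: str, media_dir: Path) -> list[str]:
    """Assumed Manim Community CLI invocation at low quality (-ql)."""
    return ["manim", "render", "-ql", "--media_dir", str(media_dir),
            str(script), scene]


def render(code: str, workdir: Path, timeout: int = 120) -> Path:
    """Validate the code, run Manim, and return the path of the rendered video."""
    scenes = find_scene_names(code)
    if not scenes:
        raise ValueError("generated code defines no Scene subclass")

    script = workdir / "generated_scene.py"
    script.write_text(code, encoding="utf-8")
    media_dir = workdir / "media"

    result = subprocess.run(
        build_command(script, scenes[0], media_dir),
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        # Surface Manim's own error output so it can be shown in the UI.
        raise RuntimeError(f"manim failed:\n{result.stderr}")

    # Search for the output instead of hard-coding Manim's directory layout.
    videos = sorted(media_dir.rglob(f"{scenes[0]}.mp4"))
    if not videos:
        raise RuntimeError("manim finished but no video file was found")
    return videos[0]
```

Checking the syntax and the presence of a scene class before invoking Manim keeps obviously broken generations from costing a full render, and the raised exceptions give the interface a message to display alongside the generated code.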