# AI Agent Architecture Plan
## 1. Frameworks and Tools
- **Python**: The primary programming language.
- **Manim**: For creating mathematical animations.
- **OpenAI API**: For generating Manim code from text prompts.
- **pydantic-ai**: For structured AI agent creation, function calling, and workflow management.
- **Gradio**: For creating a user interface to input prompts and display generated videos.
- **python-dotenv**: For loading environment variables from a `.env` file.
- **Logging**: For logging information and errors.
## 2. Agent Architecture
- **Input Handling**: Use Gradio to create a user interface where users can input text prompts.
- **Agent Structure**: Leverage pydantic-ai to define the agent's schema, capabilities, and functions.
- **System Prompts**: Use both static and dynamic system prompts to guide the agent's behavior.
  - Static prompts: Define the agent's role and general capabilities.
  - Dynamic prompts: Adjust behavior based on complexity settings and current context.
- **Keyword Identification**: Use pydantic-ai with the OpenAI API to identify the keywords in the prompt that drive Manim code generation.
- **Scenario Creation**: Define structured schemas in pydantic-ai to guide the generation of scenarios.
- **Function Search**: Use pydantic-ai's function calling capabilities to organize and call Manim functions.
- **Code Generation and Testing**: Test the generated code by rendering the video with Manim.
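
The split between static and dynamic system prompts described above can be sketched in plain Python. This is a minimal, library-independent illustration, not pydantic-ai's API: the names `STATIC_PROMPT`, `COMPLEXITY_HINTS`, `RequestContext`, and `build_system_prompt` are hypothetical, and the three complexity levels and their wording are assumptions. In the actual agent, the dynamic part would be registered through pydantic-ai's own system prompt mechanism.

```python
from dataclasses import dataclass
from typing import Optional

# Static part: a fixed description of the agent's role (wording is illustrative).
STATIC_PROMPT = (
    "You are an assistant that turns descriptions of mathematical or "
    "physics concepts into Manim animation code."
)

# Hypothetical complexity levels and the guidance attached to each one.
COMPLEXITY_HINTS = {
    "low": "Use a single scene with at most three animated objects.",
    "medium": "Use one scene with labelled objects and a few transformations.",
    "high": "Use several animation steps, labels, and explanatory text.",
}


@dataclass
class RequestContext:
    """Per-request state that the dynamic prompt depends on."""
    complexity: str = "medium"
    previous_error: Optional[str] = None  # e.g. a failed render from an earlier attempt


def build_system_prompt(ctx: RequestContext) -> str:
    """Combine the static role prompt with context-dependent instructions."""
    if ctx.complexity not in COMPLEXITY_HINTS:
        raise ValueError(f"unknown complexity: {ctx.complexity!r}")
    parts = [STATIC_PROMPT, COMPLEXITY_HINTS[ctx.complexity]]
    if ctx.previous_error:
        # Dynamic context: feed the last failure back to the model.
        parts.append(f"The previous attempt failed with: {ctx.previous_error}")
    return "\n".join(parts)
```

Keeping the static text in one constant and deriving the rest from a small context object makes the dynamic behavior easy to test without calling a model.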
## 3. Workflow
1. **User Input**: The user inputs a text prompt describing a mathematical or physics concept.
2. **Agent Processing**: The pydantic-ai agent processes the input through defined schemas and tools.
   - System prompts adjust dynamically based on user requirements.
   - Tools are applied in sequence using the agent's capabilities.
3. **Keyword Identification and Scenario Creation**: The agent uses the OpenAI API to analyze the prompt and generate a structured scenario.
4. **Code Generation**: The agent transforms the structured scenario into Manim code using defined tools.
5. **Video Rendering**: The code is executed using Manim to render the video.
6. **Output**: The generated video is displayed to the user.
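
The workflow above can be read as a small pipeline in which each stage consumes the previous stage's output. The sketch below is one possible shape for it, assuming the three stages are passed in as plain callables. `Scenario`, `PipelineResult`, and `run_pipeline` are hypothetical names, and the `Scenario` fields are placeholders for whatever schema the agent actually defines.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """Structured description extracted from the user's prompt (fields are illustrative)."""
    title: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class PipelineResult:
    """Everything the UI needs to show: the scenario, the code, and the video location."""
    scenario: Scenario
    code: str
    video_path: str


def run_pipeline(
    prompt: str,
    extract_scenario: Callable[[str], Scenario],
    generate_code: Callable[[Scenario], str],
    render_video: Callable[[str], str],
) -> PipelineResult:
    """Run the workflow steps in order: prompt -> scenario -> code -> video."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    scenario = extract_scenario(prompt)   # step 3: keyword identification and scenario creation
    code = generate_code(scenario)        # step 4: code generation
    video_path = render_video(code)       # step 5: video rendering
    return PipelineResult(scenario=scenario, code=code, video_path=video_path)
```

Passing the stages in as arguments keeps the orchestration independent of the model provider and of Manim, so each stage can be replaced by a stub in tests.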
## 4. Detailed Steps
1. **Setup Environment**:
   - Ensure all required packages are installed (`gradio`, `openai`, `pydantic-ai`, `python-dotenv`, `manim`, etc.).
   - Set up environment variables in a `.env` file (e.g., `TOGETHER_API_KEY`).
   - Configure pydantic-ai with appropriate model settings.
2. **Create Agent Structure**:
   - Define pydantic models for input prompts, scenario descriptions, and animation parameters.
   - Create static and dynamic system prompts to guide agent behavior:
     - Static: Define the agent's role and general capabilities.
     - Dynamic: Adjust behavior based on request complexity and context.
   - Create tool functions for scenario extraction, code generation, and rendering.
   - Configure the agent with appropriate tools and models.
3. **Create User Interface**:
   - Use Gradio to create a web interface for inputting prompts and displaying results.
   - Add complexity selection controls to customize animation generation.
   - Connect the UI to the pydantic-ai agent.
4. **Generate Manim Code**:
   - Implement functions using pydantic-ai tools to transform user prompts into structured scenarios.
   - Convert structured scenarios into Manim code templates.
   - Fill templates with specifics from the scenario.
5. **Render Video**:
   - Implement a function to render the generated Manim code into a video.
   - Add error handling and validation using pydantic models.
6. **Display Results**:
   - Display the generated video and code in the Gradio interface.
   - Provide feedback and explanations based on the agent's processing steps.
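
For the render step, one way to add validation before invoking Manim is to parse the generated code first and only then hand it to the command-line tool. The sketch below assumes the Manim Community CLI form `manim -ql <file> <SceneName>`. The helper names (`find_scene_names`, `build_render_command`, `render`) and the file name `generated_scene.py` are hypothetical, and the check that a class's base name ends in `Scene` is a heuristic rather than a full type check.

```python
import ast
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def find_scene_names(code: str) -> List[str]:
    """Return top-level classes whose base class name ends with 'Scene'.

    Raises SyntaxError if the generated code does not parse.
    """
    tree = ast.parse(code)
    names = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        for base in node.bases:
            # Handles both `Scene` (ast.Name) and `manim.Scene` (ast.Attribute).
            base_name = base.id if isinstance(base, ast.Name) else getattr(base, "attr", "")
            if base_name.endswith("Scene"):
                names.append(node.name)
                break
    return names


def build_render_command(script_path, scene_name: str, quality: str = "l") -> List[str]:
    """Build the CLI invocation; `-ql` requests a low-quality (fast) render."""
    return ["manim", f"-q{quality}", str(script_path), scene_name]


def render(code: str, workdir: Optional[str] = None) -> subprocess.CompletedProcess:
    """Validate the code, write it to disk, and run Manim on the first scene found."""
    scenes = find_scene_names(code)
    if not scenes:
        raise ValueError("generated code defines no Scene subclass")
    target_dir = Path(workdir) if workdir else Path(tempfile.mkdtemp())
    script = target_dir / "generated_scene.py"
    script.write_text(code, encoding="utf-8")
    # The caller inspects returncode and stderr to report render failures in the UI.
    return subprocess.run(
        build_render_command(script, scenes[0]),
        cwd=target_dir, capture_output=True, text=True,
    )
```

Rejecting code that fails to parse or defines no scene avoids launching a render that is certain to fail, and the captured `stderr` can be shown to the user or fed back to the agent.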