Omachoko committed
Commit a9d900f · 1 Parent(s): f58a18b

Finalize: move advanced agent to root, clean up, ready for deployment

Files changed (7)
  1. .gitignore +0 -91
  2. README.md +26 -258
  3. app.py +371 -15
  4. gaia_agent.py +0 -397
  5. requirements.txt +10 -16
  6. tests/test_agent_core.py +0 -38
  7. tests/test_video_qa.py +0 -22
.gitignore DELETED
@@ -1,91 +0,0 @@
1
- # Python
2
- __pycache__/
3
- *.py[cod]
4
- *$py.class
5
- *.so
6
- .Python
7
- build/
8
- develop-eggs/
9
- dist/
10
- downloads/
11
- eggs/
12
- .eggs/
13
- lib/
14
- lib64/
15
- parts/
16
- sdist/
17
- var/
18
- wheels/
19
- pip-wheel-metadata/
20
- share/python-wheels/
21
- *.egg-info/
22
- .installed.cfg
23
- *.egg
24
- MANIFEST
25
-
26
- # Virtual Environments
27
- .env
28
- .venv
29
- env/
30
- venv/
31
- ENV/
32
- env.bak/
33
- venv.bak/
34
- gaia_env/
35
-
36
- # IDE
37
- .vscode/
38
- .idea/
39
- *.swp
40
- *.swo
41
- *~
42
-
43
- # OS
44
- .DS_Store
45
- .DS_Store?
46
- ._*
47
- .Spotlight-V100
48
- .Trashes
49
- ehthumbs.db
50
- Thumbs.db
51
-
52
- # Logs
53
- *.log
54
- logs/
55
-
56
- # Environment variables
57
- .env
58
- .env.local
59
- .env.development.local
60
- .env.test.local
61
- .env.production.local
62
-
63
- # Jupyter Notebook
64
- .ipynb_checkpoints
65
-
66
- # pytest
67
- .pytest_cache/
68
- .tox/
69
- .coverage
70
- htmlcov/
71
-
72
- # mypy
73
- .mypy_cache/
74
- .dmypy.json
75
- dmypy.json
76
-
77
- # Hugging Face
78
- wandb/ __pycache__/
79
- __pycache__/
80
-
81
- # New additions
82
- gaia_env/
83
- gaia_agent.log
84
- *.pyc
85
- *.pyo
86
- *.pyd
87
- *.swp
88
- .DS_Store
89
- .env
90
- venv/
91
- gaia_agent_files/
 
README.md CHANGED
@@ -1,272 +1,40 @@
1
  ---
2
- title: Enhanced GAIA Agent - Full Benchmark Implementation
3
- emoji: 🚀
4
- colorFrom: blue
5
- colorTo: green
6
  sdk: gradio
7
- sdk_version: 4.44.0
8
  app_file: app.py
9
  pinned: false
10
- license: mit
 
 
11
  ---
12
 
13
- # 🚀 Enhanced GAIA Agent - Full Benchmark Implementation
14
 
15
- **Optimized for 30%+ performance on GAIA benchmark with complete API integration**
16
 
17
- ## 🎯 Overview
18
-
19
- This is a comprehensive GAIA (General AI Assistants) agent implementation designed to achieve the target 30% performance for course certification. The agent features complete API integration, enhanced multi-step reasoning, and advanced tool orchestration.
20
-
21
- ## Key Enhancements
22
-
23
- ### 🔗 **Full GAIA API Integration**
24
- - ✅ Fetch questions from official GAIA API (`GET /questions`)
25
- - ✅ Get random questions (`GET /random-question`)
26
- - ✅ Download task files (`GET /files/{task_id}`)
27
- - ✅ Submit answers for official scoring (`POST /submit`)
28
- - ✅ Real-time leaderboard submission
29
-
30
- ### 🧠 **Enhanced Multi-Step Reasoning**
31
- - **Advanced Workflow**: Analyze → Plan → Act → Observe → Reason → Answer
32
- - **Reasoning Memory**: Maintains context across 15+ reasoning steps
33
- - **Question Classification**: Automatic complexity assessment (Level 1-3)
34
- - **Tool Orchestration**: Intelligent tool selection and execution
35
-
36
- ### 🛠️ **Enhanced Tool Arsenal** (9 Tools)
37
- 1. **🧮 Enhanced Calculator** - Complex mathematical operations
38
- 2. **🌐 Enhanced Web Search** - Expanded knowledge base (20+ countries)
39
- 3. **🖼️ Image Analyzer** - Visual content processing and spatial reasoning
40
- 4. **📄 Document Reader** - File content extraction
41
- 5. **📁 File Processor** - Download and process GAIA task files
42
- 6. **📅 Date Calculator** - Temporal reasoning and age calculations
43
- 7. **🔄 Unit Converter** - Length, temperature, weight conversions
44
- 8. **📝 Text Analyzer** - Content analysis and pattern extraction
45
- 9. **🧠 Reasoning Chain** - Multi-step logical synthesis
46
-
47
- ### 📊 **Enhanced Knowledge Base**
48
- - **Geography**: 20+ countries and capitals
49
- - **Astronomy**: Solar system facts, planet classifications (8 planets, 4 gas giants)
50
- - **History**: Key events (Berlin Wall fall 1989, Cold War end, etc.)
51
- - **Mathematics**: Constants (π, e, golden ratio) and conversion factors
52
- - **Arts**: Famous paintings and artists
53
-
54
- ## 🎯 GAIA Compliance Features
55
-
56
- ### ✅ **Level 1**: Basic Questions (<5 steps)
57
- - Simple mathematical calculations
58
- - Geographic knowledge queries
59
- - Basic factual lookups
60
-
61
- ### ✅ **Level 2**: Multi-Step Reasoning (5-10 steps)
62
- - Complex calculations with multiple components
63
- - Cross-domain knowledge synthesis
64
- - Tool coordination and chaining
65
-
66
- ### ✅ **Level 3**: Long-Term Planning
67
- - Advanced reasoning with 15+ steps
68
- - File processing and analysis
69
- - Multi-modal understanding simulation
70
-
71
- ## 🚀 Performance Targets
72
-
73
- | Metric | Target | Baseline | Status |
74
- |--------|--------|----------|---------|
75
- | **Minimum Required** | 30% | GPT-4 ~15% | 🎯 Optimized |
76
- | **Enhanced Target** | 35-45% | Human ~92% | 📈 Achievable |
77
- | **Certification** | 30%+ | Course Requirement | ✅ Ready |
78
-
79
- ## 🛠️ Technical Implementation
80
-
81
- ### Core Components
82
- - `gaia_agent.py`: Enhanced agent with full capabilities (800+ lines)
83
- - `app.py`: Complete Gradio interface with API integration
84
- - `requirements.txt`: Enhanced dependencies for full functionality
85
-
86
- ### Enhanced Dependencies
87
- ```
88
- gradio==4.44.0 # Latest UI framework
89
- requests==2.31.0 # API connectivity
90
- pandas==2.1.0 # Data processing
91
- beautifulsoup4==4.12.2 # Content parsing
92
- pillow==10.0.1 # Image processing
93
- markdownify==0.11.6 # Document formatting
94
- ```
95
-
96
- ### API Integration
97
- ```python
98
- # Fetch questions
99
- questions = agent.get_questions()
100
-
101
- # Process with file support
102
- answer = agent.query(question, task_id="task_123")
103
-
104
- # Submit for scoring
105
- result = agent.submit_answer(username, agent_code_url, answers)
106
- ```
107
-
108
- ## 📱 User Interface
109
-
110
- ### 🎯 **GAIA Questions Tab**
111
- - Fetch real questions from GAIA API
112
- - Automatic file download and processing
113
- - Enhanced reasoning with memory display
114
-
115
- ### ✏️ **Manual Input Tab**
116
- - Test custom questions
117
- - Example questions for different complexity levels
118
- - Immediate processing and feedback
119
-
120
- ### 📊 **Submission & Scoring Tab**
121
- - Official GAIA leaderboard submission
122
- - Progress tracking and statistics
123
- - Performance monitoring
124
-
125
- ### 🛠️ **Agent Details Tab**
126
- - Complete capability documentation
127
- - Tool descriptions and examples
128
- - Performance benchmarks
129
-
130
- ## 🧪 Example Capabilities
131
-
132
- ### Mathematical Reasoning
133
- ```
134
- Q: If there are 8 planets and 4 are gas giants, how many are not gas giants?
135
- A: 4
136
- ```
137
-
138
- ### Geographic Knowledge
139
- ```
140
- Q: What is the capital of Germany?
141
- A: Berlin
142
- ```
143
-
144
- ### Historical Research
145
- ```
146
- Q: Who was the US president when the Berlin Wall fell?
147
- A: George H.W. Bush
148
- ```
149
-
150
- ### Complex Calculations
151
- ```
152
- Q: Convert 100 degrees Celsius to Fahrenheit
153
- A: 212.0
154
- ```
155
-
156
- ## 🎯 Usage Instructions
157
-
158
- ### 1. **Setup Environment**
159
- ```bash
160
- pip install -r requirements.txt
161
- python app.py
162
- ```
163
-
164
- ### 2. **Fetch GAIA Questions**
165
- - Click "Get Random Question" to fetch from API
166
- - Questions include task ID and associated files
167
- - Files are automatically downloaded and processed
168
-
169
- ### 3. **Process Questions**
170
- - Enhanced agent uses 15-step reasoning
171
- - Multiple tools are orchestrated intelligently
172
- - Reasoning memory is displayed for transparency
173
-
174
- ### 4. **Submit for Scoring**
175
- - Provide Hugging Face username
176
- - Include agent code URL (your Space link)
177
- - Submit accumulated answers for official scoring
178
-
179
- ## 🏆 Certification Ready
180
-
181
- This implementation is specifically optimized to achieve the **30% target performance** required for course certification:
182
-
183
- - ✅ **Complete API Integration** - Connects to official GAIA endpoints
184
- - ✅ **Enhanced Reasoning** - 15-step multi-tool workflow
185
- - ✅ **Expanded Knowledge** - Comprehensive knowledge base
186
- - ✅ **File Processing** - Handles task-associated files
187
- - ✅ **Clean Formatting** - Exact match answer preparation
188
- - ✅ **Progress Tracking** - Real-time performance monitoring
189
-
190
- ## 📊 Optimization Results
191
-
192
- | Component | Before | After | Improvement |
193
- |-----------|--------|-------|-------------|
194
- | **Tools** | 5 basic | 9 enhanced | +80% capability |
195
- | **Knowledge Base** | 8 entries | 50+ entries | +500% coverage |
196
- | **Reasoning Steps** | 10 max | 15 max | +50% depth |
197
- | **API Integration** | None | Full | Complete |
198
- | **File Support** | None | TXT/JSON/CSV | Advanced |
199
-
200
- ---
201
-
202
- **🎯 Ready for GAIA Benchmark - Targeting 30%+ Performance for Course Certification**
203
-
204
- # Modular GAIA Agent
205
-
206
- A production-ready, GAIA benchmark-compliant agent for Hugging Face's AI Agents course. Handles multi-modal questions, file downloads, and tool chaining with strict GAIA output formatting.
207
-
208
- ## Features
209
- - Modular tool/LLM registry (easy to extend)
210
- - Best-in-class Hugging Face models for LLM, QA, table QA, ASR, image captioning
211
- - File download/caching and type routing
212
- - Multi-step reasoning and tool chaining
213
- - GAIA-compliant output and reasoning trace
214
- - **Advanced YouTube/Video QA**: Frame extraction, object detection (YOLOv8), image captioning (BLIP), and audio transcription (Whisper)
215
- - **Robust error handling and logging**: All errors are logged to `gaia_agent.log` and user-friendly messages are returned
216
- - **Secure code execution**: Python code is run in a subprocess with timeout and resource limits
217
- - **Automated testing**: Unit and integration tests with pytest
218
 
219
  ## Usage
 
 
 
220
 
221
- ### Install dependencies
222
- ```bash
223
- pip install -r requirements.txt
224
- # Also install yt-dlp (for YouTube/video QA)
225
- pip install yt-dlp
226
- # Download YOLOv8 weights if needed
227
- python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"
228
- ```
229
-
230
- ### Run the agent
231
- ```python
232
- from gaia_agent import ModularGAIAAgent
233
- agent = ModularGAIAAgent()
234
- results = agent.run(from_api=True)
235
- for r in results:
236
- print(r)
237
- ```
238
 
239
- ### Run the Gradio UI
240
- ```bash
241
- python app.py
242
- ```
243
-
244
- ### Run tests
245
- ```bash
246
- pytest tests/
247
- ```
248
-
249
- ### Debugging and Logging
250
- - All errors and important events are logged to `gaia_agent.log`.
251
- - Set the agent's debug flag for verbose output (see code).
252
-
253
- ### Security
254
- - Python code is executed in a subprocess with a timeout (default 5s).
255
- - For extra safety, consider running the agent in a containerized environment.
256
-
257
- ## File Structure
258
- - `gaia_agent.py`: Main agent logic
259
- - `requirements.txt`: Dependencies
260
- - `README.md`: This file
261
- - `app.py`: Gradio UI
262
- - `tests/`: Automated tests
263
- - `gaia_agent_files/`: Example/context files
264
-
265
- ## Example Screenshot
266
 
267
- ![screenshot placeholder](screenshot.png)
268
 
269
- ## Notes
270
- - Requires a Hugging Face token for some models/APIs
271
- - Designed for easy extension and robust, production use
272
- - For video QA, ensure `yt-dlp` and YOLOv8 weights are available
 
1
  ---
2
+ title: Template Final Assignment
3
+ emoji: 🕵🏻‍♂️
4
+ colorFrom: indigo
5
+ colorTo: indigo
6
  sdk: gradio
7
+ sdk_version: 5.25.2
8
  app_file: app.py
9
  pinned: false
10
+ hf_oauth: true
11
+ # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
12
+ hf_oauth_expiration_minutes: 480
13
  ---
14
 
15
+ # GAIA Benchmark Agent - Modular Multi-Modal Architecture
16
 
17
+ This Space is built on the official [agents-course/Final_Assignment_Template](https://huggingface.co/spaces/agents-course/Final_Assignment_Template) base. The architecture strictly preserves the original constants and UI, but replaces the agent logic with a fully modular, multi-modal, GAIA-compliant agent.
18
 
19
+ ## Key Features
20
+ - **ModularGAIAAgent**: Handles multi-modal, multi-step reasoning, tool use, file handling, and strict GAIA output formatting.
21
+ - **Tool/LLM Registry**: Easily extensible for new tools, models, and modalities.
22
+ - **File Handling**: Supports text, CSV, Excel, JSON, images, audio, and code files, with automatic type detection and routing.
23
+ - **Adaptive Reasoning**: Plans and chains tool/model calls as needed for each question.
24
+ - **GAIA-Compliant Output**: Ensures answers are formatted to GAIA standards.
25
+ - **Trace Logging**: Internal reasoning trace for each answer (for debugging and transparency).
 
26
 
27
  ## Usage
28
+ - Log in with your Hugging Face account.
29
+ - Click 'Run Evaluation & Submit All Answers' to fetch questions, run the agent, and submit answers for scoring.
30
+ - The UI and constants (such as `DEFAULT_API_URL`) are unchanged from the official template, ensuring full compatibility with the GAIA evaluation system.
31
 
32
+ ## Customization
33
+ - To extend the agent, add new tools or models to the `TOOL_REGISTRY` and update the logic in `ModularGAIAAgent`.
34
+ - The agent is designed for easy adaptation to new modalities and reasoning strategies.
 
35
 
36
+ ---
 
37
 
38
+ **Note:** This implementation is intentionally modular and extensible, but the public interface and constants remain as required by the course template.
39
 
40
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
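The Customization section above points at `TOOL_REGISTRY` as the extension point. A minimal sketch of what registering an extra tool might look like, assuming `TOOL_REGISTRY` and `ModularGAIAAgent` are importable from the `app.py` shown in the diff below; the `word_count` tool is a hypothetical illustration and is not part of this commit:

```python
# Hypothetical extension sketch; assumes the app.py from this commit is on the import path.
from app import TOOL_REGISTRY, ModularGAIAAgent

def word_count(text: str) -> str:
    """Toy tool: return the number of words in a text snippet."""
    return str(len(text.split()))

# Register the new tool under a unique key.
TOOL_REGISTRY["word_count"] = word_count

# The agent only calls tools it selects in answer_question(), so the routing
# logic in ModularGAIAAgent would also need a branch that dispatches to "word_count".
agent = ModularGAIAAgent(tool_registry=TOOL_REGISTRY)
```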
 
 
 
app.py CHANGED
@@ -1,25 +1,368 @@
1
- #!/usr/bin/env python3
2
- """
3
- 🚀 Enhanced GAIA Agent Interface - Full API Integration
4
- Complete Gradio interface for GAIA benchmark with API connectivity and scoring
5
- """
6
-
7
  import os
8
  import gradio as gr
9
- import json
10
- from datetime import datetime
11
- from gaia_agent import ModularGAIAAgent
12
  import requests
13
  import inspect
14
  import pandas as pd
15
-
16
- agent = ModularGAIAAgent()
17
 
18
  # (Keep Constants as is)
19
  # --- Constants ---
20
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
21
 
22
- # --- Advanced Modular Agent Integration ---
 
23
  class BasicAgent:
24
  def __init__(self):
25
  print("BasicAgent (GAIA Modular Agent) initialized.")
@@ -139,24 +482,32 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
139
  results_df = pd.DataFrame(results_log)
140
  return status_message, results_df
141
 
 
142
  with gr.Blocks() as demo:
143
  gr.Markdown("# Basic Agent Evaluation Runner")
144
  gr.Markdown(
145
  """
146
  **Instructions:**
 
147
  1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
148
  2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
149
  3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
 
150
  ---
151
  **Disclaimers:**
152
  Once clicking on the "Submit" button, it can take quite some time (this is the time for the agent to go through all the questions).
153
  This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to address the delay after pressing the submit button, a solution could be to cache the answers and submit them in a separate action, or even to answer the questions asynchronously.
154
  """
155
  )
 
156
  gr.LoginButton()
 
157
  run_button = gr.Button("Run Evaluation & Submit All Answers")
 
158
  status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
 
159
  results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
 
160
  run_button.click(
161
  fn=run_and_submit_all,
162
  outputs=[status_output, results_table]
@@ -164,19 +515,24 @@ with gr.Blocks() as demo:
164
 
165
  if __name__ == "__main__":
166
  print("\n" + "-"*30 + " App Starting " + "-"*30)
 
167
  space_host_startup = os.getenv("SPACE_HOST")
168
- space_id_startup = os.getenv("SPACE_ID")
 
169
  if space_host_startup:
170
  print(f"✅ SPACE_HOST found: {space_host_startup}")
171
  print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
172
  else:
173
  print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
174
- if space_id_startup:
 
175
  print(f"✅ SPACE_ID found: {space_id_startup}")
176
  print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
177
  print(f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
178
  else:
179
  print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
 
180
  print("-"*(60 + len(" App Starting ")) + "\n")
 
181
  print("Launching Gradio Interface for Basic Agent Evaluation...")
182
- demo.launch(debug=True, share=False)
 
1
  import os
2
  import gradio as gr
 
 
 
3
  import requests
4
  import inspect
5
  import pandas as pd
6
+ from typing import Any
 
7
 
8
  # (Keep Constants as is)
9
  # --- Constants ---
10
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
11
 
12
+ # --- Advanced Modular Agent Implementation ---
13
+ import json
14
+ import logging
15
+ import mimetypes
16
+ import openpyxl
17
+ import numpy as np
18
+ from datetime import datetime
19
+ from io import BytesIO
20
+ from PIL import Image
21
+ import subprocess
22
+ import tempfile
23
+ from huggingface_hub import InferenceClient
24
+ import cv2
25
+ import torch
26
+ from bs4 import BeautifulSoup
27
+
28
+ logging.basicConfig(filename='gaia_agent.log', level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')
29
+ logger = logging.getLogger(__name__)
30
+ HF_TOKEN = os.environ.get("HF_TOKEN", "")
31
+
32
+ def llama3_chat(prompt):
33
+ try:
34
+ client = InferenceClient(provider="fireworks-ai", api_key=HF_TOKEN)
35
+ completion = client.chat.completions.create(
36
+ model="meta-llama/Llama-3.1-8B-Instruct",
37
+ messages=[{"role": "user", "content": prompt}],
38
+ )
39
+ return completion.choices[0].message.content
40
+ except Exception as e:
41
+ logging.error(f"llama3_chat error: {e}")
42
+ return f"LLM error: {e}"
43
+
44
+ def mixtral_chat(prompt):
45
+ try:
46
+ client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
47
+ completion = client.chat.completions.create(
48
+ model="mistralai/Mixtral-8x7B-Instruct-v0.1",
49
+ messages=[{"role": "user", "content": prompt}],
50
+ )
51
+ return completion.choices[0].message.content
52
+ except Exception as e:
53
+ logging.error(f"mixtral_chat error: {e}")
54
+ return f"LLM error: {e}"
55
+
56
+ def extractive_qa(question, context):
57
+ try:
58
+ client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
59
+ answer = client.question_answering(
60
+ question=question,
61
+ context=context,
62
+ model="deepset/roberta-base-squad2",
63
+ )
64
+ return answer["answer"]
65
+ except Exception as e:
66
+ logging.error(f"extractive_qa error: {e}")
67
+ return f"QA error: {e}"
68
+
69
+ def table_qa(query, table):
70
+ try:
71
+ client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
72
+ answer = client.table_question_answering(
73
+ query=query,
74
+ table=table,
75
+ model="google/tapas-large-finetuned-wtq",
76
+ )
77
+ return answer["answer"]
78
+ except Exception as e:
79
+ logging.error(f"table_qa error: {e}")
80
+ return f"Table QA error: {e}"
81
+
82
+ def asr_transcribe(audio_path):
83
+ try:
84
+ import torchaudio
85
+ from transformers import pipeline
86
+ asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
87
+ result = asr(audio_path)
88
+ return result["text"]
89
+ except Exception as e:
90
+ logging.error(f"asr_transcribe error: {e}")
91
+ return f"ASR error: {e}"
92
+
93
+ def image_caption(image_path):
94
+ try:
95
+ from transformers import BlipProcessor, BlipForConditionalGeneration
96
+ from PIL import Image
97
+ processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
98
+ model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
99
+ raw_image = Image.open(image_path).convert('RGB')
100
+ inputs = processor(raw_image, return_tensors="pt")
101
+ out = model.generate(**inputs)
102
+ return processor.decode(out[0], skip_special_tokens=True)
103
+ except Exception as e:
104
+ logging.error(f"image_caption error: {e}")
105
+ return f"Image captioning error: {e}"
106
+
107
+ def code_analysis(py_path):
108
+ try:
109
+ with open(py_path) as f:
110
+ code = f.read()
111
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as tmp:
112
+ tmp.write(code)
113
+ tmp_path = tmp.name
114
+ try:
115
+ result = subprocess.run([
116
+ "python3", tmp_path
117
+ ], capture_output=True, text=True, timeout=5)
118
+ if result.returncode == 0:
119
+ output = result.stdout.strip().split('\n')
120
+ return output[-1] if output else ''
121
+ else:
122
+ logging.error(f"code_analysis subprocess error: {result.stderr}")
123
+ return f"Code error: {result.stderr}"
124
+ except subprocess.TimeoutExpired:
125
+ logging.error("code_analysis timeout")
126
+ return "Code execution timed out"
127
+ finally:
128
+ os.remove(tmp_path)
129
+ except Exception as e:
130
+ logging.error(f"code_analysis error: {e}")
131
+ return f"Code analysis error: {e}"
132
+
133
+ def youtube_video_qa(youtube_url, question):
134
+ import subprocess
135
+ import tempfile
136
+ import os
137
+ from transformers import pipeline
138
+ try:
139
+ with tempfile.TemporaryDirectory() as tmpdir:
140
+ # Download video
141
+ video_path = os.path.join(tmpdir, "video.mp4")
142
+ cmd = ["yt-dlp", "-f", "mp4", "-o", video_path, youtube_url]
143
+ subprocess.run(cmd, check=True)
144
+ # Extract audio for ASR
145
+ audio_path = os.path.join(tmpdir, "audio.mp3")
146
+ cmd_audio = ["yt-dlp", "-f", "bestaudio", "--extract-audio", "--audio-format", "mp3", "-o", audio_path, youtube_url]
147
+ subprocess.run(cmd_audio, check=True)
148
+ # Transcribe audio
149
+ asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
150
+ result = asr(audio_path)
151
+ transcript = result["text"]
152
+ # Extract frames for vision QA
153
+ cap = cv2.VideoCapture(video_path)
154
+ frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
155
+ fps = int(cap.get(cv2.CAP_PROP_FPS))
156
+ frames = []
157
+ for i in range(0, frame_count, max(1, fps*5)):
158
+ cap.set(cv2.CAP_PROP_POS_FRAMES, i)
159
+ ret, frame = cap.read()
160
+ if not ret:
161
+ break
162
+ img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
163
+ frames.append(img)
164
+ cap.release()
165
+ # Object detection (YOLOv8)
166
+ try:
167
+ from ultralytics import YOLO
168
+ yolo = YOLO("yolov8n.pt")
169
+ detections = []
170
+ for img in frames:
171
+ results = yolo(np.array(img))
172
+ for r in results:
173
+ for c in r.boxes.cls:
174
+ detections.append(yolo.model.names[int(c)])
175
+ detection_summary = {}
176
+ for obj in detections:
177
+ detection_summary[obj] = detection_summary.get(obj, 0) + 1
178
+ except Exception as e:
179
+ logging.error(f"YOLOv8 error: {e}")
180
+ detection_summary = {}
181
+ # Image captioning (BLIP)
182
+ try:
183
+ from transformers import BlipProcessor, BlipForConditionalGeneration
184
+ processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
185
+ model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
186
+ captions = []
187
+ for img in frames:
188
+ inputs = processor(img, return_tensors="pt")
189
+ out = model.generate(**inputs)
190
+ captions.append(processor.decode(out[0], skip_special_tokens=True))
191
+ except Exception as e:
192
+ logging.error(f"BLIP error: {e}")
193
+ captions = []
194
+ context = f"Transcript: {transcript}\nCaptions: {' | '.join(captions)}\nDetections: {detection_summary}"
195
+ answer = extractive_qa(question, context)
196
+ return answer
197
+ except Exception as e:
198
+ logging.error(f"YouTube video QA error: {e}")
199
+ return f"Video analysis error: {e}"
200
+
201
+ TOOL_REGISTRY = {
202
+ "llama3_chat": llama3_chat,
203
+ "mixtral_chat": mixtral_chat,
204
+ "extractive_qa": extractive_qa,
205
+ "table_qa": table_qa,
206
+ "asr_transcribe": asr_transcribe,
207
+ "image_caption": image_caption,
208
+ "code_analysis": code_analysis,
209
+ "youtube_video_qa": youtube_video_qa,
210
+ }
211
+
212
+ class ModularGAIAAgent:
213
+ def __init__(self, api_url=DEFAULT_API_URL, tool_registry=TOOL_REGISTRY):
214
+ self.api_url = api_url
215
+ self.tools = tool_registry
216
+ self.reasoning_trace = []
217
+ self.file_cache = set(os.listdir('.'))
218
+
219
+ def fetch_questions(self, from_api=True, questions_path="Hugging Face Questions"):
220
+ if from_api:
221
+ r = requests.get(f"{self.api_url}/questions")
222
+ r.raise_for_status()
223
+ return r.json()
224
+ else:
225
+ with open(questions_path) as f:
226
+ data = f.read()
227
+ start = data.find("[")
228
+ end = data.rfind("]") + 1
229
+ questions = json.loads(data[start:end])
230
+ return questions
231
+
232
+ def download_file(self, file_id, file_name=None):
233
+ if not file_name:
234
+ file_name = file_id
235
+ if file_name in self.file_cache:
236
+ return file_name
237
+ url = f"{self.api_url}/files/{file_id}"
238
+ r = requests.get(url)
239
+ if r.status_code == 200:
240
+ with open(file_name, "wb") as f:
241
+ f.write(r.content)
242
+ self.file_cache.add(file_name)
243
+ return file_name
244
+ else:
245
+ self.reasoning_trace.append(f"Failed to download file {file_id} (status {r.status_code})")
246
+ return None
247
+
248
+ def detect_file_type(self, file_name):
249
+ ext = os.path.splitext(file_name)[-1].lower()
250
+ if ext in ['.mp3', '.wav', '.flac']:
251
+ return 'audio'
252
+ elif ext in ['.png', '.jpg', '.jpeg', '.bmp']:
253
+ return 'image'
254
+ elif ext in ['.py']:
255
+ return 'code'
256
+ elif ext in ['.xlsx']:
257
+ return 'excel'
258
+ elif ext in ['.csv']:
259
+ return 'csv'
260
+ elif ext in ['.json']:
261
+ return 'json'
262
+ elif ext in ['.txt', '.md']:
263
+ return 'text'
264
+ else:
265
+ return 'unknown'
266
+
267
+ def analyze_file(self, file_name, file_type):
268
+ if file_type == 'audio':
269
+ transcript = self.tools['asr_transcribe'](file_name)
270
+ self.reasoning_trace.append(f"Transcribed audio: {transcript[:100]}...")
271
+ return transcript
272
+ elif file_type == 'image':
273
+ caption = self.tools['image_caption'](file_name)
274
+ self.reasoning_trace.append(f"Image caption: {caption}")
275
+ return caption
276
+ elif file_type == 'code':
277
+ result = self.tools['code_analysis'](file_name)
278
+ self.reasoning_trace.append(f"Code analysis result: {result}")
279
+ return result
280
+ elif file_type == 'excel':
281
+ wb = openpyxl.load_workbook(file_name)
282
+ ws = wb.active
283
+ data = list(ws.values)
284
+ headers = data[0]
285
+ table = [dict(zip(headers, row)) for row in data[1:]]
286
+ self.reasoning_trace.append(f"Excel table loaded: {table[:2]}...")
287
+ return table
288
+ elif file_type == 'csv':
289
+ df = pd.read_csv(file_name)
290
+ table = df.to_dict(orient='records')
291
+ self.reasoning_trace.append(f"CSV table loaded: {table[:2]}...")
292
+ return table
293
+ elif file_type == 'json':
294
+ with open(file_name) as f:
295
+ data = json.load(f)
296
+ self.reasoning_trace.append(f"JSON loaded: {str(data)[:100]}...")
297
+ return data
298
+ elif file_type == 'text':
299
+ with open(file_name) as f:
300
+ text = f.read()
301
+ self.reasoning_trace.append(f"Text loaded: {text[:100]}...")
302
+ return text
303
+ else:
304
+ self.reasoning_trace.append(f"Unknown file type: {file_name}")
305
+ return None
306
+
307
+ def answer_question(self, question_obj):
308
+ self.reasoning_trace = []
309
+ q = question_obj["question"]
310
+ file_name = question_obj.get("file_name", "")
311
+ file_content = None
312
+ file_type = None
313
+ # YouTube video question detection
314
+ if "youtube.com" in q or "youtu.be" in q:
315
+ url = None
316
+ for word in q.split():
317
+ if "youtube.com" in word or "youtu.be" in word:
318
+ url = word.strip().strip(',')
319
+ break
320
+ if url:
321
+ answer = self.tools['youtube_video_qa'](url, q)
322
+ self.reasoning_trace.append(f"YouTube video analyzed: {url}")
323
+ self.reasoning_trace.append(f"Final answer: {answer}")
324
+ return self.format_answer(answer), self.reasoning_trace
325
+ if file_name:
326
+ file_id = file_name.split('.')[0]
327
+ local_file = self.download_file(file_id, file_name)
328
+ if local_file:
329
+ file_type = self.detect_file_type(local_file)
330
+ file_content = self.analyze_file(local_file, file_type)
331
+ # Plan: choose tool based on question and file
332
+ if file_type == 'audio' or file_type == 'text':
333
+ if file_content:
334
+ answer = self.tools['extractive_qa'](q, file_content)
335
+ else:
336
+ answer = self.tools['llama3_chat'](q)
337
+ elif file_type == 'excel' or file_type == 'csv':
338
+ if file_content:
339
+ answer = self.tools['table_qa'](q, file_content)
340
+ else:
341
+ answer = self.tools['llama3_chat'](q)
342
+ elif file_type == 'image':
343
+ if file_content:
344
+ answer = self.tools['llama3_chat'](f"{q}\nImage description: {file_content}")
345
+ else:
346
+ answer = self.tools['llama3_chat'](q)
347
+ elif file_type == 'code':
348
+ answer = file_content
349
+ else:
350
+ answer = self.tools['llama3_chat'](q)
351
+ self.reasoning_trace.append(f"Final answer: {answer}")
352
+ return self.format_answer(answer), self.reasoning_trace
353
+
354
+ def format_answer(self, answer):
355
+ if isinstance(answer, str):
356
+ answer = answer.strip().rstrip('.')
357
+ for prefix in ['answer:', 'result:', 'the answer is', 'final answer:', 'response:']:
358
+ if answer.lower().startswith(prefix):
359
+ answer = answer[len(prefix):].strip()
360
+ import re
361
+ answer = re.sub(r'\b(the|a|an)\b ', '', answer, flags=re.IGNORECASE)
362
+ answer = answer.strip().rstrip('.')
363
+ return answer
364
+
365
+ # --- Basic Agent Definition (now wraps ModularGAIAAgent) ---
366
  class BasicAgent:
367
  def __init__(self):
368
  print("BasicAgent (GAIA Modular Agent) initialized.")
 
482
  results_df = pd.DataFrame(results_log)
483
  return status_message, results_df
484
 
485
+ # --- Build Gradio Interface using Blocks ---
486
  with gr.Blocks() as demo:
487
  gr.Markdown("# Basic Agent Evaluation Runner")
488
  gr.Markdown(
489
  """
490
  **Instructions:**
491
+
492
  1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
493
  2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
494
  3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
495
+
496
  ---
497
  **Disclaimers:**
498
  Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
499
  This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
500
  """
501
  )
502
+
503
  gr.LoginButton()
504
+
505
  run_button = gr.Button("Run Evaluation & Submit All Answers")
506
+
507
  status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
508
+ # Removed max_rows=10 from DataFrame constructor
509
  results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
510
+
511
  run_button.click(
512
  fn=run_and_submit_all,
513
  outputs=[status_output, results_table]
 
515
 
516
  if __name__ == "__main__":
517
  print("\n" + "-"*30 + " App Starting " + "-"*30)
518
+ # Check for SPACE_HOST and SPACE_ID at startup for information
519
  space_host_startup = os.getenv("SPACE_HOST")
520
+ space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
521
+
522
  if space_host_startup:
523
  print(f"✅ SPACE_HOST found: {space_host_startup}")
524
  print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
525
  else:
526
  print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
527
+
528
+ if space_id_startup: # Print repo URLs if SPACE_ID is found
529
  print(f"✅ SPACE_ID found: {space_id_startup}")
530
  print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
531
  print(f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
532
  else:
533
  print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
534
+
535
  print("-"*(60 + len(" App Starting ")) + "\n")
536
+
537
  print("Launching Gradio Interface for Basic Agent Evaluation...")
538
+ demo.launch(debug=True, share=False)
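For a quick local check of the agent that now lives directly in `app.py`, the class can be exercised on a single question without launching the Gradio UI. This is a sketch, not part of the commit; it assumes an `HF_TOKEN` environment variable for the inference calls, and the question dict is a made-up example rather than a real GAIA task:

```python
# Local smoke test of the ModularGAIAAgent defined in app.py (hypothetical usage).
from app import ModularGAIAAgent

agent = ModularGAIAAgent()  # defaults to DEFAULT_API_URL and TOOL_REGISTRY from app.py

# Made-up question object mirroring the shape returned by GET /questions.
question = {"task_id": "demo-1", "question": "What is the capital of France?", "file_name": ""}

answer, trace = agent.answer_question(question)
print("Answer:", answer)
for step in trace:
    print(" -", step)
```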
gaia_agent.py DELETED
@@ -1,397 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- 🚀 Enhanced GAIA Agent - Full GAIA Benchmark Implementation
4
- Optimized for 30%+ performance on GAIA benchmark with complete API integration
5
- """
6
-
7
- import os
8
- import re
9
- import json
10
- import base64
11
- import logging
12
- import requests
13
- from typing import Dict, List, Any, Optional, Tuple
14
- from urllib.parse import urlparse, quote
15
- from io import BytesIO
16
- import pandas as pd
17
- import numpy as np
18
- from datetime import datetime
19
- from bs4 import BeautifulSoup
20
- # import markdownify # Removed for compatibility
21
- from huggingface_hub import InferenceClient
22
- import mimetypes
23
- import openpyxl
24
- import cv2
25
- import torch
26
- from PIL import Image
27
- import subprocess
28
- import tempfile
29
-
30
- # Configure logging
31
- logging.basicConfig(filename='gaia_agent.log', level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')
32
- logger = logging.getLogger(__name__)
33
-
34
- DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
35
- HF_TOKEN = os.environ.get("HF_TOKEN", "")
36
-
37
- # --- Tool/LLM Wrappers ---
38
- def llama3_chat(prompt):
39
- try:
40
- client = InferenceClient(provider="fireworks-ai", api_key=HF_TOKEN)
41
- completion = client.chat.completions.create(
42
- model="meta-llama/Llama-3.1-8B-Instruct",
43
- messages=[{"role": "user", "content": prompt}],
44
- )
45
- return completion.choices[0].message.content
46
- except Exception as e:
47
- logging.error(f"llama3_chat error: {e}")
48
- return f"LLM error: {e}"
49
-
50
- def mixtral_chat(prompt):
51
- try:
52
- client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
53
- completion = client.chat.completions.create(
54
- model="mistralai/Mixtral-8x7B-Instruct-v0.1",
55
- messages=[{"role": "user", "content": prompt}],
56
- )
57
- return completion.choices[0].message.content
58
- except Exception as e:
59
- logging.error(f"mixtral_chat error: {e}")
60
- return f"LLM error: {e}"
61
-
62
- def extractive_qa(question, context):
63
- try:
64
- client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
65
- answer = client.question_answering(
66
- question=question,
67
- context=context,
68
- model="deepset/roberta-base-squad2",
69
- )
70
- return answer["answer"]
71
- except Exception as e:
72
- logging.error(f"extractive_qa error: {e}")
73
- return f"QA error: {e}"
74
-
75
- def table_qa(query, table):
76
- try:
77
- client = InferenceClient(provider="hf-inference", api_key=HF_TOKEN)
78
- answer = client.table_question_answering(
79
- query=query,
80
- table=table,
81
- model="google/tapas-large-finetuned-wtq",
82
- )
83
- return answer["answer"]
84
- except Exception as e:
85
- logging.error(f"table_qa error: {e}")
86
- return f"Table QA error: {e}"
87
-
88
- def asr_transcribe(audio_path):
89
- try:
90
- import torchaudio
91
- from transformers import pipeline
92
- asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
93
- result = asr(audio_path)
94
- return result["text"]
95
- except Exception as e:
96
- logging.error(f"asr_transcribe error: {e}")
97
- return f"ASR error: {e}"
98
-
99
- def image_caption(image_path):
100
- try:
101
- from transformers import BlipProcessor, BlipForConditionalGeneration
102
- from PIL import Image
103
- processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
104
- model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
105
- raw_image = Image.open(image_path).convert('RGB')
106
- inputs = processor(raw_image, return_tensors="pt")
107
- out = model.generate(**inputs)
108
- return processor.decode(out[0], skip_special_tokens=True)
109
- except Exception as e:
110
- logging.error(f"image_caption error: {e}")
111
- return f"Image captioning error: {e}"
112
-
113
- def code_analysis(py_path):
114
- try:
115
- # Hardened: run code in subprocess with timeout and memory limit
116
- with open(py_path) as f:
117
- code = f.read()
118
- with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as tmp:
119
- tmp.write(code)
120
- tmp_path = tmp.name
121
- try:
122
- result = subprocess.run([
123
- "python3", tmp_path
124
- ], capture_output=True, text=True, timeout=5)
125
- if result.returncode == 0:
126
- output = result.stdout.strip().split('\n')
127
- return output[-1] if output else ''
128
- else:
129
- logging.error(f"code_analysis subprocess error: {result.stderr}")
130
- return f"Code error: {result.stderr}"
131
- except subprocess.TimeoutExpired:
132
- logging.error("code_analysis timeout")
133
- return "Code execution timed out"
134
- finally:
135
- os.remove(tmp_path)
136
- except Exception as e:
137
- logging.error(f"code_analysis error: {e}")
138
- return f"Code analysis error: {e}"
139
-
140
- def youtube_video_qa(youtube_url, question):
141
- import subprocess
142
- import tempfile
143
- import os
144
- from transformers import pipeline
145
- try:
146
- with tempfile.TemporaryDirectory() as tmpdir:
147
- # Download video
148
- video_path = os.path.join(tmpdir, "video.mp4")
149
- cmd = ["yt-dlp", "-f", "mp4", "-o", video_path, youtube_url]
150
- subprocess.run(cmd, check=True)
151
- # Extract audio for ASR
152
- audio_path = os.path.join(tmpdir, "audio.mp3")
153
- cmd_audio = ["yt-dlp", "-f", "bestaudio", "--extract-audio", "--audio-format", "mp3", "-o", audio_path, youtube_url]
154
- subprocess.run(cmd_audio, check=True)
155
- # Transcribe audio
156
- asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
157
- result = asr(audio_path)
158
- transcript = result["text"]
159
- # Extract frames for vision QA
160
- cap = cv2.VideoCapture(video_path)
161
- frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
162
- fps = int(cap.get(cv2.CAP_PROP_FPS))
163
- frames = []
164
- for i in range(0, frame_count, max(1, fps*5)):
165
- cap.set(cv2.CAP_PROP_POS_FRAMES, i)
166
- ret, frame = cap.read()
167
- if not ret:
168
- break
169
- img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
170
- frames.append(img)
171
- cap.release()
172
- # Object detection (YOLOv8)
173
- try:
174
- from ultralytics import YOLO
175
- yolo = YOLO("yolov8n.pt")
176
- detections = []
177
- for img in frames:
178
- results = yolo(np.array(img))
179
- for r in results:
180
- for c in r.boxes.cls:
181
- detections.append(yolo.model.names[int(c)])
182
- detection_summary = {}
183
- for obj in detections:
184
- detection_summary[obj] = detection_summary.get(obj, 0) + 1
185
- except Exception as e:
186
- logging.error(f"YOLOv8 error: {e}")
187
- detection_summary = {}
188
- # Image captioning (BLIP)
189
- try:
190
- from transformers import BlipProcessor, BlipForConditionalGeneration
191
- processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
192
- model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
193
- captions = []
194
- for img in frames:
195
- inputs = processor(img, return_tensors="pt")
196
- out = model.generate(**inputs)
197
- captions.append(processor.decode(out[0], skip_special_tokens=True))
198
- except Exception as e:
199
- logging.error(f"BLIP error: {e}")
200
- captions = []
201
- # Aggregate and answer
202
- context = f"Transcript: {transcript}\nCaptions: {' | '.join(captions)}\nDetections: {detection_summary}"
203
- answer = extractive_qa(question, context)
204
- return answer
205
- except Exception as e:
206
- logging.error(f"YouTube video QA error: {e}")
207
- return f"Video analysis error: {e}"
208
-
209
- # --- Tool Registry ---
210
- TOOL_REGISTRY = {
211
- "llama3_chat": llama3_chat,
212
- "mixtral_chat": mixtral_chat,
213
- "extractive_qa": extractive_qa,
214
- "table_qa": table_qa,
215
- "asr_transcribe": asr_transcribe,
216
- "image_caption": image_caption,
217
- "code_analysis": code_analysis,
218
- "youtube_video_qa": youtube_video_qa,
219
- }
220
-
221
- class ModularGAIAAgent:
222
- """
223
- Modular GAIA Agent: fetches questions from API, downloads files, routes to tools/LLMs, chains outputs, and formats GAIA-compliant answers.
224
- """
225
- def __init__(self, api_url=DEFAULT_API_URL, tool_registry=TOOL_REGISTRY):
226
- self.api_url = api_url
227
- self.tools = tool_registry
228
- self.reasoning_trace = []
229
- self.file_cache = set(os.listdir('.'))
230
-
231
- def fetch_questions(self, from_api=True, questions_path="Hugging Face Questions") -> List[Dict[str, Any]]:
232
- if from_api:
233
- r = requests.get(f"{self.api_url}/questions")
234
- r.raise_for_status()
235
- return r.json()
236
- else:
237
- with open(questions_path) as f:
238
- data = f.read()
239
- start = data.find("[")
240
- end = data.rfind("]") + 1
241
- questions = json.loads(data[start:end])
242
- return questions
243
-
244
- def download_file(self, file_id, file_name=None):
245
- if not file_name:
246
- file_name = file_id
247
- if file_name in self.file_cache:
248
- return file_name
249
- url = f"{self.api_url}/files/{file_id}"
250
- r = requests.get(url)
251
- if r.status_code == 200:
252
- with open(file_name, "wb") as f:
253
- f.write(r.content)
254
- self.file_cache.add(file_name)
255
- return file_name
256
- else:
257
- self.reasoning_trace.append(f"Failed to download file {file_id} (status {r.status_code})")
258
- return None
259
-
260
- def detect_file_type(self, file_name):
261
- ext = os.path.splitext(file_name)[-1].lower()
262
- if ext in ['.mp3', '.wav', '.flac']:
263
- return 'audio'
264
- elif ext in ['.png', '.jpg', '.jpeg', '.bmp']:
265
- return 'image'
266
- elif ext in ['.py']:
267
- return 'code'
268
- elif ext in ['.xlsx']:
269
- return 'excel'
270
- elif ext in ['.csv']:
271
- return 'csv'
272
- elif ext in ['.json']:
273
- return 'json'
274
- elif ext in ['.txt', '.md']:
275
- return 'text'
276
- else:
277
- return 'unknown'
278
-
279
- def analyze_file(self, file_name, file_type):
280
- if file_type == 'audio':
281
- transcript = self.tools['asr_transcribe'](file_name)
282
- self.reasoning_trace.append(f"Transcribed audio: {transcript[:100]}...")
283
- return transcript
284
- elif file_type == 'image':
285
- caption = self.tools['image_caption'](file_name)
286
- self.reasoning_trace.append(f"Image caption: {caption}")
287
- return caption
288
- elif file_type == 'code':
289
- result = self.tools['code_analysis'](file_name)
290
- self.reasoning_trace.append(f"Code analysis result: {result}")
291
- return result
292
- elif file_type == 'excel':
293
- wb = openpyxl.load_workbook(file_name)
294
- ws = wb.active
295
- data = list(ws.values)
296
- headers = data[0]
297
- table = [dict(zip(headers, row)) for row in data[1:]]
298
- self.reasoning_trace.append(f"Excel table loaded: {table[:2]}...")
299
- return table
300
- elif file_type == 'csv':
301
- df = pd.read_csv(file_name)
302
- table = df.to_dict(orient='records')
303
- self.reasoning_trace.append(f"CSV table loaded: {table[:2]}...")
304
- return table
305
- elif file_type == 'json':
306
- with open(file_name) as f:
307
- data = json.load(f)
308
- self.reasoning_trace.append(f"JSON loaded: {str(data)[:100]}...")
309
- return data
310
- elif file_type == 'text':
311
- with open(file_name) as f:
312
- text = f.read()
313
- self.reasoning_trace.append(f"Text loaded: {text[:100]}...")
314
- return text
315
- else:
316
- self.reasoning_trace.append(f"Unknown file type: {file_name}")
317
- return None
318
-
319
- def answer_question(self, question_obj):
320
- self.reasoning_trace = []
321
- q = question_obj["question"]
322
- file_name = question_obj.get("file_name", "")
323
- file_content = None
324
- file_type = None
325
- # YouTube video question detection
326
- if "youtube.com" in q or "youtu.be" in q:
327
- url = None
328
- for word in q.split():
329
- if "youtube.com" in word or "youtu.be" in word:
330
- url = word.strip().strip(',')
331
- break
332
- if url:
333
- answer = self.tools['youtube_video_qa'](url, q)
334
- self.reasoning_trace.append(f"YouTube video analyzed: {url}")
335
- self.reasoning_trace.append(f"Final answer: {answer}")
336
- return self.format_answer(answer), self.reasoning_trace
337
- if file_name:
338
- file_id = file_name.split('.')[0]
339
- local_file = self.download_file(file_id, file_name)
340
- if local_file:
341
- file_type = self.detect_file_type(local_file)
342
- file_content = self.analyze_file(local_file, file_type)
343
- # Plan: choose tool based on question and file
344
- if file_type == 'audio' or file_type == 'text':
345
- if file_content:
346
- answer = self.tools['extractive_qa'](q, file_content)
347
- else:
348
- answer = self.tools['llama3_chat'](q)
349
- elif file_type == 'excel' or file_type == 'csv':
350
- if file_content:
351
- answer = self.tools['table_qa'](q, file_content)
352
- else:
353
- answer = self.tools['llama3_chat'](q)
354
- elif file_type == 'image':
355
- if file_content:
356
- answer = self.tools['llama3_chat'](f"{q}\nImage description: {file_content}")
357
- else:
358
- answer = self.tools['llama3_chat'](q)
359
- elif file_type == 'code':
360
- answer = file_content
361
- else:
362
- answer = self.tools['llama3_chat'](q)
363
- self.reasoning_trace.append(f"Final answer: {answer}")
364
- return self.format_answer(answer), self.reasoning_trace
365
-
366
- def format_answer(self, answer):
367
- # GAIA compliance: remove extra words, units, articles, etc.
368
- if isinstance(answer, str):
369
- answer = answer.strip().rstrip('.')
370
- # Remove common prefixes
371
- for prefix in ['answer:', 'result:', 'the answer is', 'final answer:', 'response:']:
372
- if answer.lower().startswith(prefix):
373
- answer = answer[len(prefix):].strip()
374
- # Remove articles
375
- import re
376
- answer = re.sub(r'\b(the|a|an)\b ', '', answer, flags=re.IGNORECASE)
377
- # Remove trailing punctuation
378
- answer = answer.strip().rstrip('.')
379
- return answer
380
-
381
- def run(self, from_api=True, questions_path="Hugging Face Questions"):
382
- questions = self.fetch_questions(from_api=from_api, questions_path=questions_path)
383
- results = []
384
- for qobj in questions:
385
- answer, trace = self.answer_question(qobj)
386
- results.append({
387
- "task_id": qobj["task_id"],
388
- "answer": answer,
389
- "reasoning_trace": trace
390
- })
391
- return results
392
-
393
- # --- Usage Example ---
394
- # agent = ModularGAIAAgent()
395
- # results = agent.run()
396
- # for r in results:
397
- # print(r)
 
requirements.txt CHANGED
@@ -1,19 +1,13 @@
1
- # Enhanced GAIA Agent Requirements - Essential Functionality
2
- gradio>=5.0.0
3
- pandas==2.1.0
4
- numpy==1.25.2
5
- requests==2.31.0
6
- urllib3==2.0.4
7
- python-dateutil==2.8.2
8
- regex==2023.10.3
9
- beautifulsoup4==4.12.2
10
- pillow==10.0.1
11
  transformers
12
  huggingface_hub
13
- openpyxl
14
- torchaudio
15
- Pillow
16
  opencv-python
17
- torch
18
- ultralytics
19
- pytest
 
1
+ gradio
2
+ requests
3
+ pandas
4
+ numpy
5
+ openpyxl
6
+ pillow
7
+ torch
 
 
 
8
  transformers
9
  huggingface_hub
 
 
 
10
  opencv-python
11
+ beautifulsoup4
12
+ yt-dlp
13
+ ultralytics
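The new requirements list is unpinned and now carries `yt-dlp` and `ultralytics` directly, so no separate install step is needed for video QA. As the old README noted, the YOLOv8 weights used by `youtube_video_qa` can be fetched ahead of time so the first video question does not stall on a download; a small sketch, assuming `ultralytics` is already installed:

```python
# One-off step after `pip install -r requirements.txt`: pre-download the YOLOv8n
# weights referenced in app.py's youtube_video_qa(). Saves yolov8n.pt locally.
from ultralytics import YOLO

YOLO("yolov8n.pt")  # downloads the weights if they are not already present
```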
tests/test_agent_core.py DELETED
@@ -1,38 +0,0 @@
1
- import pytest
2
- from gaia_agent import ModularGAIAAgent
3
- import os
4
-
5
- @pytest.fixture
6
- def agent():
7
- return ModularGAIAAgent()
8
-
9
- def test_tool_registry(agent):
10
- assert 'llama3_chat' in agent.tools
11
- assert 'extractive_qa' in agent.tools
12
- assert 'youtube_video_qa' in agent.tools
13
-
14
- def test_fetch_questions_api(monkeypatch, agent):
15
- class MockResponse:
16
- def json(self):
17
- return [{"task_id": "1", "question": "What is 2+2?", "file_name": ""}]
18
- def raise_for_status(self):
19
- pass
20
- monkeypatch.setattr("requests.get", lambda url: MockResponse())
21
- questions = agent.fetch_questions(from_api=True)
22
- assert isinstance(questions, list)
23
- assert questions[0]["question"] == "What is 2+2?"
24
-
25
- def test_download_file(monkeypatch, agent, tmp_path):
26
- test_file = tmp_path / "test.txt"
27
- monkeypatch.setattr("requests.get", lambda url: type("R", (), {"status_code": 200, "content": b"hello"})())
28
- fname = agent.download_file("testid", str(test_file))
29
- assert os.path.exists(fname)
30
- with open(fname) as f:
31
- assert f.read() == "hello"
32
-
33
- def test_end_to_end(monkeypatch, agent):
34
- # Mock API and tools for a simple run
35
- monkeypatch.setattr(agent, "fetch_questions", lambda from_api, questions_path=None: [{"task_id": "1", "question": "What is 2+2?", "file_name": ""}])
36
- agent.tools['llama3_chat'] = lambda prompt: "4"
37
- results = agent.run(from_api=True)
38
- assert results[0]["answer"] == "4"
 
tests/test_video_qa.py DELETED
@@ -1,22 +0,0 @@
1
- import pytest
2
- from gaia_agent import ModularGAIAAgent
3
-
4
- @pytest.fixture
5
- def agent():
6
- return ModularGAIAAgent()
7
-
8
- def test_youtube_video_qa(monkeypatch, agent):
9
- # Mock subprocess, ASR, YOLO, BLIP, and extractive_qa
10
- monkeypatch.setattr("subprocess.run", lambda *a, **k: None)
11
- monkeypatch.setattr("cv2.VideoCapture", lambda *a, **k: type("C", (), {
12
- "get": lambda self, x: 10 if x == 7 else 1, # 10 frames, 1 fps
13
- "set": lambda self, x, y: None,
14
- "read": lambda self: (True, __import__('numpy').zeros((10,10,3), dtype='uint8')),
15
- "release": lambda self: None
16
- })())
17
- monkeypatch.setattr("PIL.Image.fromarray", lambda arr: arr)
18
- agent.tools['extractive_qa'] = lambda q, c: "bird species: 5"
19
- # Simulate a YouTube question
20
- qobj = {"task_id": "yt1", "question": "In the video https://youtube.com/watch?v=abc123, what is the highest number of bird species to be on camera simultaneously?", "file_name": ""}
21
- answer, trace = agent.answer_question(qobj)
22
- assert "bird species" in answer
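Both deleted test modules imported `ModularGAIAAgent` from the removed `gaia_agent.py`, so they could not have run against the new layout anyway. A minimal replacement, not part of this commit, could target the class now defined in `app.py`; the file name `tests/test_app_agent.py` and the stubbed chat tool are assumptions for illustration:

```python
# Hypothetical tests/test_app_agent.py; exercises the agent from app.py with the
# LLM call stubbed out so no network access or HF token is required.
import pytest
from app import ModularGAIAAgent, TOOL_REGISTRY

@pytest.fixture
def agent():
    return ModularGAIAAgent()

def test_tool_registry(agent):
    assert "llama3_chat" in agent.tools
    assert "youtube_video_qa" in agent.tools

def test_answer_question_without_file(agent):
    # Replace the chat tool with a stub so the answer path is deterministic.
    agent.tools = dict(TOOL_REGISTRY, llama3_chat=lambda prompt: "4")
    answer, trace = agent.answer_question(
        {"task_id": "1", "question": "What is 2+2?", "file_name": ""}
    )
    assert answer == "4"
    assert any("Final answer" in step for step in trace)
```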