Spaces: onisj/jarvis_gaia_agent

onisj committed on May 29

Commit f95c630, 1 Parent(s): 9cd535d

docs(readme): readme updated

Files changed (6)

.gitignore +5 -0
README.md +131 -41
app.py +38 -18
requirements.txt +4 -1
test.py +8 -5
tools/search.py +4 -2

.gitignore CHANGED

@@ -47,6 +47,7 @@ coverage.xml
 *.log.*
 *.tmp
 temp/
 # Dependency directories
 pip-wheel-metadata/
@@ -72,3 +73,7 @@ pip-wheel-metadata/
 *~
 *.bak
 *.old

 *.log.*
 *.tmp
 temp/
+*.json
 # Dependency directories
 pip-wheel-metadata/
 *~
 *.bak
 *.old
+project_struct.txt
+test.py
+result.txt

README.md CHANGED

@@ -1,62 +1,81 @@
 ---
 title: JARVIS Gaia Agent
-emoji: 🐢
 colorFrom: indigo
 colorTo: green
-sdk: docker
 pinned: false
 license: mit
 short_description: Enhanced JARVIS AI agent for GAIA benchmark
 ---
 # Evolved JARVIS Gaia Agent
-An advanced Python-based AI agent combining `langchain`, `smolagents`, SERPAPI, and OCR for web searches, file parsing, and data retrieval. Deployed as a Hugging Face Space for GAIA benchmark evaluation.
-#### Directory Structure
 ```
 jarvis_gaia_agent/
-├── app.py                  # Main application with Gradio interface and agent logic
-├── state.py                # Defines JARVISState for state management
-├── retriever.py            # Guest info retriever tool
 ├── tools/                  # Directory for all tools
 │   ├── __init__.py         # Exports all tools
-│   ├── search.py           # Web search tools (SERPAPI-based)
-│   ├── file_parser.py      # File parsing tool (CSV, TXT, PDF, Excel)
-│   ├── image_parser.py     # Image parsing tool (OCR)
-│   ├── calculator.py       # Calculator tool
-│   ├── document_retriever.py # Document retrieval tool
-│   ├── duckduckgo_search.py # DuckDuckGo search tool (from smolagents)
-│   ├── weather_info.py     # Weather info tool (OpenWeatherMap)
-│   ├── hub_stats.py        # Hugging Face Hub stats tool
-│   ├── guest_info.py       # Guest info retriever tool (moved from retriever.py)
 ├── requirements.txt        # Python dependencies
-├── Dockerfile              # Docker configuration
 ├── README.md               # Project documentation
-├── .env                    # Environment variables (not committed)
 ```
-## Features
-- **Web Search**: SERPAPI and DuckDuckGo for robust searches.
-- **File Parsing**: Handles CSV, TXT, PDF, and Excel files.
-- **Image Parsing**: OCR with `easyocr` for image-based questions.
-- **Data Retrieval**: Guest info retriever for structured data.
-- **External APIs**: Weather (OpenWeatherMap), Hugging Face Hub stats.
-- **State Management**: `langgraph` for multi-step reasoning.
-- **Exact-Match Answers**: Optimized for GAIA Level 1 questions.
 ## Prerequisites
-- Python 3.11
-- Tesseract OCR (`brew install tesseract` on macOS)
-- API keys in `.env`:
-  - `HUGGINGFACEHUB_API_TOKEN`
-  - `SERPAPI_API_KEY`
-  - `OPENWEATHERMAP_API_KEY`
-  - `SPACE_ID`
-## Setup
 1. **Clone the Repository**:
    ```bash
@@ -64,16 +83,87 @@ jarvis_gaia_agent/
    cd jarvis_gaia_agent
    ```
-2. **Set Up Environment Variables**:
-   Create a `.env` file with your API keys.
-3. **Run Locally**:
    ```bash
    pip install -r requirements.txt
    python app.py
    ```
-4. **Deploy to Hugging Face Space**:
-   - Push code to your Space.
-   - Set environment variables in Space settings.
-   - Run evaluation via Gradio interface.

 ---
 title: JARVIS Gaia Agent
+emoji: 🦾
 colorFrom: indigo
 colorTo: green
+sdk: gradio
 pinned: false
 license: mit
 short_description: Enhanced JARVIS AI agent for GAIA benchmark
+models:
+  - meta-llama/Llama-3.2-1B-Instruct
+  - sentence-transformers/all-MiniLM-L6-v2
+datasets:
+  - gaia-benchmark/GAIA
 ---
 # Evolved JARVIS Gaia Agent
+An advanced Python-based AI agent built with `langchain`, `langgraph`, SERPAPI, and OCR capabilities for web searches, file parsing, image analysis, and data retrieval. Deployed as a Hugging Face Space (`onisj/jarvis_gaia_agent`) for evaluating performance on the GAIA benchmark, targeting a score of at least 30% (6/20 correct).
+## Features
+- **Web Search**: Integrates SERPAPI and DuckDuckGo for robust, multi-hop searches.
+- **File Parsing**: Processes CSV, TXT, Excel, and PDF files for GAIA tasks.
+- **Image Parsing**: Uses OCR (`easyocr`) to extract text from images.
+- **Data Retrieval**: Includes a guest info retriever for structured queries.
+- **External APIs**: Supports weather data (OpenWeatherMap) and Hugging Face Hub stats.
+- **State Management**: Employs `langgraph` for multi-step reasoning workflows.
+- **Exact-Match Answers**: Optimized for GAIA Level 1 questions with precise formatting (e.g., USD to two decimals, comma-separated lists).
+- **Gradio Interface**: Provides a user-friendly UI for running evaluations and submitting answers.
+## Directory Structure
 ```
 jarvis_gaia_agent/
+├── app.py                  # Main Gradio application with agent logic
+├── state.py                # Defines JARVISState for LangGraph state management
 ├── tools/                  # Directory for all tools
 │   ├── __init__.py         # Exports all tools
+│   ├── search.py           # Web search tools (SERPAPI, multi-hop search)
+│   ├── file_parser.py      # Parses CSV, TXT, Excel, and PDF files
+│   ├── image_parser.py     # OCR-based image parsing
+│   ├── calculator.py       # Mathematical calculations
+│   ├── document_retriever.py # PDF document retrieval
+│   ├── duckduckgo_search.py # DuckDuckGo search integration
+│   ├── weather_info.py     # Weather data via OpenWeatherMap
+│   ├── hub_stats.py        # Hugging Face Hub statistics
+│   ├── guest_info.py       # Guest information retrieval
 ├── requirements.txt        # Python dependencies
 ├── README.md               # Project documentation
+├── .gitignore              # Excludes .env, temp/, etc.
+├── temp/                   # Temporary directory for GAIA files (created at runtime)
 ```
+## Models and Datasets
+- **Models**:
+  - `meta-llama/Llama-3.2-1B-Instruct`: Primary LLM for reasoning and tool selection (Hugging Face Inference API or local).
+  - `sentence-transformers/all-MiniLM-L6-v2`: Embedding model for text similarity tasks.
+  - Note: Together AI models (`meta-llama/Llama-3.3-70B-Instruct-Turbo-Free`, `deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free`) are used via API but not hosted on Hugging Face, so they’re not listed in metadata.
+- **Datasets**:
+  - `gaia-benchmark/GAIA`: Benchmark dataset for evaluating agent performance.
 ## Prerequisites
+- **Python**: 3.9 or higher.
+- **Tesseract OCR**: Required for image parsing.
+  - macOS: `brew install tesseract`
+  - Ubuntu: `sudo apt-get install tesseract-ocr`
+  - Windows: Install via [Tesseract Installer](https://github.com/UB-Mannheim/tesseract/wiki).
+- **API Keys**: Set in `.env` (local) or Hugging Face Space Secrets (deployment):
+  - `HUGGINGFACEHUB_API_TOKEN`: Hugging Face token for model access.
+  - `TOGETHER_API_KEY`: Together AI API key for LLM inference.
+  - `SERPAPI_API_KEY`: SERPAPI key for web searches.
+  - `OPENWEATHERMAP_API_KEY`: OpenWeatherMap key for weather queries.
+  - `SPACE_ID`: `onisj/jarvis_gaia_agent`.
+## Setup and Local Testing
 1. **Clone the Repository**:
    ```bash
    cd jarvis_gaia_agent
    ```
+2. **Create Virtual Environment**:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # Windows: venv\Scripts\activate
+   ```
+3. **Install Dependencies**:
    ```bash
    pip install -r requirements.txt
+   ```
+4. **Configure Environment Variables**:
+   Create a `.env` file:
+   ```text
+   SPACE_ID=onisj/jarvis_gaia_agent
+   HUGGINGFACEHUB_API_TOKEN=your_hf_token
+   TOGETHER_API_KEY=your_together_api_key
+   SERPAPI_API_KEY=your_serpapi_key
+   OPENWEATHERMAP_API_KEY=your_openweather_key
+   ```
+5. **Test with Mock File** (optional):
+   ```bash
+   mkdir -p temp
+   # Writes plain CSV text under an .xlsx name; this is not a real Excel workbook
+   printf 'Item,Type,Sales\nBurger,Food,1000\nCola,Drink,500\n' > temp/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx
+   ```
+6. **Run Locally**:
+   ```bash
    python app.py
    ```
+   - Open `http://127.0.0.1:7860` (port may vary).
+   - Log in with Hugging Face credentials.
+   - Click “Run Evaluation & Submit All Answers” to test GAIA tasks.
+## Deployment to Hugging Face Space
+1. **Push Code**:
+   ```bash
+   git add .
+   git commit -m "Update JARVIS Gaia Agent with README metadata"
+   git push origin main
+   ```
+2. **Set Space Secrets**:
+   - Go to `https://huggingface.co/spaces/onisj/jarvis_gaia_agent` > Settings > Repository Secrets.
+   - Add:
+     - `SPACE_ID`: `onisj/jarvis_gaia_agent`
+     - `HUGGINGFACEHUB_API_TOKEN`
+     - `TOGETHER_API_KEY`
+     - `SERPAPI_API_KEY`
+     - `OPENWEATHERMAP_API_KEY`
+3. **Build and Run**:
+   - Hugging Face auto-builds the Space after pushing.
+   - Access the Gradio interface at `https://onisj-jarvis-gaia-agent.hf.space`.
+   - Log in and click “Run Evaluation & Submit All Answers” to submit GAIA answers.
+4. **Verify Submission**:
+   - Check `status_output` for:
+     ```
+     Submission Successful!
+     User: your_username
+     Overall Score: XX% (Y/20 correct)
+     Message: ...
+     ```
+   - Aim for at least 30% (6/20 correct).
+## Troubleshooting
+- **Model Access (404)**: Verify API keys; test `initialize_llm` locally.
+- **SERPAPI Timeout**: Ensure `SERPAPI_API_KEY` is valid; check `search.py` logs.
+- **GAIA File Access**: Confirm `temp/` directory permissions; test `download_file`.
+- **Low GAIA Score**: Analyze `results_table` for errors; enhance `multi_hop_search_tool` or answer formatting.
+- **Logs**: Check Space > Settings > Logs for build/run errors.
+## License
+MIT License. See [LICENSE](LICENSE) for details.
+## Acknowledgements
+- Built with `langchain`, `langgraph`, and Hugging Face tools.
+- Evaluated on the GAIA benchmark (`gaia-benchmark/GAIA`).
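
Review note: the README lists five environment variables under Prerequisites but leaves checking them to the reader. A minimal sketch of a pre-flight check follows. The key names come from the README; the helper name `missing_keys` and the idea of running it before `python app.py` are assumptions for illustration, not part of this commit.

```python
import os

# Key names are taken from the README's Prerequisites section.
REQUIRED_KEYS = [
    "SPACE_ID",
    "HUGGINGFACEHUB_API_TOKEN",
    "TOGETHER_API_KEY",
    "SERPAPI_API_KEY",
    "OPENWEATHERMAP_API_KEY",
]


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to the process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run locally after loading `.env` (for example with `load_dotenv()`, which `app.py` already imports), this would flag an absent key before the first API call fails.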

app.py CHANGED

@@ -15,6 +15,7 @@ import gradio as gr
 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from state import JARVISState
 from tools import (
     search_tool, multi_hop_search_tool, file_parser_tool, image_parser_tool,
@@ -55,24 +56,23 @@ HF_MODEL = "meta-llama/Llama-3.2-1B-Instruct"
 # Initialize LLM clients
 def initialize_llm():
     for model in TOGETHER_MODELS:
         try:
-            client = InferenceClient(
-                model=model,
-                api_key=TOGETHER_API_KEY,
-                base_url="https://api.together.ai/v1",
-                timeout=30
-            )
-            client.chat.completions.create(
                 model=model,
                 messages=[{"role": "user", "content": "Test"}],
-                max_tokens=10,
             )
             logger.info(f"Initialized Together AI model: {model}")
             return client, "together"
         except Exception as e:
-            logger.warning(f"Failed to initialize {model}: {e}")
     try:
         client = InferenceClient(
             model=HF_MODEL,
@@ -84,9 +84,10 @@ def initialize_llm():
     except Exception as e:
         logger.warning(f"Failed to initialize HF Inference API: {e}")
     try:
         tokenizer = AutoTokenizer.from_pretrained(HF_MODEL, token=HF_API_TOKEN)
-        model = AutoModelForCausalLM.from_pretrained(HF_MODEL, token=HF_API_TOKEN, device_map="mps")
         logger.info(f"Initialized local Hugging Face model: {HF_MODEL}")
         return (model, tokenizer), "hf_local"
     except Exception as e:
@@ -155,13 +156,24 @@ async def parse_question(state: JARVISState) -> JARVISState:
                     inputs = tokenizer.apply_chat_template(
                         [{"role": "system", "content": prompt[0].content}, {"role": "user", "content": prompt[1].content}],
                         return_tensors="pt"
-                    ).to("mps")
                     outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
                     response = tokenizer.decode(outputs[0], skip_special_tokens=True)
                     tools_needed = json.loads(response.strip())
-                else:
                     response = llm_client.chat.completions.create(
-                        model=llm_client.model if llm_type == "together" else HF_MODEL,
                         messages=[
                             {"role": "system", "content": prompt[0].content},
                             {"role": "user", "content": prompt[1].content}
@@ -322,12 +334,20 @@ Document results: {document_results}""")
             try:
                 if llm_type == "hf_local":
                     model, tokenizer = llm_client
-                    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("mps")
                     outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
                     answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
-                else:
                     response = llm_client.chat.completions.create(
-                        model=llm_client.model if llm_type == "together" else HF_MODEL,
                         messages=messages,
                         max_tokens=512,
                         temperature=0.7
@@ -518,8 +538,8 @@ with gr.Blocks() as demo:
         """
     )
     with gr.Row():
-        gr.LoginButton()
-        gr.LogoutButton()
     run_button = gr.Button("Run Evaluation & Submit All Answers")
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
     results_table = gr.DataFrame(label="Questions and Answers", wrap=True, headers=["Task ID", "Question", "Answer"])

 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
 from transformers import AutoTokenizer, AutoModelForCausalLM
+import together
 from state import JARVISState
 from tools import (
     search_tool, multi_hop_search_tool, file_parser_tool, image_parser_tool,
 # Initialize LLM clients
 def initialize_llm():
+    # Try Together AI models
     for model in TOGETHER_MODELS:
         try:
+            # Pass the key explicitly so the client does not depend on module-level state
+            client = together.Together(api_key=TOGETHER_API_KEY)
+            # Test the model
+            response = client.chat.completions.create(
                 model=model,
                 messages=[{"role": "user", "content": "Test"}],
+                max_tokens=10
             )
             logger.info(f"Initialized Together AI model: {model}")
             return client, "together"
         except Exception as e:
+            logger.warning(f"Failed to initialize Together AI model {model}: {e}")
+    # Fallback to Hugging Face Inference API
     try:
         client = InferenceClient(
             model=HF_MODEL,
     except Exception as e:
         logger.warning(f"Failed to initialize HF Inference API: {e}")
+    # Fallback to local Hugging Face model
     try:
         tokenizer = AutoTokenizer.from_pretrained(HF_MODEL, token=HF_API_TOKEN)
+        model = AutoModelForCausalLM.from_pretrained(HF_MODEL, token=HF_API_TOKEN, device_map="auto")
         logger.info(f"Initialized local Hugging Face model: {HF_MODEL}")
         return (model, tokenizer), "hf_local"
     except Exception as e:
                     inputs = tokenizer.apply_chat_template(
                         [{"role": "system", "content": prompt[0].content}, {"role": "user", "content": prompt[1].content}],
                         return_tensors="pt"
+                    ).to(model.device)
                     outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
                     response = tokenizer.decode(outputs[0], skip_special_tokens=True)
                     tools_needed = json.loads(response.strip())
+                elif llm_type == "together":
                     response = llm_client.chat.completions.create(
+                        model=llm_client.model,
+                        messages=[
+                            {"role": "system", "content": prompt[0].content},
+                            {"role": "user", "content": prompt[1].content}
+                        ],
+                        max_tokens=512,
+                        temperature=0.7
+                    )
+                    tools_needed = json.loads(response.choices[0].message.content.strip())
+                else:  # hf_api
+                    response = llm_client.chat.completions.create(
+                        model=HF_MODEL,
                         messages=[
                             {"role": "system", "content": prompt[0].content},
                             {"role": "user", "content": prompt[1].content}
             try:
                 if llm_type == "hf_local":
                     model, tokenizer = llm_client
+                    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
                     outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
                     answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
+                elif llm_type == "together":
+                    response = llm_client.chat.completions.create(
+                        model=llm_client.model,
+                        messages=messages,
+                        max_tokens=512,
+                        temperature=0.7
+                    )
+                    answer = response.choices[0].message.content.strip()
+                else:  # hf_api
                     response = llm_client.chat.completions.create(
+                        model=HF_MODEL,
                         messages=messages,
                         max_tokens=512,
                         temperature=0.7
         """
     )
     with gr.Row():
+        gr.LoginButton(value="Login to Hugging Face")
+        # Removed gr.LogoutButton due to deprecation
     run_button = gr.Button("Run Evaluation & Submit All Answers")
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
     results_table = gr.DataFrame(label="Questions and Answers", wrap=True, headers=["Task ID", "Question", "Answer"])
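
Review note: both Together branches pass `model=llm_client.model`, while `initialize_llm` returns only the client and the string `"together"`. If the Together client does not expose a `model` attribute (an assumption worth checking against the installed SDK), those calls would raise `AttributeError`. One option is to carry the selected model name alongside the client. The sketch below shows that pattern in isolation; `LLMHandle`, `pick_backend`, and the factory callables are hypothetical names, not code from this repository.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional, Tuple


@dataclass
class LLMHandle:
    """Bundles a client with the backend kind and the model that initialized."""
    client: Any
    kind: str   # e.g. "together", "hf_api", or "hf_local"
    model: str


def pick_backend(
    candidates: Iterable[Tuple[str, str, Callable[[], Any]]]
) -> Optional[LLMHandle]:
    """Try each (kind, model, factory) in order; return the first that works.

    A factory is expected to build the client and raise on failure,
    for example after a short test completion.
    """
    for kind, model, factory in candidates:
        try:
            client = factory()
        except Exception as exc:  # mirror the broad except used in initialize_llm
            print(f"Failed to initialize {kind} model {model}: {exc}")
            continue
        return LLMHandle(client=client, kind=kind, model=model)
    return None
```

With a handle like this, the call sites could use `model=handle.model` for every backend instead of reading an attribute off the client.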

requirements.txt CHANGED

@@ -20,4 +20,7 @@ transformers
 asyncio
 serpapi
 duckduckgo-search
-torch

 asyncio
 serpapi
 duckduckgo-search
+torch
+together
+google-search-results
+beautifulsoup4

test.py CHANGED

@@ -1,7 +1,10 @@
-import os
-import requests
-headers = {"Authorization": f"Bearer {os.getenv('TOGETHER_API_KEY')}"}
-response = requests.get("https://api.together.ai/models", headers=headers)
-print(response.json())

+import os
+from serpapi import GoogleSearch
+params = {
+  "q": "drop shipping",
+  # Read the key from the environment instead of hardcoding a secret
+  "api_key": os.getenv("SERPAPI_API_KEY")
+}
+search = GoogleSearch(params)
+results = search.get_dict()
+# Not every query returns an AI overview, so avoid a KeyError
+ai_overview = results.get("ai_overview")

tools/search.py CHANGED

@@ -1,7 +1,9 @@
 import os
-from serpapi import GoogleSearch
-from langchain.tools import Tool
 import asyncio
 from typing import List, Dict, Any
 from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.messages import SystemMessage, HumanMessage

 import os
+import json
 import asyncio
+# GoogleSearch ships in the google-search-results package, whose import name is serpapi
+from serpapi import GoogleSearch
+from langchain.tools import Tool
 from typing import List, Dict, Any
 from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.messages import SystemMessage, HumanMessage