# GraphRAG API

This README provides a detailed guide to the `api.py` file, which serves as the API interface for the GraphRAG (Graph Retrieval-Augmented Generation) system. GraphRAG combines graph-based knowledge representation with retrieval-augmented generation techniques to provide context-aware responses to queries.

## Table of Contents

1. [Overview](#overview)
2. [Setup](#setup)
3. [API Endpoints](#api-endpoints)
4. [Data Models](#data-models)
5. [Core Functionality](#core-functionality)
6. [Usage Examples](#usage-examples)
7. [Configuration](#configuration)
8. [Troubleshooting](#troubleshooting)

## Overview

The `api.py` file implements a FastAPI-based server that provides various endpoints for interacting with the GraphRAG system. It supports different types of queries, including direct chat, GraphRAG-specific queries, DuckDuckGo searches, and a combined full-model search.

Key features:

- Multiple query types (local and global searches)
- Context caching for improved performance
- Background tasks for long-running operations
- Customizable settings through environment variables and config files
- Integration with external services (e.g., Ollama for LLM interactions)

## Setup

1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Set up environment variables: create a `.env` file in the `indexing` directory with the following variables:

   ```
   LLM_API_BASE=
   LLM_MODEL=
   LLM_PROVIDER=
   EMBEDDINGS_API_BASE=
   EMBEDDINGS_MODEL=
   EMBEDDINGS_PROVIDER=
   INPUT_DIR=./indexing/output
   ROOT_DIR=indexing
   API_PORT=8012
   ```

3. Run the API server:

   ```
   python api.py --host 0.0.0.0 --port 8012
   ```

## API Endpoints

### `/v1/chat/completions` (POST)

Main endpoint for chat completions. Supports different models:

- `direct-chat`: Direct interaction with the LLM
- `graphrag-local-search:latest`: Local search using GraphRAG
- `graphrag-global-search:latest`: Global search using GraphRAG
- `duckduckgo-search:latest`: Web search using DuckDuckGo
- `full-model:latest`: Combined search using all available models

### `/v1/prompt_tune` (POST)

Starts the prompt tuning process in the background.

### `/v1/prompt_tune_status` (GET)

Retrieves the status and logs of the prompt tuning process.

### `/v1/index` (POST)

Starts the indexing process for GraphRAG in the background.

### `/v1/index_status` (GET)

Retrieves the status and logs of the indexing process.

### `/health` (GET)

Health check endpoint.

### `/v1/models` (GET)

Lists available models.

## Data Models

The API uses several Pydantic models for request and response handling:

- `Message`: Represents a chat message with role and content.
- `QueryOptions`: Options for GraphRAG queries, including query type, preset, and community level.
- `ChatCompletionRequest`: Request model for chat completions.
- `ChatCompletionResponse`: Response model for chat completions.
- `PromptTuneRequest`: Request model for prompt tuning.
- `IndexingRequest`: Request model for indexing.
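The sketch below illustrates what a few of these models might look like. The field names are inferred from the request payloads shown in [Usage Examples](#usage-examples) and should be treated as assumptions, not the exact definitions in `api.py`:

```python
from typing import List, Optional
from pydantic import BaseModel

class Message(BaseModel):
    role: str      # "system", "user", or "assistant"
    content: str

class QueryOptions(BaseModel):
    query_type: str                            # e.g. "local-search" or "global-search"
    preset: Optional[str] = None               # optional query preset
    selected_folder: Optional[str] = None      # indexed output folder to query
    community_level: int = 2                   # depth of community analysis
    response_type: str = "Multiple Paragraphs"

class ChatCompletionRequest(BaseModel):
    model: str                                 # e.g. "graphrag-local-search:latest"
    messages: List[Message]
    query_options: Optional[QueryOptions] = None
```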
## Core Functionality

### Context Loading

The `load_context` function loads the data needed for GraphRAG queries, including entities, relationships, reports, text units, and covariates.

### Search Engine Setup

`setup_search_engines` initializes both local and global search engines using the loaded context data.

### Query Execution

Different query types are handled by separate functions:

- `run_direct_chat`: Sends queries directly to the LLM.
- `run_graphrag_query`: Executes GraphRAG queries (local or global).
- `run_duckduckgo_search`: Performs web searches using DuckDuckGo.
- `run_full_model_search`: Combines results from all search types.

### Background Tasks

Long-running tasks like prompt tuning and indexing are executed as background tasks to prevent blocking the API.

## Usage Examples

### Sending a GraphRAG Query

```python
import requests

url = "http://localhost:8012/v1/chat/completions"
payload = {
    "model": "graphrag-local-search:latest",
    "messages": [{"role": "user", "content": "What is GraphRAG?"}],
    "query_options": {
        "query_type": "local-search",
        "selected_folder": "your_indexed_folder",
        "community_level": 2,
        "response_type": "Multiple Paragraphs"
    }
}

response = requests.post(url, json=payload)
print(response.json())
```

### Starting the Indexing Process

```python
import requests

url = "http://localhost:8012/v1/index"
payload = {
    "llm_model": "your_llm_model",
    "embed_model": "your_embed_model",
    "root": "./indexing",
    "verbose": True,
    "emit": ["parquet", "csv"]
}

response = requests.post(url, json=payload)
print(response.json())
```

## Configuration

The API can be configured through:

1. Environment variables
2. A `config.yaml` file (path specified by the `GRAPHRAG_CONFIG` environment variable)
3. Command-line arguments when starting the server

Key configuration options:

- `llm_model`: The language model to use
- `embedding_model`: The embedding model for vector representations
- `community_level`: Depth of community analysis in GraphRAG
- `token_limit`: Maximum tokens for context
- `api_key`: API key for the LLM service
- `api_base`: Base URL for the LLM API
- `api_type`: Type of API (e.g., "openai")

## Troubleshooting

1. If you encounter connection errors with Ollama, ensure the service is running and accessible.
2. For "context loading failed" errors, check that the indexed data is present in the specified output folder.
3. If prompt tuning or indexing processes fail, review the logs using the respective status endpoints (see the polling sketch below).
4. For performance issues, consider adjusting the `community_level` and `token_limit` settings.

For more detailed information on GraphRAG's indexing and querying processes, refer to the official GraphRAG documentation.
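As a companion to item 3 under Troubleshooting, the status endpoints can be polled from a short script. A minimal sketch, assuming the default port; the `status` and `logs` field names are assumptions about the response payload, so inspect the actual JSON in your deployment:

```python
import time
import requests

# Poll the indexing status endpoint until the task is no longer running.
# Swap in /v1/prompt_tune_status to watch a prompt tuning run instead.
STATUS_URL = "http://localhost:8012/v1/index_status"

while True:
    data = requests.get(STATUS_URL).json()
    # Print the reported status plus the most recent log line, if any.
    # NOTE: "status" and "logs" are illustrative field names, not confirmed.
    print(data.get("status"), *(data.get("logs") or [])[-1:])
    if data.get("status") != "running":
        break
    time.sleep(5)
```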