# GraphRAG API
This README provides a detailed guide to `api.py`, which serves as the API interface for the GraphRAG (Graph Retrieval-Augmented Generation) system. GraphRAG combines graph-based knowledge representation with retrieval-augmented generation techniques to provide context-aware responses to queries.
## Table of Contents
- Overview
- Setup
- API Endpoints
- Data Models
- Core Functionality
- Usage Examples
- Configuration
- Troubleshooting
## Overview
The `api.py` file implements a FastAPI-based server that provides various endpoints for interacting with the GraphRAG system. It supports different types of queries, including direct chat, GraphRAG-specific queries, DuckDuckGo searches, and a combined full-model search.
Key features:
- Multiple query types (local and global searches)
- Context caching for improved performance
- Background tasks for long-running operations
- Customizable settings through environment variables and config files
- Integration with external services (e.g., Ollama for LLM interactions)
## Setup
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up environment variables: create a `.env` file in the `indexing` directory with the following variables:

   ```
   LLM_API_BASE=<your_llm_api_base_url>
   LLM_MODEL=<your_llm_model>
   LLM_PROVIDER=<llm_provider>
   EMBEDDINGS_API_BASE=<your_embeddings_api_base_url>
   EMBEDDINGS_MODEL=<your_embeddings_model>
   EMBEDDINGS_PROVIDER=<embeddings_provider>
   INPUT_DIR=./indexing/output
   ROOT_DIR=indexing
   API_PORT=8012
   ```

3. Run the API server:

   ```bash
   python api.py --host 0.0.0.0 --port 8012
   ```
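Once the server is running, a quick request against the health check endpoint confirms it is reachable (this assumes the host and port from the command above):

```python
import requests

# Sanity check against the /health endpoint described below.
resp = requests.get("http://localhost:8012/health")
resp.raise_for_status()
print(resp.json())
```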
## API Endpoints
### `/v1/chat/completions` (POST)

Main endpoint for chat completions. Supports different models (see the request sketch after this list):

- `direct-chat`: Direct interaction with the LLM
- `graphrag-local-search:latest`: Local search using GraphRAG
- `graphrag-global-search:latest`: Global search using GraphRAG
- `duckduckgo-search:latest`: Web search using DuckDuckGo
- `full-model:latest`: Combined search using all available models
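As a quick illustration, here is a minimal `direct-chat` request. This is a sketch: it assumes the same payload shape as the GraphRAG example under Usage Examples, and omits `query_options` on the assumption that direct chat does not need them.

```python
import requests

# Minimal direct-chat request; payload shape mirrors the GraphRAG
# example in Usage Examples. Omitting query_options is an assumption.
payload = {
    "model": "direct-chat",
    "messages": [{"role": "user", "content": "Hello!"}],
}
resp = requests.post("http://localhost:8012/v1/chat/completions", json=payload)
print(resp.json())
```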
### `/v1/prompt_tune` (POST)

Initiates the prompt tuning process in the background.

### `/v1/prompt_tune_status` (GET)

Retrieves the status and logs of the prompt tuning process.
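A minimal way to start tuning and follow its progress is sketched below. Since this README does not document the `PromptTuneRequest` fields, the empty JSON body and the status field names are assumptions.

```python
import time
import requests

BASE = "http://localhost:8012"

# Start prompt tuning; the empty body is an assumption, since the
# PromptTuneRequest fields are not documented in this README.
requests.post(f"{BASE}/v1/prompt_tune", json={}).raise_for_status()

# Poll the status endpoint until the background task finishes.
# The "status" field name and "running" value are assumptions.
while True:
    status = requests.get(f"{BASE}/v1/prompt_tune_status").json()
    print(status)
    if status.get("status") != "running":
        break
    time.sleep(5)
```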
### `/v1/index` (POST)

Starts the indexing process for GraphRAG in the background.

### `/v1/index_status` (GET)

Retrieves the status and logs of the indexing process.

### `/health` (GET)

Health check endpoint.

### `/v1/models` (GET)

Lists available models.
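To discover which model identifiers the server accepts, query `/v1/models`. The sketch below assumes an OpenAI-style response with a `data` list of objects carrying an `id`; adjust if the server returns a different shape.

```python
import requests

resp = requests.get("http://localhost:8012/v1/models")
# Assumes an OpenAI-style {"data": [{"id": ...}, ...]} response.
for model in resp.json().get("data", []):
    print(model.get("id"))
```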
## Data Models
The API uses several Pydantic models for request and response handling (a rough sketch follows the list):

- `Message`: Represents a chat message with role and content.
- `QueryOptions`: Options for GraphRAG queries, including query type, preset, and community level.
- `ChatCompletionRequest`: Request model for chat completions.
- `ChatCompletionResponse`: Response model for chat completions.
- `PromptTuneRequest`: Request model for prompt tuning.
- `IndexingRequest`: Request model for indexing.
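For orientation, here is a rough sketch of the request-side models as they might be declared, with field names inferred from the example payloads under Usage Examples. The actual definitions in `api.py` may differ.

```python
from typing import List, Optional
from pydantic import BaseModel

class Message(BaseModel):
    role: str      # "user", "assistant", or "system"
    content: str

class QueryOptions(BaseModel):
    # Field names inferred from the example payloads below;
    # optionality and defaults are assumptions.
    query_type: str                    # e.g. "local-search"
    selected_folder: Optional[str] = None
    community_level: int = 2
    response_type: str = "Multiple Paragraphs"
    preset: Optional[str] = None

class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[Message]
    query_options: Optional[QueryOptions] = None
```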
## Core Functionality
### Context Loading

The `load_context` function loads the data needed for GraphRAG queries, including entities, relationships, reports, text units, and covariates.
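Conceptually, this amounts to reading the parquet artifacts that indexing writes to `INPUT_DIR`. Below is a minimal sketch with pandas; the file names follow GraphRAG's usual output naming and are an assumption about this particular setup.

```python
import os
import pandas as pd

INPUT_DIR = os.getenv("INPUT_DIR", "./indexing/output")

# File names follow GraphRAG's conventional output artifacts; the
# actual names used by load_context in api.py may differ.
tables = {
    "entities": "create_final_entities.parquet",
    "relationships": "create_final_relationships.parquet",
    "reports": "create_final_community_reports.parquet",
    "text_units": "create_final_text_units.parquet",
    "covariates": "create_final_covariates.parquet",  # optional artifact
}

# Load whichever artifacts exist and report their row counts.
context = {
    name: pd.read_parquet(os.path.join(INPUT_DIR, fname))
    for name, fname in tables.items()
    if os.path.exists(os.path.join(INPUT_DIR, fname))
}
print({name: len(df) for name, df in context.items()})
```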
### Search Engine Setup

`setup_search_engines` initializes both the local and global search engines using the loaded context data.
### Query Execution

Each query type is handled by a separate function (see the combined-search sketch after this list):

- `run_direct_chat`: Sends queries directly to the LLM.
- `run_graphrag_query`: Executes GraphRAG queries (local or global).
- `run_duckduckgo_search`: Performs web searches using DuckDuckGo.
- `run_full_model_search`: Combines results from all search types.
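A plausible shape for the combined search is to fan the query out to every backend concurrently and merge the answers. The sketch below uses stub functions in place of the real implementations in `api.py`; whether the server actually parallelizes this way is an assumption.

```python
import asyncio

# Stub search functions standing in for the real ones in api.py.
async def run_direct_chat(query):
    return f"direct answer to {query!r}"

async def run_graphrag_query(query, query_type):
    return f"{query_type} answer to {query!r}"

async def run_duckduckgo_search(query):
    return f"web results for {query!r}"

async def run_full_model_search(query):
    # Fan the query out to every backend concurrently and collect the
    # answers; running these in parallel is an assumption of the sketch.
    direct, local, global_, web = await asyncio.gather(
        run_direct_chat(query),
        run_graphrag_query(query, "local-search"),
        run_graphrag_query(query, "global-search"),
        run_duckduckgo_search(query),
    )
    return {"direct": direct, "local": local, "global": global_, "web": web}

print(asyncio.run(run_full_model_search("What is GraphRAG?")))
```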
### Background Tasks

Long-running tasks like prompt tuning and indexing are executed as background tasks to prevent blocking the API.
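FastAPI's built-in `BackgroundTasks` makes this pattern straightforward. Here is a stripped-down sketch of how the indexing endpoint might schedule work; the handler and task names are illustrative, not the exact ones in `api.py`.

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
index_status = {"status": "idle", "logs": []}

def run_indexing_job(root: str) -> None:
    # Long-running work executes after the response has been sent.
    index_status["status"] = "running"
    # ... invoke the GraphRAG indexer here ...
    index_status["status"] = "complete"

@app.post("/v1/index")
async def start_indexing(background_tasks: BackgroundTasks):
    # add_task queues the callable; the endpoint returns immediately.
    background_tasks.add_task(run_indexing_job, "./indexing")
    return {"status": "indexing started"}

@app.get("/v1/index_status")
async def get_index_status():
    return index_status
```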
## Usage Examples
### Sending a GraphRAG Query

```python
import requests

url = "http://localhost:8012/v1/chat/completions"
payload = {
    "model": "graphrag-local-search:latest",
    "messages": [{"role": "user", "content": "What is GraphRAG?"}],
    "query_options": {
        "query_type": "local-search",
        "selected_folder": "your_indexed_folder",
        "community_level": 2,
        "response_type": "Multiple Paragraphs"
    }
}

response = requests.post(url, json=payload)
print(response.json())
```
### Starting Indexing Process

```python
import requests

url = "http://localhost:8012/v1/index"
payload = {
    "llm_model": "your_llm_model",
    "embed_model": "your_embed_model",
    "root": "./indexing",
    "verbose": True,
    "emit": ["parquet", "csv"]
}

response = requests.post(url, json=payload)
print(response.json())
```
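Because indexing runs in the background, poll `/v1/index_status` to follow progress. This is a sketch; the exact status values and fields the server returns are assumptions.

```python
import time
import requests

status_url = "http://localhost:8012/v1/index_status"

while True:
    status = requests.get(status_url).json()
    # Print the current status plus the most recent log line, if any.
    print(status.get("status"), *status.get("logs", [])[-1:], sep=" | ")
    # "running" as the in-progress value is an assumption about api.py.
    if status.get("status") != "running":
        break
    time.sleep(10)
```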
## Configuration

The API can be configured through:

- Environment variables
- A `config.yaml` file (path specified by the `GRAPHRAG_CONFIG` environment variable)
- Command-line arguments when starting the server
Key configuration options (a loading sketch follows the list):

- `llm_model`: The language model to use
- `embedding_model`: The embedding model for vector representations
- `community_level`: Depth of community analysis in GraphRAG
- `token_limit`: Maximum tokens for context
- `api_key`: API key for the LLM service
- `api_base`: Base URL for the LLM API
- `api_type`: Type of API (e.g., "openai")
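How these sources are merged is not spelled out here, but a common pattern is built-in defaults overridden by `config.yaml`, overridden in turn by environment variables. Below is a sketch of that pattern using the keys above; the default values are placeholders, and this precedence is an assumption rather than a documented guarantee of `api.py`.

```python
import os
import yaml  # PyYAML

# Placeholder defaults, overridden by config.yaml, overridden by
# environment variables. Precedence here is an assumption.
config = {
    "llm_model": "your_llm_model",
    "embedding_model": "your_embed_model",
    "community_level": 2,
    "token_limit": 4096,
    "api_key": "",
    "api_base": "http://localhost:11434",  # Ollama's default address
    "api_type": "openai",
}

config_path = os.getenv("GRAPHRAG_CONFIG", "config.yaml")
if os.path.exists(config_path):
    with open(config_path) as f:
        config.update(yaml.safe_load(f) or {})

for key in config:
    env_value = os.getenv(key.upper())
    if env_value is not None:
        config[key] = env_value
```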
## Troubleshooting
- If you encounter connection errors with Ollama, ensure the service is running and accessible.
- For "context loading failed" errors, check that the indexed data is present in the specified output folder.
- If prompt tuning or indexing processes fail, review the logs using the respective status endpoints.
- For performance issues, consider adjusting the `community_level` and `token_limit` settings.
For more detailed information on GraphRAG's indexing and querying processes, refer to the official GraphRAG documentation.