sichaolong committed on
Commit
e331e72
1 Parent(s): 2d119be

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .DS_Store +0 -0
  2. API_README.md +170 -0
  3. EMBEDDING_PROXY_README.md +36 -0
  4. INDEX_APP_README.md +127 -0
  5. LICENSE +21 -0
  6. README.md +263 -7
  7. __pycache__/api.cpython-310.pyc +0 -0
  8. __pycache__/embedding_proxy.cpython-310.pyc +0 -0
  9. __pycache__/web.cpython-310.pyc +0 -0
  10. api.py +943 -0
  11. app.py +1786 -0
  12. css +242 -0
  13. embedding_proxy.py +62 -0
  14. env-example.txt +19 -0
  15. graphrag/.github/ISSUE_TEMPLATE.md +69 -0
  16. graphrag/.github/ISSUE_TEMPLATE/bug_report.yml +57 -0
  17. graphrag/.github/ISSUE_TEMPLATE/config.yml +1 -0
  18. graphrag/.github/ISSUE_TEMPLATE/feature_request.yml +26 -0
  19. graphrag/.github/ISSUE_TEMPLATE/general_issue.yml +51 -0
  20. graphrag/.github/dependabot.yml +19 -0
  21. graphrag/.github/pull_request_template.md +36 -0
  22. graphrag/.github/workflows/gh-pages.yml +97 -0
  23. graphrag/.github/workflows/javascript-ci.yml +30 -0
  24. graphrag/.github/workflows/python-ci.yml +122 -0
  25. graphrag/.github/workflows/python-publish.yml +52 -0
  26. graphrag/.github/workflows/semver.yml +15 -0
  27. graphrag/.github/workflows/spellcheck.yml +15 -0
  28. graphrag/.gitignore +68 -0
  29. graphrag/.semversioner/0.1.0.json +10 -0
  30. graphrag/.semversioner/next-release/minor-20240710183748086411.json +4 -0
  31. graphrag/.semversioner/next-release/patch-20240701233152787373.json +4 -0
  32. graphrag/.semversioner/next-release/patch-20240703152422358587.json +4 -0
  33. graphrag/.semversioner/next-release/patch-20240703182750529114.json +4 -0
  34. graphrag/.semversioner/next-release/patch-20240704181236015699.json +4 -0
  35. graphrag/.semversioner/next-release/patch-20240705184142723331.json +4 -0
  36. graphrag/.semversioner/next-release/patch-20240705235656897489.json +4 -0
  37. graphrag/.semversioner/next-release/patch-20240707063053679262.json +4 -0
  38. graphrag/.semversioner/next-release/patch-20240709225514193665.json +4 -0
  39. graphrag/.semversioner/next-release/patch-20240710114442871595.json +4 -0
  40. graphrag/.semversioner/next-release/patch-20240710165603516866.json +4 -0
  41. graphrag/.semversioner/next-release/patch-20240711004716103302.json +4 -0
  42. graphrag/.semversioner/next-release/patch-20240711092703710242.json +4 -0
  43. graphrag/.semversioner/next-release/patch-20240711223132221685.json +4 -0
  44. graphrag/.semversioner/next-release/patch-20240712035356859335.json +4 -0
  45. graphrag/.semversioner/next-release/patch-20240712210400518089.json +4 -0
  46. graphrag/.semversioner/next-release/patch-20240712235357550877.json +4 -0
  47. graphrag/.semversioner/next-release/patch-20240716225953784804.json +4 -0
  48. graphrag/.vsts-ci.yml +41 -0
  49. graphrag/CODEOWNERS +6 -0
  50. graphrag/CODE_OF_CONDUCT.md +9 -0
.DS_Store ADDED
Binary file (6.15 kB).
 
API_README.md ADDED
@@ -0,0 +1,170 @@
1
+ # GraphRAG API
2
+
3
+ This README provides a detailed guide on the `api.py` file, which serves as the API interface for the GraphRAG (Graph Retrieval-Augmented Generation) system. GraphRAG is a powerful tool that combines graph-based knowledge representation with retrieval-augmented generation techniques to provide context-aware responses to queries.
4
+
5
+ ## Table of Contents
6
+
7
+ 1. [Overview](#overview)
8
+ 2. [Setup](#setup)
9
+ 3. [API Endpoints](#api-endpoints)
10
+ 4. [Data Models](#data-models)
11
+ 5. [Core Functionality](#core-functionality)
12
+ 6. [Usage Examples](#usage-examples)
13
+ 7. [Configuration](#configuration)
14
+ 8. [Troubleshooting](#troubleshooting)
15
+
16
+ ## Overview
17
+
18
+ The `api.py` file implements a FastAPI-based server that provides various endpoints for interacting with the GraphRAG system. It supports different types of queries, including direct chat, GraphRAG-specific queries, DuckDuckGo searches, and a combined full-model search.
19
+
20
+ Key features:
21
+ - Multiple query types (local and global searches)
22
+ - Context caching for improved performance
23
+ - Background tasks for long-running operations
24
+ - Customizable settings through environment variables and config files
25
+ - Integration with external services (e.g., Ollama for LLM interactions)
26
+
27
+ ## Setup
28
+
29
+ 1. Install dependencies:
30
+ ```
31
+ pip install -r requirements.txt
32
+ ```
33
+
34
+ 2. Set up environment variables:
35
+ Create a `.env` file in the `indexing` directory with the following variables:
36
+ ```
37
+ LLM_API_BASE=<your_llm_api_base_url>
38
+ LLM_MODEL=<your_llm_model>
39
+ LLM_PROVIDER=<llm_provider>
40
+ EMBEDDINGS_API_BASE=<your_embeddings_api_base_url>
41
+ EMBEDDINGS_MODEL=<your_embeddings_model>
42
+ EMBEDDINGS_PROVIDER=<embeddings_provider>
43
+ INPUT_DIR=./indexing/output
44
+ ROOT_DIR=indexing
45
+ API_PORT=8012
46
+ ```
47
+
48
+ 3. Run the API server:
49
+ ```
50
+ python api.py --host 0.0.0.0 --port 8012
51
+ ```
52
+
53
+ ## API Endpoints
54
+
55
+ ### `/v1/chat/completions` (POST)
56
+ Main endpoint for chat completions. Supports different models:
57
+ - `direct-chat`: Direct interaction with the LLM
58
+ - `graphrag-local-search:latest`: Local search using GraphRAG
59
+ - `graphrag-global-search:latest`: Global search using GraphRAG
60
+ - `duckduckgo-search:latest`: Web search using DuckDuckGo
61
+ - `full-model:latest`: Combined search using all available models
62
+
63
+ ### `/v1/prompt_tune` (POST)
64
+ Initiates prompt tuning process in the background.
65
+
66
+ ### `/v1/prompt_tune_status` (GET)
67
+ Retrieves the status and logs of the prompt tuning process.
68
+
69
+ ### `/v1/index` (POST)
70
+ Starts the indexing process for GraphRAG in the background.
71
+
72
+ ### `/v1/index_status` (GET)
73
+ Retrieves the status and logs of the indexing process.
74
+
75
+ ### `/health` (GET)
76
+ Health check endpoint.
77
+
78
+ ### `/v1/models` (GET)
79
+ Lists available models.
80
+
81
+ ## Data Models
82
+
83
+ The API uses several Pydantic models for request and response handling (the query-related models are excerpted after this list):
84
+
85
+ - `Message`: Represents a chat message with role and content.
86
+ - `QueryOptions`: Options for GraphRAG queries, including query type, preset, and community level.
87
+ - `ChatCompletionRequest`: Request model for chat completions.
88
+ - `ChatCompletionResponse`: Response model for chat completions.
89
+ - `PromptTuneRequest`: Request model for prompt tuning.
90
+ - `IndexingRequest`: Request model for indexing.
91
+
92
+ ## Core Functionality
93
+
94
+ ### Context Loading
95
+ The `load_context` function loads necessary data for GraphRAG queries, including entities, relationships, reports, text units, and covariates.
96
+
97
+ ### Search Engine Setup
98
+ `setup_search_engines` initializes both local and global search engines using the loaded context data.
99
+
100
+ ### Query Execution
101
+ Different query types are handled by separate functions:
102
+ - `run_direct_chat`: Sends queries directly to the LLM.
103
+ - `run_graphrag_query`: Executes GraphRAG queries (local or global).
104
+ - `run_duckduckgo_search`: Performs web searches using DuckDuckGo.
105
+ - `run_full_model_search`: Combines results from all search types.
106
+
107
+ ### Background Tasks
108
+ Long-running tasks like prompt tuning and indexing are executed as background tasks to prevent blocking the API.
109
+
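For example, a client can start prompt tuning and then poll its status endpoint until the background task finishes. This is a minimal sketch: the exact request body accepted by `/v1/prompt_tune` is defined by the `PromptTuneRequest` model in `api.py`, so the `root` field and the status strings below are assumptions used to illustrate the flow.

```python
import time
import requests

BASE_URL = "http://localhost:8012"

# Kick off prompt tuning as a background task.
# The "root" field is an assumption; check PromptTuneRequest in api.py for the exact schema.
resp = requests.post(f"{BASE_URL}/v1/prompt_tune", json={"root": "./indexing"})
print(resp.json())

# Poll the status endpoint; it returns the status and logs of the tuning process.
while True:
    status = requests.get(f"{BASE_URL}/v1/prompt_tune_status").json()
    print(status.get("status"))
    # Terminal status strings are illustrative; inspect the actual response for your build.
    if status.get("status") not in ("running", "in_progress"):
        break
    time.sleep(5)

for line in status.get("logs", []):
    print(line)
```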
110
+ ## Usage Examples
111
+
112
+ ### Sending a GraphRAG Query
113
+ ```python
114
+ import requests
115
+
116
+ url = "http://localhost:8012/v1/chat/completions"
117
+ payload = {
118
+ "model": "graphrag-local-search:latest",
119
+ "messages": [{"role": "user", "content": "What is GraphRAG?"}],
120
+ "query_options": {
121
+ "query_type": "local-search",
122
+ "selected_folder": "your_indexed_folder",
123
+ "community_level": 2,
124
+ "response_type": "Multiple Paragraphs"
125
+ }
126
+ }
127
+ response = requests.post(url, json=payload)
128
+ print(response.json())
129
+ ```
130
+
131
+ ### Starting Indexing Process
132
+ ```python
133
+ import requests
134
+
135
+ url = "http://localhost:8012/v1/index"
136
+ payload = {
137
+ "llm_model": "your_llm_model",
138
+ "embed_model": "your_embed_model",
139
+ "root": "./indexing",
140
+ "verbose": True,
141
+ "emit": ["parquet", "csv"]
142
+ }
143
+ response = requests.post(url, json=payload)
144
+ print(response.json())
145
+ ```
146
+
147
+ ## Configuration
148
+
149
+ The API can be configured through:
150
+ 1. Environment variables
151
+ 2. A `config.yaml` file (path specified by `GRAPHRAG_CONFIG` environment variable)
152
+ 3. Command-line arguments when starting the server
153
+
154
+ Key configuration options (a sample `config.yaml` sketch follows this list):
155
+ - `llm_model`: The language model to use
156
+ - `embedding_model`: The embedding model for vector representations
157
+ - `community_level`: Depth of community analysis in GraphRAG
158
+ - `token_limit`: Maximum tokens for context
159
+ - `api_key`: API key for LLM service
160
+ - `api_base`: Base URL for LLM API
161
+ - `api_type`: Type of API (e.g., "openai")
162
+
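A minimal `config.yaml` using these keys might look like the sketch below; the values are placeholders, and any of them can be overridden by the corresponding environment variables:

```yaml
# Sketch of a config.yaml for the GraphRAG API (values are placeholders)
llm_model: mistral:latest
embedding_model: nomic-embed-text:latest
community_level: 2
token_limit: 4096
api_key: your-api-key
api_base: http://localhost:11434
api_type: openai
```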
163
+ ## Troubleshooting
164
+
165
+ 1. If you encounter connection errors with Ollama, ensure the service is running and accessible.
166
+ 2. For "context loading failed" errors, check that the indexed data is present in the specified output folder.
167
+ 3. If prompt tuning or indexing processes fail, review the logs using the respective status endpoints.
168
+ 4. For performance issues, consider adjusting the `community_level` and `token_limit` settings.
169
+
170
+ For more detailed information on GraphRAG's indexing and querying processes, refer to the official GraphRAG documentation.
EMBEDDING_PROXY_README.md ADDED
@@ -0,0 +1,36 @@
1
+ # Using Ollama Embeddings with GraphRAG: A Quick Guide
2
+
3
+ ## Problem
4
+
5
+ GraphRAG is designed to work with OpenAI-compatible APIs for both language models and embeddings, but Ollama currently exposes its own, incompatible embeddings API.
6
+
7
+ ## Solution: Embeddings Proxy
8
+
9
+ To bridge this gap, we use an embeddings proxy. The proxy acts as middleware between GraphRAG and Ollama, translating Ollama's embedding responses into the format that GraphRAG expects.
10
+
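Conceptually, the proxy only reshapes payloads: GraphRAG sends an OpenAI-style `/v1/embeddings` request, the proxy forwards each input to Ollama's `/api/embeddings` endpoint, and wraps the returned vectors in an OpenAI-style response. The sketch below illustrates that translation; use the `embedding_proxy.py` shipped with this repo rather than this snippet.

```python
# Illustrative sketch only -- the bundled embedding_proxy.py is the supported implementation.
import httpx
from fastapi import FastAPI

app = FastAPI()
OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama on its default port

@app.post("/v1/embeddings")
async def embeddings(request: dict):
    inputs = request["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    data = []
    async with httpx.AsyncClient() as client:
        for i, text in enumerate(inputs):
            # Ollama's native endpoint takes {"model", "prompt"} and returns {"embedding": [...]}.
            r = await client.post(
                f"{OLLAMA_URL}/api/embeddings",
                json={"model": request["model"], "prompt": text},
            )
            data.append({"object": "embedding", "index": i, "embedding": r.json()["embedding"]})
    # OpenAI-compatible shape that GraphRAG expects.
    return {"object": "list", "data": data, "model": request["model"]}
```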
11
+ ## Use the Embeddings Proxy
12
+
13
+ 1. **Set up the proxy:**
14
+ - Save the provided `embedding_proxy.py` script to your project directory.
15
+ - Install required dependencies (not needed if you've already done this in the normal setup): `pip install fastapi uvicorn httpx`
16
+
17
+ 2. **Run the proxy:**
18
+ ```bash
19
+ python embedding_proxy.py --port 11435 --host http://localhost:11434
20
+ ```
21
+ This starts the proxy on port 11435, connecting to Ollama at localhost:11434.
22
+
23
+ 3. **Configure GraphRAG:**
24
+ Update your `settings.yaml` file to use the proxy for embeddings:
25
+
26
+ ```yaml
27
+ embeddings:
28
+ llm:
29
+ api_key: ${GRAPHRAG_API_KEY}
30
+ type: openai_embedding
31
+ model: nomic-embed-text:latest
32
+ api_base: http://localhost:11435 # Point to your proxy
33
+ ```
34
+
35
+ 4. **Run GraphRAG:**
36
+ With the proxy running and the configuration updated, you can now run GraphRAG as usual. It will use Ollama for embeddings through the proxy.
INDEX_APP_README.md ADDED
@@ -0,0 +1,127 @@
1
+ # GraphRAG Indexer Application
2
+
3
+ ## Table of Contents
4
+ 1. [Introduction](#introduction)
5
+ 2. [Setup](#setup)
6
+ 3. [Application Structure](#application-structure)
7
+ 4. [Indexing](#indexing)
8
+ 5. [Prompt Tuning](#prompt-tuning)
9
+ 6. [Data Management](#data-management)
10
+ 7. [Configuration](#configuration)
11
+ 8. [API Integration](#api-integration)
12
+ 9. [Troubleshooting](#troubleshooting)
13
+
14
+ ## Introduction
15
+
16
+ The GraphRAG Indexer Application is a Gradio-based user interface for managing the indexing and prompt tuning processes of the GraphRAG (Graph Retrieval-Augmented Generation) system. This application provides an intuitive way to configure, run, and monitor indexing and prompt tuning tasks, as well as manage related data files.
17
+
18
+ ## Setup
19
+
20
+ 1. Ensure you have Python 3.7+ installed.
21
+ 2. Install required dependencies:
22
+ ```
23
+ pip install gradio requests pydantic python-dotenv pyyaml pandas lancedb
24
+ ```
25
+ 3. Set up environment variables in `indexing/.env`:
26
+ ```
27
+ API_BASE_URL=http://localhost:8012
28
+ LLM_API_BASE=http://localhost:11434
29
+ EMBEDDINGS_API_BASE=http://localhost:11434
30
+ ROOT_DIR=indexing
31
+ ```
32
+ 4. Run the application:
33
+ ```
34
+ python index_app.py
35
+ ```
36
+
37
+ ## Application Structure
38
+
39
+ The application is divided into three main tabs:
40
+ 1. Indexing
41
+ 2. Prompt Tuning
42
+ 3. Data Management
43
+
44
+ Each tab provides specific functionality related to its purpose.
45
+
46
+ ## Indexing
47
+
48
+ The Indexing tab allows users to configure and run the GraphRAG indexing process.
49
+
50
+ ### Features:
51
+ - Select LLM and Embedding models
52
+ - Set root directory for indexing
53
+ - Configure verbose and cache options
54
+ - Advanced options for resuming, reporting, and output formats
55
+ - Run indexing and check status
56
+
57
+ ### Usage:
58
+ 1. Select the desired LLM and Embedding models from the dropdowns.
59
+ 2. Set the root directory for indexing.
60
+ 3. Configure additional options as needed.
61
+ 4. Click "Run Indexing" to start the process.
62
+ 5. Use "Check Indexing Status" to monitor progress.
63
+
64
+ ## Prompt Tuning
65
+
66
+ The Prompt Tuning tab enables users to configure and run prompt tuning for GraphRAG.
67
+
68
+ ### Features:
69
+ - Set root directory and domain
70
+ - Choose tuning method (random, top, all)
71
+ - Configure limit, language, max tokens, and chunk size
72
+ - Option to exclude entity types
73
+ - Run prompt tuning and check status
74
+
75
+ ### Usage:
76
+ 1. Set the root directory and optional domain.
77
+ 2. Choose the tuning method and configure parameters.
78
+ 3. Click "Run Prompt Tuning" to start the process.
79
+ 4. Use "Check Prompt Tuning Status" to monitor progress.
80
+
81
+ ## Data Management
82
+
83
+ The Data Management tab provides tools for managing input files and viewing output folders.
84
+
85
+ ### Features:
86
+ - File upload functionality
87
+ - File list management (view, refresh, delete)
88
+ - Output folder exploration
89
+ - File content viewing and editing
90
+
91
+ ### Usage:
92
+ 1. Use the File Upload section to add new input files.
93
+ 2. Manage existing files in the File Management section.
94
+ 3. Explore output folders and their contents in the Output Folders section.
95
+
96
+ ## Configuration
97
+
98
+ The application uses a combination of environment variables and a `config.yaml` file for configuration. Key settings include:
99
+
100
+ - LLM and Embedding models
101
+ - API endpoints
102
+ - Community level for GraphRAG
103
+ - Token limits
104
+ - API keys and types
105
+
106
+ To modify these settings, edit the `.env` file or create a `config.yaml` file in the root directory.
107
+
108
+ ## API Integration
109
+
110
+ The application integrates with a backend API for executing indexing and prompt tuning tasks. Key API endpoints used:
111
+
112
+ - `/v1/index`: Start indexing process
113
+ - `/v1/index_status`: Check indexing status
114
+ - `/v1/prompt_tune`: Start prompt tuning process
115
+ - `/v1/prompt_tune_status`: Check prompt tuning status
116
+
117
+ These endpoints are called using the `requests` library, with appropriate error handling and logging.
118
+
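For example, starting an indexing run and checking on it from Python looks roughly like this (the payload fields mirror the indexing example in `API_README.md`; the `status` and `logs` fields in the status response are assumptions based on the endpoint descriptions above):

```python
import requests

API_BASE_URL = "http://localhost:8012"

# Start indexing as a background task on the API server.
start = requests.post(f"{API_BASE_URL}/v1/index", json={
    "llm_model": "your_llm_model",
    "embed_model": "your_embed_model",
    "root": "./indexing",
    "verbose": True,
})
print(start.json())

# Later, check how the run is going and print the last few log lines.
status = requests.get(f"{API_BASE_URL}/v1/index_status").json()
print(status.get("status"))
for line in status.get("logs", [])[-10:]:
    print(line)
```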
119
+ ## Troubleshooting
120
+
121
+ Common issues and solutions:
122
+
123
+ 1. **Model loading fails**: Ensure the LLM_API_BASE is correctly set and the API is accessible.
124
+ 2. **Indexing or Prompt Tuning doesn't start**: Check API connectivity and verify that all required fields are filled.
125
+ 3. **File management issues**: Ensure proper read/write permissions in the ROOT_DIR.
126
+
127
+ For any persistent issues, check the application logs (visible in the console) for detailed error messages.
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Beckett
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,268 @@
1
  ---
2
- title: Graph Rag Local Ui Scl
3
- emoji: 📊
4
- colorFrom: yellow
5
- colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 4.41.0
8
- app_file: app.py
9
- pinned: false
10
  ---
 
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: graph-rag-local-ui-scl
3
+ app_file: index_app.py
4
  sdk: gradio
5
  sdk_version: 4.41.0
6
  ---
7
+ # 🕸️ GraphRAG Local
8
 
9
+ Welcome to **GraphRAG Local with Index/Prompt-Tuning and Querying/Chat UIs**! This project is an adaptation of Microsoft's [GraphRAG](https://github.com/microsoft/graphrag), tailored to support local models and featuring a comprehensive interactive user interface ecosystem.
10
+
11
+ ## 📄 Research Paper
12
+
13
+ For more details on the original GraphRAG implementation, please refer to the [GraphRAG paper](https://arxiv.org/pdf/2404.16130).
14
+
15
+ ## 🌟 Features
16
+
17
+ - **API-Centric Architecture:** A robust FastAPI-based server (`api.py`) serving as the core of the GraphRAG operations.
18
+ - **Dedicated Indexing and Prompt Tuning UI:** A separate Gradio-based interface (`index_app.py`) for managing indexing and prompt tuning processes.
19
+ - **Local Model Support:** Leverage local models for LLM and embeddings, including compatibility with Ollama and OpenAI-compatible APIs.
20
+ - **Cost-Effective:** Eliminate dependency on costly cloud-based models by using your own local models.
21
+ - **Interactive UI:** User-friendly interface for managing data, running queries, and visualizing results (main app).
22
+ - **Real-time Graph Visualization:** Visualize your knowledge graph in 2D or 3D using Plotly (main app).
23
+ - **File Management:** Upload, view, edit, and delete input files directly from the UI.
24
+ - **Settings Management:** Easily update and manage your GraphRAG settings through the UI.
25
+ - **Output Exploration:** Browse and view indexing outputs and artifacts.
26
+ - **Logging:** Real-time logging for better debugging and monitoring.
27
+ - **Flexible Querying:** Support for global, local, and direct chat queries with customizable parameters (main app).
28
+ - **Customizable Visualization:** Adjust graph layout, node sizes, colors, and more to suit your preferences (main app).
29
+
30
+ ![GraphRAG UI](uiv3.png)
31
+
32
+ ## 🗺️ Roadmap
33
+
34
+ ### **Important Note:** *Updates have been slow due to my day job and limited free time, but I am working on errors and issues in the background whenever I can. Please feel free to contribute or open a PR if you want to help out and have a good solution to an open issue.*
35
+ **The GraphRAG Local UI ecosystem is currently undergoing a major transition. While the main app remains functional, I am actively developing separate applications for Indexing/Prompt Tuning and Querying/Chat, all built around a robust central API. Users should expect some changes and potential instability during this transition period.**
36
+
37
+ *While it is currently functional, it has so far been tested primarily on a Mac Studio M2.*
38
+
39
+ My vision is for the GraphRAG Local UI ecosystem to become the ultimate set of tools for working with GraphRAG and local LLMs, incorporating as many cool features and knowledge graph tools as possible. I am continuously working on improvements and new features.
40
+
41
+ ### Recent Updates
42
+ - [x] New API-centric architecture (`api.py`)
43
+ - [x] Dedicated Indexing and Prompt Tuning UI (`index_app.py`)
44
+ - [x] Improved file management and output exploration
45
+ - [x] Background task handling for long-running operations
46
+ - [x] Enhanced configuration options through environment variables and YAML files
47
+
48
+ ### Upcoming Features
49
+ - [ ] Dedicated Querying/Chat UI that interacts with the API
50
+ - [ ] Dockerfile for easier deployment
51
+ - [ ] Launch your own GraphRAG API server for use in external applications
52
+ - [ ] Experimental: Mixture of Agents for Indexing/Query of knowledge graph
53
+ - [ ] Support for more file formats (CSV, PDF, etc.)
54
+ - [ ] Web search/Scraping capabilities
55
+ - [ ] Advanced graph analysis tools
56
+ - [ ] Integration with popular knowledge management tools
57
+ - [ ] Collaborative features for team-based knowledge graph building
58
+
59
+ I am committed to making the GraphRAG Local UI ecosystem the most comprehensive and user-friendly toolset for working with knowledge graphs and LLMs. Your feedback and suggestions are much needed in shaping the future of this project.
60
+
61
+ Feel free to open an Issue if you run into an error, and I will try to address it as soon as possible to minimize any downtime you might experience.
62
+
63
+ ---
64
+
65
+ ## 📦 Installation and Setup
66
+
67
+ Follow these steps to set up and run the GraphRAG Local UI ecosystem:
68
+
69
+ 1. **Create and activate a new conda environment:**
70
+ ```bash
71
+ conda create -n graphrag-local -y
72
+ conda activate graphrag-local
73
+ ```
74
+
75
+ 2. **Install the required packages:**
76
+
77
+ First, install the GraphRAG package from the `graphrag` directory in this repo (it has changes not present in the Microsoft repo):
78
+
79
+ ```bash
80
+ pip install -e ./graphrag
81
+ ```
82
+
83
+ Then install the rest of the dependencies:
84
+
85
+ ```bash
86
+ pip install -r requirements.txt
87
+ ```
88
+
89
+ 3. **Launch the API server:**
90
+ ```bash
91
+ python api.py --host 0.0.0.0 --port 8012 --reload
92
+ ```
93
+
94
+ 4. **If using Ollama for embeddings, launch the embedding proxy:**
95
+ ```bash
96
+ python embedding_proxy.py --port 11435 --host http://localhost:11434
97
+ ```
98
+ Note: For detailed instructions on using Ollama embeddings with GraphRAG, refer to the EMBEDDING_PROXY_README.md file.
99
+
100
+ 5. **Launch the Indexing and Prompt Tuning UI:**
101
+ ```bash
102
+ gradio index_app.py
103
+ ```
104
+
105
+ 6. **Launch the main interactive UI (legacy app):**
106
+ ```bash
107
+ gradio app.py
108
+ ```
109
+ or
110
+ ```bash
111
+ python app.py
112
+ ```
113
+
114
+ 7. **Access the UIs:**
115
+ - Indexing and Prompt Tuning UI: Open your web browser and navigate to `http://localhost:7861`
116
+ - Main UI (legacy): Open your web browser and navigate to `http://localhost:7860`
117
+
118
+ ---
119
+
120
+ ## 🚀 Getting Started with GraphRAG Local
121
+
122
+ GraphRAG is designed for flexibility, allowing you to quickly create and initialize your own indexing directory. Follow these steps to set up your environment:
123
+
124
+ ### 1. Create the Indexing Directory
125
+
126
+ This repo comes with a pre-made indexing folder, but you may want to create your own. First, create the required directory structure for your input data and indexing results:
127
+
128
+ ```bash
129
+ mkdir -p ./indexing/input
130
+ ```
131
+
132
+ This directory will store:
133
+ - Input .txt files for indexing
134
+ - Output results
135
+ - Prompts for Prompt Tuning
136
+
137
+ ### 2. Add Sample Data (Optional)
138
+
139
+ If you want to start with sample data, copy it to your new input directory:
140
+
141
+ ```bash
142
+ cp input/* ./indexing/input
143
+ ```
144
+
145
+ You can also add your own .txt files to this directory for indexing.
146
+
147
+ ### 3. Initialize the Indexing Folder
148
+
149
+ Run the following command to initialize the ./indexing folder with the required files:
150
+
151
+ ```bash
152
+ python -m graphrag.index --init --root ./indexing
153
+ ```
154
+
155
+ ### 4. Configure Settings
156
+
157
+ Move the pre-configured `settings.yaml` file to your indexing directory:
158
+
159
+ ```bash
160
+ mv settings.yaml ./indexing
161
+ ```
162
+
163
+ This file contains the main configuration, pre-set for use with local models.
164
+
165
+ ### 5. Customization
166
+
167
+ You can customize your setup by modifying the following environment variables:
168
+ - `ROOT_DIR`: Points to your main indexing directory
169
+ - `INPUT_DIR`: Specifies the location of your input files
170
+
171
+ ### 📚 Additional Resources
172
+
173
+ For more detailed information and advanced usage, refer to the [official GraphRAG documentation](https://microsoft.github.io/graphrag/posts/get_started/).
174
+
175
+ ---
176
+
177
+ ## 🖥️ GraphRAG Application Ecosystem
178
+
179
+ The GraphRAG Local UI ecosystem consists of three main components, each serving a specific purpose in the knowledge graph creation and querying process:
180
+
181
+ ### 1. Core API (`api.py`)
182
+
183
+ The `api.py` file serves as the backbone of the GraphRAG system, providing a robust FastAPI-based server that handles all core operations.
184
+
185
+ Key features:
186
+ - Manages indexing and prompt tuning processes
187
+ - Handles various query types (local, global, and direct chat)
188
+ - Integrates with local LLM and embedding models
189
+ - Provides endpoints for file management and system configuration
190
+
191
+ Usage:
192
+ ```bash
193
+ python api.py --host 0.0.0.0 --port 8012 --reload
194
+ ```
195
+
196
+ Note: If using Ollama for embeddings, make sure to run the embedding proxy (`embedding_proxy.py`) alongside `api.py`. Refer to the EMBEDDING_PROXY_README.md for detailed instructions.
197
+
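Once the server is running, a quick way to confirm it is reachable is the `/health` endpoint, for example:

```python
import requests

# Prints {"status": "ok"} if the API server is up on port 8012.
print(requests.get("http://localhost:8012/health").json())
```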
209
+ ### 2. Indexing and Prompt Tuning UI (`index_app.py`)
210
+
211
+ The `index_app.py` file provides a user-friendly Gradio interface for managing the indexing and prompt tuning processes.
212
+
213
+ Key features:
214
+ - Configure and run indexing tasks
215
+ - Set up and execute prompt tuning
216
+ - Manage input files and explore output data
217
+ - Adjust LLM and embedding settings
218
+
219
+ Usage:
220
+ ```bash
221
+ python index_app.py
222
+ ```
223
+ Access the UI at `http://localhost:7861`
224
+
225
+ ### 3. Main Interactive UI (Legacy App) (`app.py`)
226
+
227
+ The `app.py` file is the pre-existing main application, which is being phased out but still provides useful functionality.
228
+
229
+ Key features:
230
+ - Visualize knowledge graphs in 2D or 3D
231
+ - Run queries and view results
232
+ - Manage GraphRAG settings
233
+ - Explore indexed data
234
+
235
+ Usage:
236
+ ```bash
237
+ python app.py
238
+ ```
239
+ or
240
+ ```bash
241
+ gradio app.py
242
+ ```
243
+ Access the UI at `http://localhost:7860`
244
+
245
+ ### Workflow Integration
246
+
247
+ 1. Start the Core API (`api.py`) to enable backend functionality.
+ 2. If using Ollama for embeddings, start the embedding proxy (`embedding_proxy.py`).
+ 3. Use the Indexing and Prompt Tuning UI (`index_app.py`) to prepare your data and fine-tune the system.
+ 4. (Optional) Use the Main Interactive UI (`app.py`) for visualization and legacy features.
250
+
251
+ This modular approach allows for greater flexibility and easier maintenance of the GraphRAG system. As development continues, the functionality of `app.py` will be gradually integrated into new, specialized interfaces that interact with the core API.
252
+
253
+ ---
254
+
255
+ ## 📚 Citations
256
+
257
+ - Original GraphRAG repository by Microsoft: [GraphRAG](https://github.com/microsoft/graphrag)
258
+ - This project was inspired by the GraphRAG4OpenWebUI repository by win4r (https://github.com/win4r/GraphRAG4OpenWebUI), which served as a starting point for the API implementation.
259
+
260
+ ---
261
+
262
+ ## Troubleshooting
263
+
264
+ - If you encounter any issues with the new API or Indexing UI, please check the console logs for detailed error messages.
265
+ - For the main app, if you can't run `gradio app.py`, try running `pip install --upgrade gradio`, then close your terminal and open a new one. The app should then load and launch properly as a Gradio app.
266
+ - On Windows, if you run into an encoding/UTF error, you can set the correct encoding in the YAML Settings menu.
267
+
268
+ For any issues or feature requests, please open an issue on the GitHub repository. Happy knowledge graphing!
__pycache__/api.cpython-310.pyc ADDED
Binary file (27.4 kB).
 
__pycache__/embedding_proxy.cpython-310.pyc ADDED
Binary file (2.39 kB).
 
__pycache__/web.cpython-310.pyc ADDED
Binary file (4.84 kB).
 
api.py ADDED
@@ -0,0 +1,943 @@
1
+ from dotenv import load_dotenv
2
+ import os
3
+ import asyncio
4
+ import tempfile
5
+ from collections import deque
6
+ import time
7
+ import uuid
8
+ import json
9
+ import re
10
+ import pandas as pd
11
+ import tiktoken
12
+ import logging
13
+ import yaml
14
+ import shutil
15
+ from fastapi import Body
16
+ from fastapi import FastAPI, HTTPException, Request, BackgroundTasks, Depends
17
+ from fastapi.responses import JSONResponse, StreamingResponse
18
+ from pydantic import BaseModel, Field
19
+ from typing import List, Optional, Dict, Any, Union
20
+ from contextlib import asynccontextmanager
21
+ from web import DuckDuckGoSearchAPIWrapper
22
+ from functools import lru_cache
23
+ import requests
24
+ import subprocess
25
+ import argparse
26
+
27
+ # GraphRAG related imports
28
+ from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
29
+ from graphrag.query.indexer_adapters import (
30
+ read_indexer_covariates,
31
+ read_indexer_entities,
32
+ read_indexer_relationships,
33
+ read_indexer_reports,
34
+ read_indexer_text_units,
35
+ )
36
+ from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
37
+ from graphrag.query.llm.oai.chat_openai import ChatOpenAI
38
+ from graphrag.query.llm.oai.embedding import OpenAIEmbedding
39
+ from graphrag.query.llm.oai.typing import OpenaiApiType
40
+ from graphrag.query.question_gen.local_gen import LocalQuestionGen
41
+ from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
42
+ from graphrag.query.structured_search.local_search.search import LocalSearch
43
+ from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
44
+ from graphrag.query.structured_search.global_search.search import GlobalSearch
45
+ from graphrag.vector_stores.lancedb import LanceDBVectorStore
46
+
47
+ # Set up logging
48
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
49
+ logger = logging.getLogger(__name__)
50
+
51
+ # Load environment variables
52
+ load_dotenv('indexing/.env')
53
+ LLM_API_BASE = os.getenv('LLM_API_BASE', '')
54
+ LLM_MODEL = os.getenv('LLM_MODEL')
55
+ LLM_PROVIDER = os.getenv('LLM_PROVIDER', 'openai').lower()
56
+ EMBEDDINGS_API_BASE = os.getenv('EMBEDDINGS_API_BASE', '')
57
+ EMBEDDINGS_MODEL = os.getenv('EMBEDDINGS_MODEL')
58
+ EMBEDDINGS_PROVIDER = os.getenv('EMBEDDINGS_PROVIDER', 'openai').lower()
59
+ INPUT_DIR = os.getenv('INPUT_DIR', './indexing/output')
60
+ ROOT_DIR = os.getenv('ROOT_DIR', 'indexing')
61
+ PORT = int(os.getenv('API_PORT', 8012))
62
+ LANCEDB_URI = f"{INPUT_DIR}/lancedb"
63
+ COMMUNITY_REPORT_TABLE = "create_final_community_reports"
64
+ ENTITY_TABLE = "create_final_nodes"
65
+ ENTITY_EMBEDDING_TABLE = "create_final_entities"
66
+ RELATIONSHIP_TABLE = "create_final_relationships"
67
+ COVARIATE_TABLE = "create_final_covariates"
68
+ TEXT_UNIT_TABLE = "create_final_text_units"
69
+ COMMUNITY_LEVEL = 2
70
+
71
+ # Global variables for storing search engines and question generator
72
+ local_search_engine = None
73
+ global_search_engine = None
74
+ question_generator = None
75
+
76
+ # Data models
77
+ class Message(BaseModel):
78
+ role: str
79
+ content: str
80
+
81
+ class QueryOptions(BaseModel):
82
+ query_type: str
83
+ preset: Optional[str] = None
84
+ community_level: Optional[int] = None
85
+ response_type: Optional[str] = None
86
+ custom_cli_args: Optional[str] = None
87
+ selected_folder: Optional[str] = None
88
+
89
+ class ChatCompletionRequest(BaseModel):
90
+ model: str
91
+ messages: List[Message]
92
+ temperature: Optional[float] = 0.7
93
+ max_tokens: Optional[int] = None
94
+ stream: Optional[bool] = False
95
+ query_options: Optional[QueryOptions] = None
96
+
97
+ class ChatCompletionResponseChoice(BaseModel):
98
+ index: int
99
+ message: Message
100
+ finish_reason: Optional[str] = None
101
+
102
+ class Usage(BaseModel):
103
+ prompt_tokens: int
104
+ completion_tokens: int
105
+ total_tokens: int
106
+
107
+ class ChatCompletionResponse(BaseModel):
108
+ id: str = Field(default_factory=lambda: f"chatcmpl-{uuid.uuid4().hex}")
109
+ object: str = "chat.completion"
110
+ created: int = Field(default_factory=lambda: int(time.time()))
111
+ model: str
112
+ choices: List[ChatCompletionResponseChoice]
113
+ usage: Usage
114
+ system_fingerprint: Optional[str] = None
115
+
116
+ def list_output_folders():
117
+ return [f for f in os.listdir(INPUT_DIR) if os.path.isdir(os.path.join(INPUT_DIR, f))]
118
+
119
+ def list_folder_contents(folder_name):
120
+ folder_path = os.path.join(INPUT_DIR, folder_name, "artifacts")
121
+ if not os.path.exists(folder_path):
122
+ return []
123
+ return [item for item in os.listdir(folder_path) if item.endswith('.parquet')]
124
+
125
+ def normalize_api_base(api_base: str) -> str:
126
+ """Normalize the API base URL by removing trailing slashes and /v1 or /api suffixes."""
127
+ api_base = api_base.rstrip('/')
128
+ if api_base.endswith('/v1') or api_base.endswith('/api'):
129
+ api_base = api_base[:-3]
130
+ return api_base
131
+
132
+ def get_models_endpoint(api_base: str, api_type: str) -> str:
133
+ """Get the appropriate models endpoint based on the API type."""
134
+ normalized_base = normalize_api_base(api_base)
135
+ if api_type.lower() == 'openai':
136
+ return f"{normalized_base}/v1/models"
137
+ elif api_type.lower() == 'azure':
138
+ return f"{normalized_base}/openai/deployments?api-version=2022-12-01"
139
+ else: # For other API types (e.g., local LLMs)
140
+ return f"{normalized_base}/models"
141
+
142
+ async def fetch_available_models(settings: Dict[str, Any]) -> List[str]:
143
+ """Fetch available models from the API."""
144
+ api_base = settings['api_base']
145
+ api_type = settings['api_type']
146
+ api_key = settings['api_key']
147
+
148
+ models_endpoint = get_models_endpoint(api_base, api_type)
149
+ headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
150
+
151
+ try:
152
+ response = requests.get(models_endpoint, headers=headers, timeout=10)
153
+ response.raise_for_status()
154
+ data = response.json()
155
+
156
+ if api_type.lower() == 'openai':
157
+ return [model['id'] for model in data['data']]
158
+ elif api_type.lower() == 'azure':
159
+ return [model['id'] for model in data['value']]
160
+ else:
161
+ # Adjust this based on the actual response format of your local LLM API
162
+ return [model['name'] for model in data['models']]
163
+ except requests.exceptions.RequestException as e:
164
+ logger.error(f"Error fetching models: {str(e)}")
165
+ return []
166
+
167
+ def load_settings():
168
+ config_path = os.getenv('GRAPHRAG_CONFIG', 'config.yaml')
169
+ if os.path.exists(config_path):
170
+ with open(config_path, 'r') as config_file:
171
+ config = yaml.safe_load(config_file)
172
+ else:
173
+ config = {}
174
+
175
+ settings = {
176
+ 'llm_model': os.getenv('LLM_MODEL', config.get('llm_model')),
177
+ 'embedding_model': os.getenv('EMBEDDINGS_MODEL', config.get('embedding_model')),
178
+ 'community_level': int(os.getenv('COMMUNITY_LEVEL', config.get('community_level', 2))),
179
+ 'token_limit': int(os.getenv('TOKEN_LIMIT', config.get('token_limit', 4096))),
180
+ 'api_key': os.getenv('GRAPHRAG_API_KEY', config.get('api_key')),
181
+ 'api_base': os.getenv('LLM_API_BASE', config.get('api_base')),
182
+ 'embeddings_api_base': os.getenv('EMBEDDINGS_API_BASE', config.get('embeddings_api_base')),
183
+ 'api_type': os.getenv('API_TYPE', config.get('api_type', 'openai')),
184
+ }
185
+
186
+ return settings
187
+
189
+
190
+ async def setup_llm_and_embedder(settings):
191
+ logger.info("Setting up LLM and embedder")
192
+ try:
193
+ llm = ChatOpenAI(
194
+ api_key=settings['api_key'],
195
+ api_base=f"{settings['api_base']}/v1",
196
+ model=settings['llm_model'],
197
+ api_type=OpenaiApiType[settings['api_type'].capitalize()],
198
+ max_retries=20,
199
+ )
200
+
201
+ token_encoder = tiktoken.get_encoding("cl100k_base")
202
+
203
+ text_embedder = OpenAIEmbedding(
204
+ api_key=settings['api_key'],
205
+ api_base=f"{settings['embeddings_api_base']}/v1",
206
+ api_type=OpenaiApiType[settings['api_type'].capitalize()],
207
+ model=settings['embedding_model'],
208
+ deployment_name=settings['embedding_model'],
209
+ max_retries=20,
210
+ )
211
+
212
+ logger.info("LLM and embedder setup complete")
213
+ return llm, token_encoder, text_embedder
214
+ except Exception as e:
215
+ logger.error(f"Error setting up LLM and embedder: {str(e)}")
216
+ raise HTTPException(status_code=500, detail=f"Failed to set up LLM and embedder: {str(e)}")
217
+
218
+ async def load_context(selected_folder, settings):
219
+ """
220
+ Load context data including entities, relationships, reports, text units, and covariates
221
+ """
222
+ logger.info("Loading context data")
223
+ try:
224
+ input_dir = os.path.join(INPUT_DIR, selected_folder, "artifacts")
225
+ entity_df = pd.read_parquet(f"{input_dir}/{ENTITY_TABLE}.parquet")
226
+ entity_embedding_df = pd.read_parquet(f"{input_dir}/{ENTITY_EMBEDDING_TABLE}.parquet")
227
+ entities = read_indexer_entities(entity_df, entity_embedding_df, settings['community_level'])
228
+
229
+ description_embedding_store = LanceDBVectorStore(collection_name="entity_description_embeddings")
230
+ description_embedding_store.connect(db_uri=LANCEDB_URI)
231
+ store_entity_semantic_embeddings(entities=entities, vectorstore=description_embedding_store)
232
+
233
+ relationship_df = pd.read_parquet(f"{input_dir}/{RELATIONSHIP_TABLE}.parquet")
234
+ relationships = read_indexer_relationships(relationship_df)
235
+
236
+ report_df = pd.read_parquet(f"{input_dir}/{COMMUNITY_REPORT_TABLE}.parquet")
237
+ reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
238
+
239
+ text_unit_df = pd.read_parquet(f"{input_dir}/{TEXT_UNIT_TABLE}.parquet")
240
+ text_units = read_indexer_text_units(text_unit_df)
241
+
242
+ covariate_df = pd.read_parquet(f"{input_dir}/{COVARIATE_TABLE}.parquet")
243
+ claims = read_indexer_covariates(covariate_df)
244
+ logger.info(f"Number of claim records: {len(claims)}")
245
+ covariates = {"claims": claims}
246
+
247
+ logger.info("Context data loading complete")
248
+ return entities, relationships, reports, text_units, description_embedding_store, covariates
249
+ except Exception as e:
250
+ logger.error(f"Error loading context data: {str(e)}")
251
+ raise
252
+
253
+ async def setup_search_engines(llm, token_encoder, text_embedder, entities, relationships, reports, text_units,
254
+ description_embedding_store, covariates):
255
+ """
256
+ Set up local and global search engines
257
+ """
258
+ logger.info("Setting up search engines")
259
+
260
+ # Set up local search engine
261
+ local_context_builder = LocalSearchMixedContext(
262
+ community_reports=reports,
263
+ text_units=text_units,
264
+ entities=entities,
265
+ relationships=relationships,
266
+ covariates=covariates,
267
+ entity_text_embeddings=description_embedding_store,
268
+ embedding_vectorstore_key=EntityVectorStoreKey.ID,
269
+ text_embedder=text_embedder,
270
+ token_encoder=token_encoder,
271
+ )
272
+
273
+ local_context_params = {
274
+ "text_unit_prop": 0.5,
275
+ "community_prop": 0.1,
276
+ "conversation_history_max_turns": 5,
277
+ "conversation_history_user_turns_only": True,
278
+ "top_k_mapped_entities": 10,
279
+ "top_k_relationships": 10,
280
+ "include_entity_rank": True,
281
+ "include_relationship_weight": True,
282
+ "include_community_rank": False,
283
+ "return_candidate_context": False,
284
+ "embedding_vectorstore_key": EntityVectorStoreKey.ID,
285
+ "max_tokens": 12_000,
286
+ }
287
+
288
+ local_llm_params = {
289
+ "max_tokens": 2_000,
290
+ "temperature": 0.0,
291
+ }
292
+
293
+ local_search_engine = LocalSearch(
294
+ llm=llm,
295
+ context_builder=local_context_builder,
296
+ token_encoder=token_encoder,
297
+ llm_params=local_llm_params,
298
+ context_builder_params=local_context_params,
299
+ response_type="multiple paragraphs",
300
+ )
301
+
302
+ # Set up global search engine
303
+ global_context_builder = GlobalCommunityContext(
304
+ community_reports=reports,
305
+ entities=entities,
306
+ token_encoder=token_encoder,
307
+ )
308
+
309
+ global_context_builder_params = {
310
+ "use_community_summary": False,
311
+ "shuffle_data": True,
312
+ "include_community_rank": True,
313
+ "min_community_rank": 0,
314
+ "community_rank_name": "rank",
315
+ "include_community_weight": True,
316
+ "community_weight_name": "occurrence weight",
317
+ "normalize_community_weight": True,
318
+ "max_tokens": 12_000,
319
+ "context_name": "Reports",
320
+ }
321
+
322
+ map_llm_params = {
323
+ "max_tokens": 1000,
324
+ "temperature": 0.0,
325
+ "response_format": {"type": "json_object"},
326
+ }
327
+
328
+ reduce_llm_params = {
329
+ "max_tokens": 2000,
330
+ "temperature": 0.0,
331
+ }
332
+
333
+ global_search_engine = GlobalSearch(
334
+ llm=llm,
335
+ context_builder=global_context_builder,
336
+ token_encoder=token_encoder,
337
+ max_data_tokens=12_000,
338
+ map_llm_params=map_llm_params,
339
+ reduce_llm_params=reduce_llm_params,
340
+ allow_general_knowledge=False,
341
+ json_mode=True,
342
+ context_builder_params=global_context_builder_params,
343
+ concurrent_coroutines=32,
344
+ response_type="multiple paragraphs",
345
+ )
346
+
347
+ logger.info("Search engines setup complete")
348
+ return local_search_engine, global_search_engine, local_context_builder, local_llm_params, local_context_params
349
+
350
+ def format_response(response):
351
+ """
352
+ Format the response by adding appropriate line breaks and paragraph separations.
353
+ """
354
+ paragraphs = re.split(r'\n{2,}', response)
355
+
356
+ formatted_paragraphs = []
357
+ for para in paragraphs:
358
+ if '```' in para:
359
+ parts = para.split('```')
360
+ for i, part in enumerate(parts):
361
+ if i % 2 == 1: # This is a code block
362
+ parts[i] = f"\n```\n{part.strip()}\n```\n"
363
+ para = ''.join(parts)
364
+ else:
365
+ para = para.replace('. ', '.\n')
366
+
367
+ formatted_paragraphs.append(para.strip())
368
+
369
+ return '\n\n'.join(formatted_paragraphs)
370
+
371
+ @asynccontextmanager
372
+ async def lifespan(app: FastAPI):
373
+ global settings
374
+ try:
375
+ logger.info("Loading settings...")
376
+ settings = load_settings()
377
+ logger.info("Settings loaded successfully.")
378
+ except Exception as e:
379
+ logger.error(f"Error loading settings: {str(e)}")
380
+ raise
381
+
382
+ yield
383
+
384
+ logger.info("Shutting down...")
385
+
386
+ app = FastAPI(lifespan=lifespan)
387
+
388
+ # Create a cache for loaded contexts
389
+ context_cache = {}
390
+
391
+ @lru_cache()
392
+ def get_settings():
393
+ return load_settings()
394
+
395
+ async def get_context(selected_folder: str, settings: dict = Depends(get_settings)):
396
+ if selected_folder not in context_cache:
397
+ try:
398
+ llm, token_encoder, text_embedder = await setup_llm_and_embedder(settings)
399
+ entities, relationships, reports, text_units, description_embedding_store, covariates = await load_context(selected_folder, settings)
400
+ local_search_engine, global_search_engine, local_context_builder, local_llm_params, local_context_params = await setup_search_engines(
401
+ llm, token_encoder, text_embedder, entities, relationships, reports, text_units,
402
+ description_embedding_store, covariates
403
+ )
404
+ question_generator = LocalQuestionGen(
405
+ llm=llm,
406
+ context_builder=local_context_builder,
407
+ token_encoder=token_encoder,
408
+ llm_params=local_llm_params,
409
+ context_builder_params=local_context_params,
410
+ )
411
+ context_cache[selected_folder] = {
412
+ "local_search_engine": local_search_engine,
413
+ "global_search_engine": global_search_engine,
414
+ "question_generator": question_generator
415
+ }
416
+ except Exception as e:
417
+ logger.error(f"Error loading context for folder {selected_folder}: {str(e)}")
418
+ raise HTTPException(status_code=500, detail=f"Failed to load context for folder {selected_folder}")
419
+
420
+ return context_cache[selected_folder]
421
+
422
+ @app.post("/v1/chat/completions")
423
+ async def chat_completions(request: ChatCompletionRequest):
424
+ try:
425
+ logger.info(f"Received request for model: {request.model}")
426
+ if request.model == "direct-chat":
427
+ logger.info("Routing to direct chat")
428
+ return await run_direct_chat(request)
429
+ elif request.model.startswith("graphrag-"):
430
+ logger.info("Routing to GraphRAG query")
431
+ if not request.query_options or not request.query_options.selected_folder:
432
+ raise HTTPException(status_code=400, detail="Selected folder is required for GraphRAG queries")
433
+ return await run_graphrag_query(request)
434
+ elif request.model == "duckduckgo-search:latest":
435
+ logger.info("Routing to DuckDuckGo search")
436
+ return await run_duckduckgo_search(request)
437
+ elif request.model == "full-model:latest":
438
+ logger.info("Routing to full model search")
439
+ return await run_full_model_search(request)
440
+ else:
441
+ raise HTTPException(status_code=400, detail=f"Invalid model specified: {request.model}")
442
+ except HTTPException as he:
443
+ logger.error(f"HTTP Exception: {str(he)}")
444
+ raise he
445
+ except Exception as e:
446
+ logger.error(f"Error in chat completion: {str(e)}", exc_info=True)
447
+ raise HTTPException(status_code=500, detail=str(e))
448
+
449
+ async def run_direct_chat(request: ChatCompletionRequest) -> ChatCompletionResponse:
450
+ try:
451
+ if not LLM_API_BASE:
452
+ raise ValueError("LLM_API_BASE environment variable is not set")
453
+
454
+ headers = {"Content-Type": "application/json"}
455
+
456
+ payload = {
457
+ "model": LLM_MODEL,
458
+ "messages": [{"role": msg.role, "content": msg.content} for msg in request.messages],
459
+ "stream": False
460
+ }
461
+
462
+ # Optional parameters
463
+ if request.temperature is not None:
464
+ payload["temperature"] = request.temperature
465
+ if request.max_tokens is not None:
466
+ payload["max_tokens"] = request.max_tokens
467
+
468
+ full_url = f"{normalize_api_base(LLM_API_BASE)}/v1/chat/completions"
469
+
470
+ logger.info(f"Sending request to: {full_url}")
471
+ logger.info(f"Payload: {payload}")
472
+
473
+ try:
474
+ response = requests.post(full_url, json=payload, headers=headers, timeout=10)
475
+ response.raise_for_status()
476
+ except requests.exceptions.RequestException as req_ex:
477
+ logger.error(f"Request to LLM API failed: {str(req_ex)}")
478
+ if isinstance(req_ex, requests.exceptions.ConnectionError):
479
+ raise HTTPException(status_code=503, detail="Unable to connect to LLM API. Please check your API settings.")
480
+ elif isinstance(req_ex, requests.exceptions.Timeout):
481
+ raise HTTPException(status_code=504, detail="Request to LLM API timed out")
482
+ else:
483
+ raise HTTPException(status_code=500, detail=f"Request to LLM API failed: {str(req_ex)}")
484
+
485
+ result = response.json()
486
+ logger.info(f"Received response: {result}")
487
+
488
+ content = result['choices'][0]['message']['content']
489
+
490
+ return ChatCompletionResponse(
491
+ model=LLM_MODEL,
492
+ choices=[
493
+ ChatCompletionResponseChoice(
494
+ index=0,
495
+ message=Message(
496
+ role="assistant",
497
+ content=content
498
+ ),
499
+ finish_reason=None
500
+ )
501
+ ],
502
+ usage=None
503
+ )
504
+ except HTTPException as he:
505
+ logger.error(f"HTTP Exception in direct chat: {str(he)}")
506
+ raise he
507
+ except Exception as e:
508
+ logger.error(f"Unexpected error in direct chat: {str(e)}")
509
+ raise HTTPException(status_code=500, detail=f"An unexpected error occurred during the direct chat: {str(e)}")
510
+
511
+ def get_embeddings(text: str) -> List[float]:
512
+ settings = load_settings()
513
+ embeddings_api_base = settings['embeddings_api_base']
514
+
515
+ headers = {"Content-Type": "application/json"}
516
+
517
+ if EMBEDDINGS_PROVIDER == 'ollama':
518
+ payload = {
519
+ "model": EMBEDDINGS_MODEL,
520
+ "prompt": text
521
+ }
522
+ full_url = f"{embeddings_api_base}/api/embeddings"
523
+ else: # OpenAI-compatible API
524
+ payload = {
525
+ "model": EMBEDDINGS_MODEL,
526
+ "input": text
527
+ }
528
+ full_url = f"{embeddings_api_base}/v1/embeddings"
529
+
530
+ try:
531
+ response = requests.post(full_url, json=payload, headers=headers)
532
+ response.raise_for_status()
533
+ except requests.exceptions.RequestException as req_ex:
534
+ logger.error(f"Request to Embeddings API failed: {str(req_ex)}")
535
+ raise HTTPException(status_code=500, detail=f"Failed to get embeddings: {str(req_ex)}")
536
+
537
+ result = response.json()
538
+
539
+ if EMBEDDINGS_PROVIDER == 'ollama':
540
+ return result['embedding']
541
+ else:
542
+ return result['data'][0]['embedding']
543
+
544
+
545
+ async def run_graphrag_query(request: ChatCompletionRequest) -> ChatCompletionResponse:
546
+ try:
547
+ query_options = request.query_options
548
+ query = request.messages[-1].content # Get the last user message as the query
549
+
550
+ cmd = ["python", "-m", "graphrag.query"]
551
+ cmd.extend(["--data", f"./indexing/output/{query_options.selected_folder}/artifacts"])
552
+ cmd.extend(["--method", query_options.query_type.split('-')[1]]) # 'global' or 'local'
553
+
554
+ if query_options.community_level:
555
+ cmd.extend(["--community_level", str(query_options.community_level)])
556
+ if query_options.response_type:
557
+ cmd.extend(["--response_type", query_options.response_type])
558
+
559
+ # Handle preset CLI args
560
+ if query_options.preset and query_options.preset != "Custom Query":
561
+ preset_args = get_preset_args(query_options.preset)
562
+ cmd.extend(preset_args)
563
+
564
+ # Handle custom CLI args
565
+ if query_options.custom_cli_args:
566
+ cmd.extend(query_options.custom_cli_args.split())
567
+
568
+ cmd.append(query)
569
+
570
+ logger.info(f"Executing GraphRAG query: {' '.join(cmd)}")
571
+
572
+ result = subprocess.run(cmd, capture_output=True, text=True)
573
+ if result.returncode != 0:
574
+ raise Exception(f"GraphRAG query failed: {result.stderr}")
575
+
576
+ return ChatCompletionResponse(
577
+ model=request.model,
578
+ choices=[
579
+ ChatCompletionResponseChoice(
580
+ index=0,
581
+ message=Message(
582
+ role="assistant",
583
+ content=result.stdout
584
+ ),
585
+ finish_reason="stop"
586
+ )
587
+ ],
588
+ usage=Usage(
589
+ prompt_tokens=0,
590
+ completion_tokens=0,
591
+ total_tokens=0
592
+ )
593
+ )
594
+ except Exception as e:
595
+ logger.error(f"Error in GraphRAG query: {str(e)}")
596
+ raise HTTPException(status_code=500, detail=f"An error occurred during the GraphRAG query: {str(e)}")
597
+
598
+
599
+ def get_preset_args(preset: str) -> List[str]:
600
+ preset_args = {
601
+ "Default Global Search": ["--community_level", "2", "--response_type", "Multiple Paragraphs"],
602
+ "Default Local Search": ["--community_level", "2", "--response_type", "Multiple Paragraphs"],
603
+ "Detailed Global Analysis": ["--community_level", "3", "--response_type", "Multi-Page Report"],
604
+ "Detailed Local Analysis": ["--community_level", "3", "--response_type", "Multi-Page Report"],
605
+ "Quick Global Summary": ["--community_level", "1", "--response_type", "Single Paragraph"],
606
+ "Quick Local Summary": ["--community_level", "1", "--response_type", "Single Paragraph"],
607
+ "Global Bullet Points": ["--community_level", "2", "--response_type", "List of 3-7 Points"],
608
+ "Local Bullet Points": ["--community_level", "2", "--response_type", "List of 3-7 Points"],
609
+ "Comprehensive Global Report": ["--community_level", "4", "--response_type", "Multi-Page Report"],
610
+ "Comprehensive Local Report": ["--community_level", "4", "--response_type", "Multi-Page Report"],
611
+ "High-Level Global Overview": ["--community_level", "1", "--response_type", "Single Page"],
612
+ "High-Level Local Overview": ["--community_level", "1", "--response_type", "Single Page"],
613
+ "Focused Global Insight": ["--community_level", "3", "--response_type", "Single Paragraph"],
614
+ "Focused Local Insight": ["--community_level", "3", "--response_type", "Single Paragraph"],
615
+ }
616
+ return preset_args.get(preset, [])
617
+
618
+ ddg_search = DuckDuckGoSearchAPIWrapper(max_results=5)
619
+
620
+ async def run_duckduckgo_search(request: ChatCompletionRequest) -> ChatCompletionResponse:
621
+ query = request.messages[-1].content
622
+ results = ddg_search.results(query, max_results=5)
623
+
624
+ if not results:
625
+ content = "No results found for the given query."
626
+ else:
627
+ content = "DuckDuckGo Search Results:\n\n"
628
+ for result in results:
629
+ content += f"Title: {result['title']}\n"
630
+ content += f"Snippet: {result['snippet']}\n"
631
+ content += f"Link: {result['link']}\n"
632
+ if 'date' in result:
633
+ content += f"Date: {result['date']}\n"
634
+ if 'source' in result:
635
+ content += f"Source: {result['source']}\n"
636
+ content += "\n"
637
+
638
+ return ChatCompletionResponse(
639
+ model=request.model,
640
+ choices=[
641
+ ChatCompletionResponseChoice(
642
+ index=0,
643
+ message=Message(
644
+ role="assistant",
645
+ content=content
646
+ ),
647
+ finish_reason="stop"
648
+ )
649
+ ],
650
+ usage=Usage(
651
+ prompt_tokens=0,
652
+ completion_tokens=0,
653
+ total_tokens=0
654
+ )
655
+ )
656
+
657
+ async def run_full_model_search(request: ChatCompletionRequest) -> ChatCompletionResponse:
658
+ query = request.messages[-1].content
659
+
660
+ # Run all search types
661
+ graphrag_global = await run_graphrag_query(ChatCompletionRequest(model="graphrag-global-search:latest", messages=request.messages, query_options=request.query_options))
662
+ graphrag_local = await run_graphrag_query(ChatCompletionRequest(model="graphrag-local-search:latest", messages=request.messages, query_options=request.query_options))
663
+ duckduckgo = await run_duckduckgo_search(request)
664
+
665
+ # Combine results
666
+ combined_content = f"""Full Model Search Results:
667
+
668
+ Global Search:
669
+ {graphrag_global.choices[0].message.content}
670
+
671
+ Local Search:
672
+ {graphrag_local.choices[0].message.content}
673
+
674
+ DuckDuckGo Search:
675
+ {duckduckgo.choices[0].message.content}
676
+ """
677
+
678
+ return ChatCompletionResponse(
679
+ model=request.model,
680
+ choices=[
681
+ ChatCompletionResponseChoice(
682
+ index=0,
683
+ message=Message(
684
+ role="assistant",
685
+ content=combined_content
686
+ ),
687
+ finish_reason="stop"
688
+ )
689
+ ],
690
+ usage=Usage(
691
+ prompt_tokens=0,
692
+ completion_tokens=0,
693
+ total_tokens=0
694
+ )
695
+ )
696
+
697
+ @app.get("/health")
698
+ async def health_check():
699
+ return {"status": "ok"}
700
+
701
+ @app.get("/v1/models")
702
+ async def list_models():
703
+ settings = load_settings()
704
+ try:
705
+ api_models = await fetch_available_models(settings)
706
+ except Exception as e:
707
+ logger.error(f"Error fetching API models: {str(e)}")
708
+ api_models = []
709
+
710
+ # Include the hardcoded models
711
+ hardcoded_models = [
712
+ {"id": "graphrag-local-search:latest", "object": "model", "owned_by": "graphrag"},
713
+ {"id": "graphrag-global-search:latest", "object": "model", "owned_by": "graphrag"},
714
+ {"id": "duckduckgo-search:latest", "object": "model", "owned_by": "duckduckgo"},
715
+ {"id": "full-model:latest", "object": "model", "owned_by": "combined"},
716
+ ]
717
+
718
+ # Combine API models with hardcoded models
719
+ all_models = [{"id": model, "object": "model", "owned_by": "api"} for model in api_models] + hardcoded_models
720
+
721
+ return JSONResponse(content={"data": all_models})
722
+
723
+ class PromptTuneRequest(BaseModel):
724
+ root: str = "./{ROOT_DIR}"
725
+ domain: Optional[str] = None
726
+ method: str = "random"
727
+ limit: int = 15
728
+ language: Optional[str] = None
729
+ max_tokens: int = 2000
730
+ chunk_size: int = 200
731
+ no_entity_types: bool = False
732
+ output: str = "./{ROOT_DIR}/prompts"
733
+
734
+ class PromptTuneResponse(BaseModel):
735
+ status: str
736
+ message: str
737
+
738
+ # Global variable to store the latest logs
739
+ prompt_tune_logs = deque(maxlen=100)
740
+
741
+ async def run_prompt_tuning(request: PromptTuneRequest):
742
+ cmd = ["python", "-m", "graphrag.prompt_tune"]
743
+
744
+ # Create a temporary directory for output
745
+ with tempfile.TemporaryDirectory() as temp_output:
746
+ # Expand environment variables in the root path
747
+ root_path = os.path.expandvars(request.root)
748
+
749
+ cmd.extend(["--root", root_path])
750
+ cmd.extend(["--method", request.method])
751
+ cmd.extend(["--limit", str(request.limit)])
752
+
753
+ if request.domain:
754
+ cmd.extend(["--domain", request.domain])
755
+
756
+ if request.language:
757
+ cmd.extend(["--language", request.language])
758
+
759
+ cmd.extend(["--max-tokens", str(request.max_tokens)])
760
+ cmd.extend(["--chunk-size", str(request.chunk_size)])
761
+
762
+ if request.no_entity_types:
763
+ cmd.append("--no-entity-types")
764
+
765
+ # Use the temporary directory for output
766
+ cmd.extend(["--output", temp_output])
767
+
768
+ logger.info(f"Executing prompt tuning command: {' '.join(cmd)}")
769
+
770
+ try:
771
+ process = await asyncio.create_subprocess_exec(
772
+ *cmd,
773
+ stdout=asyncio.subprocess.PIPE,
774
+ stderr=asyncio.subprocess.PIPE
775
+ )
776
+
777
+ async def read_stream(stream):
778
+ while True:
779
+ line = await stream.readline()
780
+ if not line:
781
+ break
782
+ line = line.decode().strip()
783
+ prompt_tune_logs.append(line)
784
+ logger.info(line)
785
+
786
+ await asyncio.gather(
787
+ read_stream(process.stdout),
788
+ read_stream(process.stderr)
789
+ )
790
+
791
+ await process.wait()
792
+
793
+ if process.returncode == 0:
794
+ logger.info("Prompt tuning completed successfully")
795
+
796
+ # Replace the existing template files with the newly generated prompts
797
+ dest_dir = os.path.join(ROOT_DIR, "prompts")
798
+
799
+ for filename in os.listdir(temp_output):
800
+ if filename.endswith(".txt"):
801
+ source_file = os.path.join(temp_output, filename)
802
+ dest_file = os.path.join(dest_dir, filename)
803
+ shutil.move(source_file, dest_file)
804
+ logger.info(f"Replaced {filename} in {dest_file}")
805
+
806
+ return PromptTuneResponse(status="success", message="Prompt tuning completed successfully. Existing prompts have been replaced.")
807
+ else:
808
+ logger.error("Prompt tuning failed")
809
+ return PromptTuneResponse(status="error", message="Prompt tuning failed. Check logs for details.")
810
+ except Exception as e:
811
+ logger.error(f"Prompt tuning failed: {str(e)}")
812
+ return PromptTuneResponse(status="error", message=f"Prompt tuning failed: {str(e)}")
813
+
814
+ @app.post("/v1/prompt_tune")
815
+ async def prompt_tune(request: PromptTuneRequest, background_tasks: BackgroundTasks):
816
+ background_tasks.add_task(run_prompt_tuning, request)
817
+ return {"status": "started", "message": "Prompt tuning process has been started in the background"}
818
+
819
+ @app.get("/v1/prompt_tune_status")
820
+ async def prompt_tune_status():
821
+ return {
822
+ "status": "running" if prompt_tune_logs else "idle",
823
+ "logs": list(prompt_tune_logs)
824
+ }
825
+
826
+ class IndexingRequest(BaseModel):
827
+ llm_model: str
828
+ embed_model: str
829
+ llm_api_base: str
830
+ embed_api_base: str
831
+ root: str
832
+ verbose: bool = False
833
+ nocache: bool = False
834
+ resume: Optional[str] = None
835
+ reporter: str = "rich"
836
+ emit: List[str] = ["parquet"]
837
+ custom_args: Optional[str] = None
838
+ llm_params: Dict[str, Any] = Field(default_factory=dict)
839
+ embed_params: Dict[str, Any] = Field(default_factory=dict)
840
+
841
+ # Global variable to store the latest indexing logs
842
+ indexing_logs = deque(maxlen=100)
843
+
844
+ async def run_indexing(request: IndexingRequest):
845
+ cmd = ["python", "-m", "graphrag.index"]
846
+
847
+ cmd.extend(["--root", request.root])
848
+
849
+ if request.verbose:
850
+ cmd.append("--verbose")
851
+
852
+ if request.nocache:
853
+ cmd.append("--nocache")
854
+
855
+ if request.resume:
856
+ cmd.extend(["--resume", request.resume])
857
+
858
+ cmd.extend(["--reporter", request.reporter])
859
+ cmd.extend(["--emit", ",".join(request.emit)])
860
+
861
+ # Set environment variables for LLM and embedding models
862
+ env: Dict[str, Any] = os.environ.copy()
863
+ env["GRAPHRAG_LLM_MODEL"] = request.llm_model
864
+ env["GRAPHRAG_EMBED_MODEL"] = request.embed_model
865
+ env["GRAPHRAG_LLM_API_BASE"] = LLM_API_BASE
866
+ env["GRAPHRAG_EMBED_API_BASE"] = EMBEDDINGS_API_BASE
867
+
868
+ # Set environment variables for LLM parameters
869
+ for key, value in request.llm_params.items():
870
+ env[f"GRAPHRAG_LLM_{key.upper()}"] = str(value)
871
+
872
+ # Set environment variables for embedding parameters
873
+ for key, value in request.embed_params.items():
874
+ env[f"GRAPHRAG_EMBED_{key.upper()}"] = str(value)
875
+
876
+ # Add custom CLI arguments
877
+ if request.custom_args:
878
+ cmd.extend(request.custom_args.split())
879
+
880
+ logger.info(f"Executing indexing command: {' '.join(cmd)}")
881
+ logger.info(f"Environment variables: {env}")
882
+
883
+ try:
884
+ process = await asyncio.create_subprocess_exec(
885
+ *cmd,
886
+ stdout=asyncio.subprocess.PIPE,
887
+ stderr=asyncio.subprocess.PIPE,
888
+ env=env
889
+ )
890
+
891
+ async def read_stream(stream):
892
+ while True:
893
+ line = await stream.readline()
894
+ if not line:
895
+ break
896
+ line = line.decode().strip()
897
+ indexing_logs.append(line)
898
+ logger.info(line)
899
+
900
+ await asyncio.gather(
901
+ read_stream(process.stdout),
902
+ read_stream(process.stderr)
903
+ )
904
+
905
+ await process.wait()
906
+
907
+ if process.returncode == 0:
908
+ logger.info("Indexing completed successfully")
909
+ return {"status": "success", "message": "Indexing completed successfully"}
910
+ else:
911
+ logger.error("Indexing failed")
912
+ return {"status": "error", "message": "Indexing failed. Check logs for details."}
913
+ except Exception as e:
914
+ logger.error(f"Indexing failed: {str(e)}")
915
+ return {"status": "error", "message": f"Indexing failed: {str(e)}"}
916
+
917
+
918
+ @app.post("/v1/index")
919
+ async def start_indexing(request: IndexingRequest, background_tasks: BackgroundTasks):
920
+ background_tasks.add_task(run_indexing, request)
921
+ return {"status": "started", "message": "Indexing process has been started in the background"}
922
+
923
+ @app.get("/v1/index_status")
924
+ async def indexing_status():
925
+ return {
926
+ "status": "running" if indexing_logs else "idle",
927
+ "logs": list(indexing_logs)
928
+ }
929
+
930
+ if __name__ == "__main__":
931
+ parser = argparse.ArgumentParser(description="Launch the GraphRAG API server")
932
+ parser.add_argument("--host", type=str, default="127.0.0.1", help="Host to bind the server to")
933
+ parser.add_argument("--port", type=int, default=PORT, help="Port to bind the server to")
934
+ parser.add_argument("--reload", action="store_true", help="Enable auto-reload mode")
935
+ args = parser.parse_args()
936
+
937
+ import uvicorn
938
+ uvicorn.run(
939
+ "api:app",
940
+ host=args.host,
941
+ port=args.port,
942
+ reload=args.reload
943
+ )
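For reference, a minimal sketch of how a client might drive the endpoints defined above. The port, the `./indexing` root path, the model names, and the API base URLs are assumptions for illustration, not values taken from this repository's configuration.

```python
# Sketch of a client for the GraphRAG API; all concrete values below are placeholders.
import requests

BASE = "http://localhost:8012"  # substitute the PORT the server was actually started on

# Liveness check and model listing
print(requests.get(f"{BASE}/health").json())
print(requests.get(f"{BASE}/v1/models").json())

# Start prompt tuning in the background, then poll the rolling log buffer
requests.post(f"{BASE}/v1/prompt_tune", json={"root": "./indexing", "limit": 5})
print(requests.get(f"{BASE}/v1/prompt_tune_status").json())

# Start indexing (model names and API bases are placeholders), then poll its status
requests.post(f"{BASE}/v1/index", json={
    "llm_model": "my-llm-model",
    "embed_model": "my-embedding-model",
    "llm_api_base": "http://localhost:11434",
    "embed_api_base": "http://localhost:11434",
    "root": "./indexing",
})
print(requests.get(f"{BASE}/v1/index_status").json())
```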
app.py ADDED
@@ -0,0 +1,1786 @@
1
+ import gradio as gr
2
+ from gradio.helpers import Progress
3
+ import asyncio
4
+ import subprocess
5
+ import yaml
6
+ import os
7
+ import networkx as nx
8
+ import plotly.graph_objects as go
9
+ import numpy as np
10
+ import plotly.io as pio
11
+ import lancedb
12
+ import random
13
+ import io
14
+ import shutil
15
+ import logging
16
+ import queue
17
+ import threading
18
+ import time
19
+ from collections import deque
20
+ import re
21
+ import glob
22
+ from datetime import datetime
23
+ import json
24
+ import requests
25
+ import aiohttp
26
+ from openai import OpenAI
27
+ from openai import AsyncOpenAI
28
+ import pyarrow.parquet as pq
29
+ import pandas as pd
30
+ import sys
31
+ import colorsys
32
+ from dotenv import load_dotenv, set_key
33
+ import argparse
34
+ import socket
35
+ import tiktoken
36
+ from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
37
+ from graphrag.query.indexer_adapters import (
38
+ read_indexer_covariates,
39
+ read_indexer_entities,
40
+ read_indexer_relationships,
41
+ read_indexer_reports,
42
+ read_indexer_text_units,
43
+ )
44
+ from graphrag.llm.openai import create_openai_chat_llm
45
+ from graphrag.llm.openai.factories import create_openai_embedding_llm
46
+ from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
47
+ from graphrag.query.llm.oai.chat_openai import ChatOpenAI
48
+ from graphrag.llm.openai.openai_configuration import OpenAIConfiguration
49
+ from graphrag.llm.openai.openai_embeddings_llm import OpenAIEmbeddingsLLM
50
+ from graphrag.query.llm.oai.typing import OpenaiApiType
51
+ from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
52
+ from graphrag.query.structured_search.local_search.search import LocalSearch
53
+ from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
54
+ from graphrag.query.structured_search.global_search.search import GlobalSearch
55
+ from graphrag.vector_stores.lancedb import LanceDBVectorStore
56
+ import textwrap
57
+
58
+
59
+
60
+ # Suppress warnings
61
+ import warnings
62
+ warnings.filterwarnings("ignore", category=UserWarning, module="gradio_client.documentation")
63
+
64
+
65
+ load_dotenv('indexing/.env')
66
+
67
+ # Set default values for API-related environment variables
68
+ os.environ.setdefault("LLM_API_BASE", os.getenv("LLM_API_BASE"))
69
+ os.environ.setdefault("LLM_API_KEY", os.getenv("LLM_API_KEY"))
70
+ os.environ.setdefault("LLM_MODEL", os.getenv("LLM_MODEL"))
71
+ os.environ.setdefault("EMBEDDINGS_API_BASE", os.getenv("EMBEDDINGS_API_BASE"))
72
+ os.environ.setdefault("EMBEDDINGS_API_KEY", os.getenv("EMBEDDINGS_API_KEY"))
73
+ os.environ.setdefault("EMBEDDINGS_MODEL", os.getenv("EMBEDDINGS_MODEL"))
74
+
75
+ # Add the project root to the Python path
76
+ project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
77
+ sys.path.insert(0, project_root)
78
+
79
+
80
+ # Set up logging
81
+ log_queue = queue.Queue()
82
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
83
+
84
+
85
+ llm = None
86
+ text_embedder = None
87
+
88
+ class QueueHandler(logging.Handler):
89
+ def __init__(self, log_queue):
90
+ super().__init__()
91
+ self.log_queue = log_queue
92
+
93
+ def emit(self, record):
94
+ self.log_queue.put(self.format(record))
95
+ queue_handler = QueueHandler(log_queue)
96
+ logging.getLogger().addHandler(queue_handler)
97
+
98
+
99
+
100
+ def initialize_models():
101
+ global llm, text_embedder
102
+
103
+ llm_api_base = os.getenv("LLM_API_BASE")
104
+ llm_api_key = os.getenv("LLM_API_KEY")
105
+ embeddings_api_base = os.getenv("EMBEDDINGS_API_BASE")
106
+ embeddings_api_key = os.getenv("EMBEDDINGS_API_KEY")
107
+
108
+ llm_service_type = os.getenv("LLM_SERVICE_TYPE", "openai_chat").lower() # Provide a default and lower it
109
+ embeddings_service_type = os.getenv("EMBEDDINGS_SERVICE_TYPE", "openai").lower() # Provide a default and lower it
110
+
111
+ llm_model = os.getenv("LLM_MODEL")
112
+ embeddings_model = os.getenv("EMBEDDINGS_MODEL")
113
+
114
+ logging.info("Fetching models...")
115
+ models = fetch_models(llm_api_base, llm_api_key, llm_service_type)
116
+
117
+ # Use the same models list for both LLM and embeddings
118
+ llm_models = models
119
+ embeddings_models = models
120
+
121
+ # Initialize LLM
122
+ if llm_service_type == "openai_chat":
123
+ llm = ChatOpenAI(
124
+ api_key=llm_api_key,
125
+ api_base=f"{llm_api_base}/v1",
126
+ model=llm_model,
127
+ api_type=OpenaiApiType.OpenAI,
128
+ max_retries=20,
129
+ )
130
+ # Initialize OpenAI client for embeddings
131
+ openai_client = OpenAI(
132
+ api_key=embeddings_api_key or "dummy_key",
133
+ base_url=f"{embeddings_api_base}/v1"
134
+ )
135
+
136
+ # Initialize text embedder using OpenAIEmbeddingsLLM
137
+ text_embedder = OpenAIEmbeddingsLLM(
138
+ client=openai_client,
139
+ configuration={
140
+ "model": embeddings_model,
141
+ "api_type": "open_ai",
142
+ "api_base": embeddings_api_base,
143
+ "api_key": embeddings_api_key or None,
144
+ "provider": embeddings_service_type
145
+ }
146
+ )
147
+
148
+ return llm_models, embeddings_models, llm_service_type, embeddings_service_type, llm_api_base, embeddings_api_base, text_embedder
149
+
150
+ def find_latest_output_folder():
151
+ root_dir = "./indexing/output"
152
+ folders = [f for f in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, f))]
153
+
154
+ if not folders:
155
+ raise ValueError("No output folders found")
156
+
157
+ # Sort folders by creation time, most recent first
158
+ sorted_folders = sorted(folders, key=lambda x: os.path.getctime(os.path.join(root_dir, x)), reverse=True)
159
+
160
+ latest_folder = None
161
+ timestamp = None
162
+
163
+ for folder in sorted_folders:
164
+ try:
165
+ # Try to parse the folder name as a timestamp
166
+ timestamp = datetime.strptime(folder, "%Y%m%d-%H%M%S")
167
+ latest_folder = folder
168
+ break
169
+ except ValueError:
170
+ # If the folder name is not a valid timestamp, skip it
171
+ continue
172
+
173
+ if latest_folder is None:
174
+ raise ValueError("No valid timestamp folders found")
175
+
176
+ latest_path = os.path.join(root_dir, latest_folder)
177
+ artifacts_path = os.path.join(latest_path, "artifacts")
178
+
179
+ if not os.path.exists(artifacts_path):
180
+ raise ValueError(f"Artifacts folder not found in {latest_path}")
181
+
182
+ return latest_path, latest_folder
183
+
184
+ def initialize_data():
185
+ global entity_df, relationship_df, text_unit_df, report_df, covariate_df
186
+
187
+ tables = {
188
+ "entity_df": "create_final_nodes",
189
+ "relationship_df": "create_final_edges",
190
+ "text_unit_df": "create_final_text_units",
191
+ "report_df": "create_final_reports",
192
+ "covariate_df": "create_final_covariates"
193
+ }
194
+
195
+ timestamp = None # Initialize timestamp to None
196
+
197
+ try:
198
+ latest_output_folder, timestamp = find_latest_output_folder()
199
+ artifacts_folder = os.path.join(latest_output_folder, "artifacts")
200
+
201
+ for df_name, file_prefix in tables.items():
202
+ file_pattern = os.path.join(artifacts_folder, f"{file_prefix}*.parquet")
203
+ matching_files = glob.glob(file_pattern)
204
+
205
+ if matching_files:
206
+ latest_file = max(matching_files, key=os.path.getctime)
207
+ df = pd.read_parquet(latest_file)
208
+ globals()[df_name] = df
209
+ logging.info(f"Successfully loaded {df_name} from {latest_file}")
210
+ else:
211
+ logging.warning(f"No matching file found for {df_name} in {artifacts_folder}. Initializing as an empty DataFrame.")
212
+ globals()[df_name] = pd.DataFrame()
213
+
214
+ except Exception as e:
215
+ logging.error(f"Error initializing data: {str(e)}")
216
+ for df_name in tables.keys():
217
+ globals()[df_name] = pd.DataFrame()
218
+
219
+ return timestamp
220
+
221
+ # Call initialize_data and store the timestamp
222
+ current_timestamp = initialize_data()
223
+
224
+
225
+ def find_available_port(start_port, max_attempts=100):
226
+ for port in range(start_port, start_port + max_attempts):
227
+ with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
228
+ try:
229
+ s.bind(('', port))
230
+ return port
231
+ except OSError:
232
+ continue
233
+ raise IOError("No free ports found")
234
+
235
+ def start_api_server(port):
236
+ subprocess.Popen([sys.executable, "api_server.py", "--port", str(port)])
237
+
238
+ def wait_for_api_server(port):
239
+ max_retries = 30
240
+ for _ in range(max_retries):
241
+ try:
242
+ response = requests.get(f"http://localhost:{port}")
243
+ if response.status_code == 200:
244
+ print(f"API server is up and running on port {port}")
245
+ return
246
+ else:
247
+ print(f"Unexpected response from API server: {response.status_code}")
248
+ except requests.ConnectionError:
249
+ time.sleep(1)
250
+ print("Failed to connect to API server")
251
+
252
+ def load_settings():
253
+ try:
254
+ with open("indexing/settings.yaml", "r") as f:
255
+ return yaml.safe_load(f) or {}
256
+ except FileNotFoundError:
257
+ return {}
258
+
259
+ def update_setting(key, value):
260
+ settings = load_settings()
261
+ try:
262
+ settings[key] = json.loads(value)
263
+ except json.JSONDecodeError:
264
+ settings[key] = value
265
+
266
+ try:
267
+ with open("indexing/settings.yaml", "w") as f:
268
+ yaml.dump(settings, f, default_flow_style=False)
269
+ return f"Setting '{key}' updated successfully"
270
+ except Exception as e:
271
+ return f"Error updating setting '{key}': {str(e)}"
272
+
273
+ def create_setting_component(key, value):
274
+ with gr.Accordion(key, open=False):
275
+ if isinstance(value, (dict, list)):
276
+ value_str = json.dumps(value, indent=2)
277
+ lines = value_str.count('\n') + 1
278
+ else:
279
+ value_str = str(value)
280
+ lines = 1
281
+
282
+ text_area = gr.TextArea(value=value_str, label="Value", lines=lines, max_lines=20)
283
+ update_btn = gr.Button("Update", variant="primary")
284
+ status = gr.Textbox(label="Status", visible=False)
285
+
286
+ update_btn.click(
287
+ fn=update_setting,
288
+ inputs=[gr.Textbox(value=key, visible=False), text_area],
289
+ outputs=[status]
290
+ ).then(
291
+ fn=lambda: gr.update(visible=True),
292
+ outputs=[status]
293
+ )
294
+
295
+
296
+
297
+ def get_openai_client():
298
+ return OpenAI(
299
+ base_url=os.getenv("LLM_API_BASE"),
300
+ api_key=os.getenv("LLM_API_KEY"),
301
+ llm_model = os.getenv("LLM_MODEL")
302
+ )
303
+
304
+ async def chat_with_openai(messages, model, temperature, max_tokens, api_base):
305
+ client = AsyncOpenAI(
306
+ base_url=api_base,
307
+ api_key=os.getenv("LLM_API_KEY")
308
+ )
309
+
310
+ try:
311
+ response = await client.chat.completions.create(
312
+ model=model,
313
+ messages=messages,
314
+ temperature=temperature,
315
+ max_tokens=max_tokens
316
+ )
317
+ return response.choices[0].message.content
318
+ except Exception as e:
319
+ logging.error(f"Error in chat_with_openai: {str(e)}")
320
+ return f"An error occurred: {str(e)}"
321
+ return f"Error: {str(e)}"
322
+
323
+ def chat_with_llm(query, history, system_message, temperature, max_tokens, model, api_base):
324
+ try:
325
+ messages = [{"role": "system", "content": system_message}]
326
+ for item in history:
327
+ if isinstance(item, tuple) and len(item) == 2:
328
+ human, ai = item
329
+ messages.append({"role": "user", "content": human})
330
+ messages.append({"role": "assistant", "content": ai})
331
+ messages.append({"role": "user", "content": query})
332
+
333
+ logging.info(f"Sending chat request to {api_base} with model {model}")
334
+ client = OpenAI(base_url=api_base, api_key=os.getenv("LLM_API_KEY", "dummy-key"))
335
+ response = client.chat.completions.create(
336
+ model=model,
337
+ messages=messages,
338
+ temperature=temperature,
339
+ max_tokens=max_tokens
340
+ )
341
+ return response.choices[0].message.content
342
+ except Exception as e:
343
+ logging.error(f"Error in chat_with_llm: {str(e)}")
344
+ logging.error(f"Attempted with model: {model}, api_base: {api_base}")
345
+ raise RuntimeError(f"Chat request failed: {str(e)}")
346
+
347
+ def run_graphrag_query(cli_args):
348
+ try:
349
+ command = ' '.join(cli_args)
350
+ logging.info(f"Executing command: {command}")
351
+ result = subprocess.run(cli_args, capture_output=True, text=True, check=True)
352
+ return result.stdout.strip()
353
+ except subprocess.CalledProcessError as e:
354
+ logging.error(f"Error running GraphRAG query: {e}")
355
+ logging.error(f"Command output (stdout): {e.stdout}")
356
+ logging.error(f"Command output (stderr): {e.stderr}")
357
+ raise RuntimeError(f"GraphRAG query failed: {e.stderr}")
358
+
359
+ def parse_query_response(response: str):
360
+ try:
361
+ # Split the response into metadata and content
362
+ parts = response.split("\n\n", 1)
363
+ if len(parts) < 2:
364
+ return response # Return original response if it doesn't contain metadata
365
+
366
+ metadata_str, content = parts
367
+ metadata = json.loads(metadata_str)
368
+
369
+ # Extract relevant information from metadata
370
+ query_type = metadata.get("query_type", "Unknown")
371
+ execution_time = metadata.get("execution_time", "N/A")
372
+ tokens_used = metadata.get("tokens_used", "N/A")
373
+
374
+ # Remove unwanted lines from the content
375
+ content_lines = content.split('\n')
376
+ filtered_content = '\n'.join([line for line in content_lines if not line.startswith("INFO:") and not line.startswith("creating llm client")])
377
+
378
+ # Format the parsed response
379
+ parsed_response = f"""
380
+ Query Type: {query_type}
381
+ Execution Time: {execution_time} seconds
382
+ Tokens Used: {tokens_used}
383
+
384
+ {filtered_content.strip()}
385
+ """
386
+ return parsed_response
387
+ except Exception as e:
388
+ print(f"Error parsing query response: {str(e)}")
389
+ return response
390
+
391
+ def send_message(query_type, query, history, system_message, temperature, max_tokens, preset, community_level, response_type, custom_cli_args, selected_folder):
392
+ try:
393
+ if query_type in ["global", "local"]:
394
+ cli_args = construct_cli_args(query_type, preset, community_level, response_type, custom_cli_args, query, selected_folder)
395
+ logging.info(f"Executing {query_type} search with command: {' '.join(cli_args)}")
396
+ result = run_graphrag_query(cli_args)
397
+ parsed_result = parse_query_response(result)
398
+ logging.info(f"Parsed query result: {parsed_result}")
399
+ else: # Direct chat
400
+ llm_model = os.getenv("LLM_MODEL")
401
+ api_base = os.getenv("LLM_API_BASE")
402
+ logging.info(f"Executing direct chat with model: {llm_model}")
403
+
404
+ try:
405
+ result = chat_with_llm(query, history, system_message, temperature, max_tokens, llm_model, api_base)
406
+ parsed_result = result # No parsing needed for direct chat
407
+ logging.info(f"Direct chat result: {parsed_result[:100]}...") # Log first 100 chars of result
408
+ except Exception as chat_error:
409
+ logging.error(f"Error in chat_with_llm: {str(chat_error)}")
410
+ raise RuntimeError(f"Direct chat failed: {str(chat_error)}")
411
+
412
+ history.append((query, parsed_result))
413
+ except Exception as e:
414
+ error_message = f"An error occurred: {str(e)}"
415
+ logging.error(error_message)
416
+ logging.exception("Exception details:")
417
+ history.append((query, error_message))
418
+
419
+ return history, gr.update(value=""), update_logs()
420
+
421
+ def construct_cli_args(query_type, preset, community_level, response_type, custom_cli_args, query, selected_folder):
422
+ if not selected_folder:
423
+ raise ValueError("No folder selected. Please select an output folder before querying.")
424
+
425
+ artifacts_folder = os.path.join("./indexing/output", selected_folder, "artifacts")
426
+ if not os.path.exists(artifacts_folder):
427
+ raise ValueError(f"Artifacts folder not found in {artifacts_folder}")
428
+
429
+ base_args = [
430
+ "python", "-m", "graphrag.query",
431
+ "--data", artifacts_folder,
432
+ "--method", query_type,
433
+ ]
434
+
435
+ # Apply preset configurations
436
+ if preset.startswith("Default"):
437
+ base_args.extend(["--community_level", "2", "--response_type", "Multiple Paragraphs"])
438
+ elif preset.startswith("Detailed"):
439
+ base_args.extend(["--community_level", "4", "--response_type", "Multi-Page Report"])
440
+ elif preset.startswith("Quick"):
441
+ base_args.extend(["--community_level", "1", "--response_type", "Single Paragraph"])
442
+ elif preset.startswith("Bullet"):
443
+ base_args.extend(["--community_level", "2", "--response_type", "List of 3-7 Points"])
444
+ elif preset.startswith("Comprehensive"):
445
+ base_args.extend(["--community_level", "5", "--response_type", "Multi-Page Report"])
446
+ elif preset.startswith("High-Level"):
447
+ base_args.extend(["--community_level", "1", "--response_type", "Single Page"])
448
+ elif preset.startswith("Focused"):
449
+ base_args.extend(["--community_level", "3", "--response_type", "Multiple Paragraphs"])
450
+ elif preset == "Custom Query":
451
+ base_args.extend([
452
+ "--community_level", str(community_level),
453
+ "--response_type", f'"{response_type}"',
454
+ ])
455
+ if custom_cli_args:
456
+ base_args.extend(custom_cli_args.split())
457
+
458
+ # Add the query at the end
459
+ base_args.append(query)
460
+
461
+ return base_args
462
+
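+ # Illustrative result of construct_cli_args("local", "Default Local Search", 2,
+ # "Multiple Paragraphs", "", "my question", "20240716-120000") -- the folder name is hypothetical:
+ # ["python", "-m", "graphrag.query", "--data", "./indexing/output/20240716-120000/artifacts",
+ # "--method", "local", "--community_level", "2", "--response_type", "Multiple Paragraphs", "my question"]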
463
+
464
+
465
+
466
+
467
+
468
+ def upload_file(file):
469
+ if file is not None:
470
+ input_dir = os.path.join("indexing", "input")
471
+ os.makedirs(input_dir, exist_ok=True)
472
+
473
+ # Get the original filename from the uploaded file
474
+ original_filename = file.name
475
+
476
+ # Create the destination path
477
+ destination_path = os.path.join(input_dir, os.path.basename(original_filename))
478
+
479
+ # Move the uploaded file to the destination path
480
+ shutil.move(file.name, destination_path)
481
+
482
+ logging.info(f"File uploaded and moved to: {destination_path}")
483
+ status = f"File uploaded: {os.path.basename(original_filename)}"
484
+ else:
485
+ status = "No file uploaded"
486
+
487
+ # Get the updated file list
488
+ updated_file_list = [f["path"] for f in list_input_files()]
489
+
490
+ return status, gr.update(choices=updated_file_list), update_logs()
491
+
492
+ def list_input_files():
493
+ input_dir = os.path.join("indexing", "input")
494
+ files = []
495
+ if os.path.exists(input_dir):
496
+ files = os.listdir(input_dir)
497
+ return [{"name": f, "path": os.path.join(input_dir, f)} for f in files]
498
+
499
+ def delete_file(file_path):
500
+ try:
501
+ os.remove(file_path)
502
+ logging.info(f"File deleted: {file_path}")
503
+ status = f"File deleted: {os.path.basename(file_path)}"
504
+ except Exception as e:
505
+ logging.error(f"Error deleting file: {str(e)}")
506
+ status = f"Error deleting file: {str(e)}"
507
+
508
+ # Get the updated file list
509
+ updated_file_list = [f["path"] for f in list_input_files()]
510
+
511
+ return status, gr.update(choices=updated_file_list), update_logs()
512
+
513
+ def read_file_content(file_path):
514
+ try:
515
+ if file_path.endswith('.parquet'):
516
+ df = pd.read_parquet(file_path)
517
+
518
+ # Get basic information about the DataFrame
519
+ info = f"Parquet File: {os.path.basename(file_path)}\n"
520
+ info += f"Rows: {len(df)}, Columns: {len(df.columns)}\n\n"
521
+ info += "Column Names:\n" + "\n".join(df.columns) + "\n\n"
522
+
523
+ # Display first few rows
524
+ info += "First 5 rows:\n"
525
+ info += df.head().to_string() + "\n\n"
526
+
527
+ # Display basic statistics
528
+ info += "Basic Statistics:\n"
529
+ info += df.describe().to_string()
530
+
531
+ return info
532
+ else:
533
+ with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
534
+ content = file.read()
535
+ return content
536
+ except Exception as e:
537
+ logging.error(f"Error reading file: {str(e)}")
538
+ return f"Error reading file: {str(e)}"
539
+
540
+ def save_file_content(file_path, content):
541
+ try:
542
+ with open(file_path, 'w') as file:
543
+ file.write(content)
544
+ logging.info(f"File saved: {file_path}")
545
+ status = f"File saved: {os.path.basename(file_path)}"
546
+ except Exception as e:
547
+ logging.error(f"Error saving file: {str(e)}")
548
+ status = f"Error saving file: {str(e)}"
549
+ return status, update_logs()
550
+
551
+ def manage_data():
552
+ db = lancedb.connect("./indexing/lancedb")
553
+ tables = db.table_names()
554
+ table_info = ""
555
+ if tables:
556
+ table = db[tables[0]]
557
+ table_info = f"Table: {tables[0]}\nSchema: {table.schema}"
558
+
559
+ input_files = list_input_files()
560
+
561
+ return {
562
+ "database_info": f"Tables: {', '.join(tables)}\n\n{table_info}",
563
+ "input_files": input_files
564
+ }
565
+
566
+
567
+ def find_latest_graph_file(root_dir):
568
+ pattern = os.path.join(root_dir, "output", "*", "artifacts", "*.graphml")
569
+ graph_files = glob.glob(pattern)
570
+ if not graph_files:
571
+ # If no files found, try excluding .DS_Store
572
+ output_dir = os.path.join(root_dir, "output")
573
+ run_dirs = [d for d in os.listdir(output_dir) if os.path.isdir(os.path.join(output_dir, d)) and d != ".DS_Store"]
574
+ if run_dirs:
575
+ latest_run = max(run_dirs)
576
+ pattern = os.path.join(root_dir, "output", latest_run, "artifacts", "*.graphml")
577
+ graph_files = glob.glob(pattern)
578
+
579
+ if not graph_files:
580
+ return None
581
+
582
+ # Sort files by modification time, most recent first
583
+ latest_file = max(graph_files, key=os.path.getmtime)
584
+ return latest_file
585
+
586
+ def update_visualization(folder_name, file_name, layout_type, node_size, edge_width, node_color_attribute, color_scheme, show_labels, label_size):
587
+ root_dir = "./indexing"
588
+ if not folder_name or not file_name:
589
+ return None, "Please select a folder and a GraphML file."
590
+ file_name = file_name.split("] ")[1] if "]" in file_name else file_name # Remove file type prefix
591
+ graph_path = os.path.join(root_dir, "output", folder_name, "artifacts", file_name)
592
+ if not graph_path.endswith('.graphml'):
593
+ return None, "Please select a GraphML file for visualization."
594
+ try:
595
+ # Load the GraphML file
596
+ graph = nx.read_graphml(graph_path)
597
+
598
+ # Create layout based on user selection
599
+ if layout_type == "3D Spring":
600
+ pos = nx.spring_layout(graph, dim=3, seed=42, k=0.5)
601
+ elif layout_type == "2D Spring":
602
+ pos = nx.spring_layout(graph, dim=2, seed=42, k=0.5)
603
+ else: # Circular
604
+ pos = nx.circular_layout(graph)
605
+
606
+ # Extract node positions
607
+ if layout_type == "3D Spring":
608
+ x_nodes = [pos[node][0] for node in graph.nodes()]
609
+ y_nodes = [pos[node][1] for node in graph.nodes()]
610
+ z_nodes = [pos[node][2] for node in graph.nodes()]
611
+ else:
612
+ x_nodes = [pos[node][0] for node in graph.nodes()]
613
+ y_nodes = [pos[node][1] for node in graph.nodes()]
614
+ z_nodes = [0] * len(graph.nodes()) # Set all z-coordinates to 0 for 2D layouts
615
+
616
+ # Extract edge positions
617
+ x_edges, y_edges, z_edges = [], [], []
618
+ for edge in graph.edges():
619
+ x_edges.extend([pos[edge[0]][0], pos[edge[1]][0], None])
620
+ y_edges.extend([pos[edge[0]][1], pos[edge[1]][1], None])
621
+ if layout_type == "3D Spring":
622
+ z_edges.extend([pos[edge[0]][2], pos[edge[1]][2], None])
623
+ else:
624
+ z_edges.extend([0, 0, None])
625
+
626
+ # Generate node colors based on user selection
627
+ if node_color_attribute == "Degree":
628
+ node_colors = [graph.degree(node) for node in graph.nodes()]
629
+ else: # Random
630
+ node_colors = [random.random() for _ in graph.nodes()]
631
+ node_colors = np.array(node_colors)
632
+ node_colors = (node_colors - node_colors.min()) / (node_colors.max() - node_colors.min())
633
+
634
+ # Create the trace for edges
635
+ edge_trace = go.Scatter3d(
636
+ x=x_edges, y=y_edges, z=z_edges,
637
+ mode='lines',
638
+ line=dict(color='lightgray', width=edge_width),
639
+ hoverinfo='none'
640
+ )
641
+
642
+ # Create the trace for nodes
643
+ node_trace = go.Scatter3d(
644
+ x=x_nodes, y=y_nodes, z=z_nodes,
645
+ mode='markers+text' if show_labels else 'markers',
646
+ marker=dict(
647
+ size=node_size,
648
+ color=node_colors,
649
+ colorscale=color_scheme,
650
+ colorbar=dict(
651
+ title='Node Degree' if node_color_attribute == "Degree" else "Random Value",
652
+ thickness=10,
653
+ x=1.1,
654
+ tickvals=[0, 1],
655
+ ticktext=['Low', 'High']
656
+ ),
657
+ line=dict(width=1)
658
+ ),
659
+ text=[node for node in graph.nodes()],
660
+ textposition="top center",
661
+ textfont=dict(size=label_size, color='black'),
662
+ hoverinfo='text'
663
+ )
664
+
665
+ # Create the plot
666
+ fig = go.Figure(data=[edge_trace, node_trace])
667
+
668
+ # Update layout for better visualization
669
+ fig.update_layout(
670
+ title=f'{layout_type} Graph Visualization: {os.path.basename(graph_path)}',
671
+ showlegend=False,
672
+ scene=dict(
673
+ xaxis=dict(showbackground=False, showticklabels=False, title=''),
674
+ yaxis=dict(showbackground=False, showticklabels=False, title=''),
675
+ zaxis=dict(showbackground=False, showticklabels=False, title='')
676
+ ),
677
+ margin=dict(l=0, r=0, b=0, t=40),
678
+ annotations=[
679
+ dict(
680
+ showarrow=False,
681
+ text=f"Interactive {layout_type} visualization of GraphML data",
682
+ xref="paper",
683
+ yref="paper",
684
+ x=0,
685
+ y=0
686
+ )
687
+ ],
688
+ autosize=True
689
+ )
690
+
691
+ fig.update_layout(autosize=True)
692
+ fig.update_layout(height=600) # Set a fixed height
693
+ return fig, f"Graph visualization generated successfully. Using file: {graph_path}"
694
+ except Exception as e:
695
+ return go.Figure(), f"Error visualizing graph: {str(e)}"
696
+
697
+
698
+
699
+
700
+
701
+ def update_logs():
702
+ logs = []
703
+ while not log_queue.empty():
704
+ logs.append(log_queue.get())
705
+ return "\n".join(logs)
706
+
707
+
708
+
709
+ def fetch_models(base_url, api_key, service_type):
710
+ try:
711
+ if service_type.lower() == "ollama":
712
+ response = requests.get(f"{base_url}/tags", timeout=10)
713
+ else: # OpenAI Compatible
714
+ headers = {
715
+ "Authorization": f"Bearer {api_key}",
716
+ "Content-Type": "application/json"
717
+ }
718
+ response = requests.get(f"{base_url}/models", headers=headers, timeout=10)
719
+
720
+ logging.info(f"Raw API response: {response.text}")
721
+
722
+ if response.status_code == 200:
723
+ data = response.json()
724
+ if service_type.lower() == "ollama":
725
+ models = [model.get('name', '') for model in data.get('models', data) if isinstance(model, dict)]
726
+ else: # OpenAI Compatible
727
+ models = [model.get('id', '') for model in data.get('data', []) if isinstance(model, dict)]
728
+
729
+ models = [model for model in models if model] # Remove empty strings
730
+
731
+ if not models:
732
+ logging.warning(f"No models found in {service_type} API response")
733
+ return ["No models available"]
734
+
735
+ logging.info(f"Successfully fetched {service_type} models: {models}")
736
+ return models
737
+ else:
738
+ logging.error(f"Error fetching {service_type} models. Status code: {response.status_code}, Response: {response.text}")
739
+ return ["Error fetching models"]
740
+ except requests.RequestException as e:
741
+ logging.error(f"Exception while fetching {service_type} models: {str(e)}")
742
+ return ["Error: Connection failed"]
743
+ except Exception as e:
744
+ logging.error(f"Unexpected error in fetch_models: {str(e)}")
745
+ return ["Error: Unexpected issue"]
746
+
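+ # Example payloads fetch_models parses (illustrative; model names are placeholders):
+ # Ollama GET {base_url}/tags -> {"models": [{"name": "llama3:latest", ...}, ...]}
+ # OpenAI-compatible GET {base_url}/models -> {"data": [{"id": "gpt-4o-mini", ...}, ...]}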
747
+ def update_model_choices(base_url, api_key, service_type, settings_key):
748
+ models = fetch_models(base_url, api_key, service_type)
749
+
750
+ if not models:
751
+ logging.warning(f"No models fetched for {service_type}.")
752
+
753
+ # Get the current model from settings ('llm' stores the model at the top level; 'embeddings' nests it under 'llm')
754
+ current_model = (settings.get('llm', {}) if settings_key == 'llm' else settings.get(settings_key, {}).get('llm', {})).get('model')
755
+
756
+ # If the current model is not in the list, add it
757
+ if current_model and current_model not in models:
758
+ models.append(current_model)
759
+
760
+ return gr.update(choices=models, value=current_model if current_model in models else (models[0] if models else None))
761
+
762
+ def update_llm_model_choices(base_url, api_key, service_type):
763
+ return update_model_choices(base_url, api_key, service_type, 'llm')
764
+
765
+ def update_embeddings_model_choices(base_url, api_key, service_type):
766
+ return update_model_choices(base_url, api_key, service_type, 'embeddings')
767
+
768
+
769
+
770
+
771
+ def update_llm_settings(llm_model, embeddings_model, context_window, system_message, temperature, max_tokens,
772
+ llm_api_base, llm_api_key,
773
+ embeddings_api_base, embeddings_api_key, embeddings_service_type):
774
+ try:
775
+ # Update settings.yaml
776
+ settings = load_settings()
777
+ settings['llm'].update({
778
+ "type": "openai", # Always set to "openai" since we removed the radio button
779
+ "model": llm_model,
780
+ "api_base": llm_api_base,
781
+ "api_key": "${GRAPHRAG_API_KEY}",
782
+ "temperature": temperature,
783
+ "max_tokens": max_tokens,
784
+ "provider": "openai_chat" # Always set to "openai_chat"
785
+ })
786
+ settings['embeddings']['llm'].update({
787
+ "type": "openai_embedding", # Always use OpenAIEmbeddingsLLM
788
+ "model": embeddings_model,
789
+ "api_base": embeddings_api_base,
790
+ "api_key": "${GRAPHRAG_API_KEY}",
791
+ "provider": embeddings_service_type
792
+ })
793
+
794
+ with open("indexing/settings.yaml", 'w') as f:
795
+ yaml.dump(settings, f, default_flow_style=False)
796
+
797
+ # Update .env file
798
+ update_env_file("LLM_API_BASE", llm_api_base)
799
+ update_env_file("LLM_API_KEY", llm_api_key)
800
+ update_env_file("LLM_MODEL", llm_model)
801
+ update_env_file("EMBEDDINGS_API_BASE", embeddings_api_base)
802
+ update_env_file("EMBEDDINGS_API_KEY", embeddings_api_key)
803
+ update_env_file("EMBEDDINGS_MODEL", embeddings_model)
804
+ update_env_file("CONTEXT_WINDOW", str(context_window))
805
+ update_env_file("SYSTEM_MESSAGE", system_message)
806
+ update_env_file("TEMPERATURE", str(temperature))
807
+ update_env_file("MAX_TOKENS", str(max_tokens))
808
+ update_env_file("LLM_SERVICE_TYPE", "openai_chat")
809
+ update_env_file("EMBEDDINGS_SERVICE_TYPE", embeddings_service_type)
810
+
811
+ # Reload environment variables
812
+ load_dotenv(override=True)
813
+
814
+ return "LLM and embeddings settings updated successfully in both settings.yaml and .env files."
815
+ except Exception as e:
816
+ return f"Error updating LLM and embeddings settings: {str(e)}"
817
+
818
+ def update_env_file(key, value):
819
+ env_path = 'indexing/.env'
820
+ with open(env_path, 'r') as file:
821
+ lines = file.readlines()
822
+
823
+ updated = False
824
+ for i, line in enumerate(lines):
825
+ if line.startswith(f"{key}="):
826
+ lines[i] = f"{key}={value}\n"
827
+ updated = True
828
+ break
829
+
830
+ if not updated:
831
+ lines.append(f"{key}={value}\n")
832
+
833
+ with open(env_path, 'w') as file:
834
+ file.writelines(lines)
835
+
836
+ custom_css = """
837
+ html, body {
838
+ margin: 0;
839
+ padding: 0;
840
+ height: 100vh;
841
+ overflow: hidden;
842
+ }
843
+
844
+ .gradio-container {
845
+ margin: 0 !important;
846
+ padding: 0 !important;
847
+ width: 100vw !important;
848
+ max-width: 100vw !important;
849
+ height: 100vh !important;
850
+ max-height: 100vh !important;
851
+ overflow: auto;
852
+ display: flex;
853
+ flex-direction: column;
854
+ }
855
+
856
+ #main-container {
857
+ flex: 1;
858
+ display: flex;
859
+ overflow: hidden;
860
+ }
861
+
862
+ #left-column, #right-column {
863
+ height: 100%;
864
+ overflow-y: auto;
865
+ padding: 10px;
866
+ }
867
+
868
+ #left-column {
869
+ flex: 1;
870
+ }
871
+
872
+ #right-column {
873
+ flex: 2;
874
+ display: flex;
875
+ flex-direction: column;
876
+ }
877
+
878
+ #chat-container {
879
+ flex: 0 0 auto; /* Don't allow this to grow */
880
+ height: 100%;
881
+ display: flex;
882
+ flex-direction: column;
883
+ overflow: hidden;
884
+ border: 1px solid var(--color-accent);
885
+ border-radius: 8px;
886
+ padding: 10px;
887
+ overflow-y: auto;
888
+ }
889
+
890
+ #chatbot {
891
+ overflow-y: hidden;
892
+ height: 100%;
893
+ }
894
+
895
+ #chat-input-row {
896
+ margin-top: 10px;
897
+ }
898
+
899
+ #visualization-plot {
900
+ width: 100%;
901
+ aspect-ratio: 1 / 1;
902
+ max-height: 600px; /* Adjust this value as needed */
903
+ }
904
+
905
+ #vis-controls-row {
906
+ display: flex;
907
+ justify-content: space-between;
908
+ align-items: center;
909
+ margin-top: 10px;
910
+ }
911
+
912
+ #vis-controls-row > * {
913
+ flex: 1;
914
+ margin: 0 5px;
915
+ }
916
+
917
+ #vis-status {
918
+ margin-top: 10px;
919
+ }
920
+
921
+ /* Chat input styling */
922
+ #chat-input-row {
923
+ display: flex;
924
+ flex-direction: column;
925
+ }
926
+
927
+ #chat-input-row > div {
928
+ width: 100% !important;
929
+ }
930
+
931
+ #chat-input-row input[type="text"] {
932
+ width: 100% !important;
933
+ }
934
+
935
+ /* Adjust padding for all containers */
936
+ .gr-box, .gr-form, .gr-panel {
937
+ padding: 10px !important;
938
+ }
939
+
940
+ /* Ensure all textboxes and textareas have full height */
941
+ .gr-textbox, .gr-textarea {
942
+ height: auto !important;
943
+ min-height: 100px !important;
944
+ }
945
+
946
+ /* Ensure all dropdowns have full width */
947
+ .gr-dropdown {
948
+ width: 100% !important;
949
+ }
950
+
951
+ :root {
952
+ --color-background: #2C3639;
953
+ --color-foreground: #3F4E4F;
954
+ --color-accent: #A27B5C;
955
+ --color-text: #DCD7C9;
956
+ }
957
+
958
+ body, .gradio-container {
959
+ background-color: var(--color-background);
960
+ color: var(--color-text);
961
+ }
962
+
963
+ .gr-button {
964
+ background-color: var(--color-accent);
965
+ color: var(--color-text);
966
+ }
967
+
968
+ .gr-input, .gr-textarea, .gr-dropdown {
969
+ background-color: var(--color-foreground);
970
+ color: var(--color-text);
971
+ border: 1px solid var(--color-accent);
972
+ }
973
+
974
+ .gr-panel {
975
+ background-color: var(--color-foreground);
976
+ border: 1px solid var(--color-accent);
977
+ }
978
+
979
+ .gr-box {
980
+ border-radius: 8px;
981
+ margin-bottom: 10px;
982
+ background-color: var(--color-foreground);
983
+ }
984
+
985
+ .gr-padded {
986
+ padding: 10px;
987
+ }
988
+
989
+ .gr-form {
990
+ background-color: var(--color-foreground);
991
+ }
992
+
993
+ .gr-input-label, .gr-radio-label {
994
+ color: var(--color-text);
995
+ }
996
+
997
+ .gr-checkbox-label {
998
+ color: var(--color-text);
999
+ }
1000
+
1001
+ .gr-markdown {
1002
+ color: var(--color-text);
1003
+ }
1004
+
1005
+ .gr-accordion {
1006
+ background-color: var(--color-foreground);
1007
+ border: 1px solid var(--color-accent);
1008
+ }
1009
+
1010
+ .gr-accordion-header {
1011
+ background-color: var(--color-accent);
1012
+ color: var(--color-text);
1013
+ }
1014
+
1015
+ #visualization-container {
1016
+ display: flex;
1017
+ flex-direction: column;
1018
+ border: 2px solid var(--color-accent);
1019
+ border-radius: 8px;
1020
+ margin-top: 20px;
1021
+ padding: 10px;
1022
+ background-color: var(--color-foreground);
1023
+ height: calc(100vh - 300px); /* Adjust this value as needed */
1024
+ }
1025
+
1026
+ #visualization-plot {
1027
+ width: 100%;
1028
+ height: 100%;
1029
+ }
1030
+
1031
+ #vis-controls-row {
1032
+ display: flex;
1033
+ justify-content: space-between;
1034
+ align-items: center;
1035
+ margin-top: 10px;
1036
+ }
1037
+
1038
+ #vis-controls-row > * {
1039
+ flex: 1;
1040
+ margin: 0 5px;
1041
+ }
1042
+
1043
+ #vis-status {
1044
+ margin-top: 10px;
1045
+ }
1046
+
1047
+ #log-container {
1048
+ background-color: var(--color-foreground);
1049
+ border: 1px solid var(--color-accent);
1050
+ border-radius: 8px;
1051
+ padding: 10px;
1052
+ margin-top: 20px;
1053
+ max-height: none;
1054
+ overflow-y: auto;
1055
+ }
1056
+
1057
+ .setting-accordion .label-wrap {
1058
+ cursor: pointer;
1059
+ }
1060
+
1061
+ .setting-accordion .icon {
1062
+ transition: transform 0.3s ease;
1063
+ }
1064
+
1065
+ .setting-accordion[open] .icon {
1066
+ transform: rotate(90deg);
1067
+ }
1068
+
1069
+ .gr-form.gr-box {
1070
+ border: none !important;
1071
+ background: none !important;
1072
+ }
1073
+
1074
+ .model-params {
1075
+ border-top: 1px solid var(--color-accent);
1076
+ margin-top: 10px;
1077
+ padding-top: 10px;
1078
+ }
1079
+ """
1080
+
1081
+ def list_output_files(root_dir):
1082
+ output_dir = os.path.join(root_dir, "output")
1083
+ files = []
1084
+ for root, _, filenames in os.walk(output_dir):
1085
+ for filename in filenames:
1086
+ files.append(os.path.join(root, filename))
1087
+ return files
1088
+
1089
+ def update_file_list():
1090
+ files = list_input_files()
1091
+ return gr.update(choices=[f["path"] for f in files])
1092
+
1093
+ def update_file_content(file_path):
1094
+ if not file_path:
1095
+ return ""
1096
+ try:
1097
+ with open(file_path, 'r', encoding='utf-8') as file:
1098
+ content = file.read()
1099
+ return content
1100
+ except Exception as e:
1101
+ logging.error(f"Error reading file: {str(e)}")
1102
+ return f"Error reading file: {str(e)}"
1103
+
1104
+ def list_output_folders(root_dir):
1105
+ output_dir = os.path.join(root_dir, "output")
1106
+ folders = [f for f in os.listdir(output_dir) if os.path.isdir(os.path.join(output_dir, f))]
1107
+ return sorted(folders, reverse=True)
1108
+
1109
+ def list_folder_contents(folder_path):
1110
+ contents = []
1111
+ for item in os.listdir(folder_path):
1112
+ item_path = os.path.join(folder_path, item)
1113
+ if os.path.isdir(item_path):
1114
+ contents.append(f"[DIR] {item}")
1115
+ else:
1116
+ _, ext = os.path.splitext(item)
1117
+ contents.append(f"[{ext[1:].upper()}] {item}")
1118
+ return contents
1119
+
1120
+ def update_output_folder_list():
1121
+ root_dir = "./"
1122
+ folders = list_output_folders(root_dir)
1123
+ return gr.update(choices=folders, value=folders[0] if folders else None)
1124
+
1125
+ def update_folder_content_list(folder_name):
1126
+ root_dir = "./"
1127
+ if not folder_name:
1128
+ return gr.update(choices=[])
1129
+ contents = list_folder_contents(os.path.join(root_dir, "output", folder_name, "artifacts"))
1130
+ return gr.update(choices=contents)
1131
+
1132
+ def handle_content_selection(folder_name, selected_item):
1133
+ root_dir = "./"
1134
+ if isinstance(selected_item, list) and selected_item:
1135
+ selected_item = selected_item[0] # Take the first item if it's a list
1136
+
1137
+ if isinstance(selected_item, str) and selected_item.startswith("[DIR]"):
1138
+ dir_name = selected_item[6:] # Remove "[DIR] " prefix
1139
+ sub_contents = list_folder_contents(os.path.join(root_dir, "output", folder_name, dir_name))
1140
+ return gr.update(choices=sub_contents), "", ""
1141
+ elif isinstance(selected_item, str):
1142
+ file_name = selected_item.split("] ")[1] if "]" in selected_item else selected_item # Remove file type prefix if present
1143
+ file_path = os.path.join(root_dir, "output", folder_name, "artifacts", file_name)
1144
+ file_size = os.path.getsize(file_path)
1145
+ file_type = os.path.splitext(file_name)[1]
1146
+ file_info = f"File: {file_name}\nSize: {file_size} bytes\nType: {file_type}"
1147
+ content = read_file_content(file_path)
1148
+ return gr.update(), file_info, content
1149
+ else:
1150
+ return gr.update(), "", ""
1151
+
1152
+ def initialize_selected_folder(folder_name):
1153
+ root_dir = "./"
1154
+ if not folder_name:
1155
+ return "Please select a folder first.", gr.update(choices=[])
1156
+ folder_path = os.path.join(root_dir, "output", folder_name, "artifacts")
1157
+ if not os.path.exists(folder_path):
1158
+ return f"Artifacts folder not found in '{folder_name}'.", gr.update(choices=[])
1159
+ contents = list_folder_contents(folder_path)
1160
+ return f"Folder '{folder_name}/artifacts' initialized with {len(contents)} items.", gr.update(choices=contents)
1161
+
1162
+
1163
+ settings = load_settings()
1164
+ default_model = settings['llm']['model']
1165
+ cli_args = gr.State({})
1166
+ stop_indexing = threading.Event()
1167
+ indexing_thread = None
1168
+
1169
+ def start_indexing(*args):
1170
+ global indexing_thread, stop_indexing
1171
+ stop_indexing = threading.Event() # Reset the stop_indexing event
1172
+ indexing_thread = threading.Thread(target=run_indexing, args=args)
1173
+ indexing_thread.start()
1174
+ return gr.update(interactive=False), gr.update(interactive=True), gr.update(interactive=False)
1175
+
1176
+ def stop_indexing_process():
1177
+ global indexing_thread
1178
+ logging.info("Stop indexing requested")
1179
+ stop_indexing.set()
1180
+ if indexing_thread and indexing_thread.is_alive():
1181
+ logging.info("Waiting for indexing thread to finish")
1182
+ indexing_thread.join(timeout=10)
1183
+ logging.info("Indexing thread finished" if not indexing_thread.is_alive() else "Indexing thread did not finish within timeout")
1184
+ indexing_thread = None # Reset the thread
1185
+ return gr.update(interactive=True), gr.update(interactive=False), gr.update(interactive=True)
1186
+
1187
+ def refresh_indexing():
1188
+ global indexing_thread, stop_indexing
1189
+ if indexing_thread and indexing_thread.is_alive():
1190
+ logging.info("Cannot refresh: Indexing is still running")
1191
+ return gr.update(interactive=False), gr.update(interactive=True), gr.update(interactive=False), "Cannot refresh: Indexing is still running"
1192
+ else:
1193
+ stop_indexing = threading.Event() # Reset the stop_indexing event
1194
+ indexing_thread = None # Reset the thread
1195
+ return gr.update(interactive=True), gr.update(interactive=False), gr.update(interactive=True), "Indexing process refreshed. You can start indexing again."
1196
+
1197
+
1198
+
1199
+ def run_indexing(root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_args):
1200
+ cmd = ["python", "-m", "graphrag.index", "--root", "./indexing"]
1201
+
1202
+ # Add custom CLI arguments
1203
+ if custom_args:
1204
+ cmd.extend(custom_args.split())
1205
+
1206
+ logging.info(f"Executing command: {' '.join(cmd)}")
1207
+
1208
+ process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, encoding='utf-8', universal_newlines=True)
1209
+
1210
+
1211
+ output = []
1212
+ progress_value = 0
1213
+ iterations_completed = 0
1214
+
1215
+ while True:
1216
+ if stop_indexing.is_set():
1217
+ process.terminate()
1218
+ process.wait(timeout=5)
1219
+ if process.poll() is None:
1220
+ process.kill()
1221
+ return ("\n".join(output + ["Indexing stopped by user."]),
1222
+ "Indexing stopped.",
1223
+ 100,
1224
+ gr.update(interactive=True),
1225
+ gr.update(interactive=False),
1226
+ gr.update(interactive=True),
1227
+ str(iterations_completed))
1228
+
1229
+ try:
1230
+ line = process.stdout.readline()
1231
+ if not line and process.poll() is not None:
1232
+ break
1233
+
1234
+ if line:
1235
+ line = line.strip()
1236
+ output.append(line)
1237
+
1238
+ if "Processing file" in line:
1239
+ progress_value += 1
1240
+ iterations_completed += 1
1241
+ elif "Indexing completed" in line:
1242
+ progress_value = 100
1243
+ elif "ERROR" in line:
1244
+ line = f"🚨 ERROR: {line}"
1245
+
1246
+ yield ("\n".join(output),
1247
+ line,
1248
+ progress_value,
1249
+ gr.update(interactive=False),
1250
+ gr.update(interactive=True),
1251
+ gr.update(interactive=False),
1252
+ str(iterations_completed))
1253
+ except Exception as e:
1254
+ logging.error(f"Error during indexing: {str(e)}")
1255
+ return ("\n".join(output + [f"Error: {str(e)}"]),
1256
+ "Error occurred during indexing.",
1257
+ 100,
1258
+ gr.update(interactive=True),
1259
+ gr.update(interactive=False),
1260
+ gr.update(interactive=True),
1261
+ str(iterations_completed))
1262
+
1263
+ if process.returncode != 0 and not stop_indexing.is_set():
1264
+ final_output = "\n".join(output + [f"Error: Process exited with return code {process.returncode}"])
1265
+ final_progress = "Indexing failed. Check output for details."
1266
+ else:
1267
+ final_output = "\n".join(output)
1268
+ final_progress = "Indexing completed successfully!"
1269
+
1270
+ return (final_output,
1271
+ final_progress,
1272
+ 100,
1273
+ gr.update(interactive=True),
1274
+ gr.update(interactive=False),
1275
+ gr.update(interactive=True),
1276
+ str(iterations_completed))
1277
+
1278
+ global_vector_store_wrapper = None
1279
+
1280
+ def create_gradio_interface():
1281
+ global global_vector_store_wrapper
1282
+ llm_models, embeddings_models, llm_service_type, embeddings_service_type, llm_api_base, embeddings_api_base, text_embedder = initialize_models()
1283
+ settings = load_settings()
1284
+
1285
+
1286
+ log_output = gr.TextArea(label="Logs", elem_id="log-output", interactive=False, visible=False)
1287
+
1288
+ with gr.Blocks(css=custom_css, theme=gr.themes.Base()) as demo:
1289
+ gr.Markdown("# GraphRAG Local UI", elem_id="title")
1290
+
1291
+ with gr.Row(elem_id="main-container"):
1292
+ with gr.Column(scale=1, elem_id="left-column"):
1293
+ with gr.Tabs():
1294
+ with gr.TabItem("Data Management"):
1295
+ with gr.Accordion("File Upload (.txt)", open=True):
1296
+ file_upload = gr.File(label="Upload .txt File", file_types=[".txt"])
1297
+ upload_btn = gr.Button("Upload File", variant="primary")
1298
+ upload_output = gr.Textbox(label="Upload Status", visible=False)
1299
+
1300
+ with gr.Accordion("File Management", open=True):
1301
+ file_list = gr.Dropdown(label="Select File", choices=[], interactive=True)
1302
+ refresh_btn = gr.Button("Refresh File List", variant="secondary")
1303
+
1304
+ file_content = gr.TextArea(label="File Content", lines=10)
1305
+
1306
+ with gr.Row():
1307
+ delete_btn = gr.Button("Delete Selected File", variant="stop")
1308
+ save_btn = gr.Button("Save Changes", variant="primary")
1309
+
1310
+ operation_status = gr.Textbox(label="Operation Status", visible=False)
1311
+
1312
+
1313
+
1314
+ with gr.TabItem("Indexing"):
1315
+ root_dir = gr.Textbox(label="Root Directory", value="./")
1316
+ config_file = gr.File(label="Config File (optional)")
1317
+ with gr.Row():
1318
+ verbose = gr.Checkbox(label="Verbose", value=True)
1319
+ nocache = gr.Checkbox(label="No Cache", value=True)
1320
+ with gr.Row():
1321
+ resume = gr.Textbox(label="Resume Timestamp (optional)")
1322
+ reporter = gr.Dropdown(label="Reporter", choices=["rich", "print", "none"], value=None)
1323
+ with gr.Row():
1324
+ emit_formats = gr.CheckboxGroup(label="Emit Formats", choices=["json", "csv", "parquet"], value=None)
1325
+ with gr.Row():
1326
+ run_index_button = gr.Button("Run Indexing")
1327
+ stop_index_button = gr.Button("Stop Indexing", variant="stop")
1328
+ refresh_index_button = gr.Button("Refresh Indexing", variant="secondary")
1329
+
1330
+ with gr.Accordion("Custom CLI Arguments", open=True):
1331
+ custom_cli_args = gr.Textbox(
1332
+ label="Custom CLI Arguments",
1333
+ placeholder="--arg1 value1 --arg2 value2",
1334
+ lines=3
1335
+ )
1336
+ cli_guide = gr.Markdown(
1337
+ textwrap.dedent("""
1338
+ ### CLI Argument Key Guide:
1339
+ - `--root <path>`: Set the root directory for the project
1340
+ - `--config <path>`: Specify a custom configuration file
1341
+ - `--verbose`: Enable verbose output
1342
+ - `--nocache`: Disable caching
1343
+ - `--resume <timestamp>`: Resume from a specific timestamp
1344
+ - `--reporter <type>`: Set the reporter type (rich, print, none)
1345
+ - `--emit <formats>`: Specify output formats (json, csv, parquet)
1346
+
1347
+ Example: `--verbose --nocache --emit json,csv`
1348
+ """)
1349
+ )
1350
+
1351
+ index_output = gr.Textbox(label="Indexing Output", lines=20, max_lines=30)
1352
+ index_progress = gr.Textbox(label="Indexing Progress", lines=3)
1353
+ iterations_completed = gr.Textbox(label="Iterations Completed", value="0")
1354
+ refresh_status = gr.Textbox(label="Refresh Status", visible=True)
1355
+
1356
+ run_index_button.click(
1357
+ fn=start_indexing,
1358
+ inputs=[root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_cli_args],
1359
+ outputs=[run_index_button, stop_index_button, refresh_index_button]
1360
+ ).then(
1361
+ fn=run_indexing,
1362
+ inputs=[root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_cli_args],
1363
+ outputs=[index_output, index_progress, run_index_button, stop_index_button, refresh_index_button, iterations_completed]
1364
+ )
1365
+
1366
+ stop_index_button.click(
1367
+ fn=stop_indexing_process,
1368
+ outputs=[run_index_button, stop_index_button, refresh_index_button]
1369
+ )
1370
+
1371
+ refresh_index_button.click(
1372
+ fn=refresh_indexing,
1373
+ outputs=[run_index_button, stop_index_button, refresh_index_button, refresh_status]
1374
+ )
1375
+
1376
+ with gr.TabItem("Indexing Outputs/Visuals"):
1377
+ output_folder_list = gr.Dropdown(label="Select Output Folder (Select GraphML File to Visualize)", choices=list_output_folders("./indexing"), interactive=True)
1378
+ refresh_folder_btn = gr.Button("Refresh Folder List", variant="secondary")
1379
+ initialize_folder_btn = gr.Button("Initialize Selected Folder", variant="primary")
1380
+ folder_content_list = gr.Dropdown(label="Select File or Directory", choices=[], interactive=True)
1381
+ file_info = gr.Textbox(label="File Information", interactive=False)
1382
+ output_content = gr.TextArea(label="File Content", lines=20, interactive=False)
1383
+ initialization_status = gr.Textbox(label="Initialization Status")
1384
+
1385
+ with gr.TabItem("LLM Settings"):
1386
+ llm_base_url = gr.Textbox(label="LLM API Base URL", value=os.getenv("LLM_API_BASE"))
1387
+ llm_api_key = gr.Textbox(label="LLM API Key", value=os.getenv("LLM_API_KEY"), type="password")
1388
+ llm_service_type = gr.Radio(
1389
+ label="LLM Service Type",
1390
+ choices=["openai", "ollama"],
1391
+ value="openai",
1392
+ visible=False # Hide this if you want to always use OpenAI
1393
+ )
1394
+
1395
+ llm_model_dropdown = gr.Dropdown(
1396
+ label="LLM Model",
1397
+ choices=[], # Start with an empty list
1398
+ value=settings['llm'].get('model'),
1399
+ allow_custom_value=True
1400
+ )
1401
+ refresh_llm_models_btn = gr.Button("Refresh LLM Models", variant="secondary")
1402
+
1403
+ embeddings_base_url = gr.Textbox(label="Embeddings API Base URL", value=os.getenv("EMBEDDINGS_API_BASE"))
1404
+ embeddings_api_key = gr.Textbox(label="Embeddings API Key", value=os.getenv("EMBEDDINGS_API_KEY"), type="password")
1405
+ embeddings_service_type = gr.Radio(
1406
+ label="Embeddings Service Type",
1407
+ choices=["openai", "ollama"],
1408
+ value=settings.get('embeddings', {}).get('llm', {}).get('type', 'openai'),
1409
+ visible=False,
1410
+ )
1411
+
1412
+ embeddings_model_dropdown = gr.Dropdown(
1413
+ label="Embeddings Model",
1414
+ choices=[],
1415
+ value=settings.get('embeddings', {}).get('llm', {}).get('model'),
1416
+ allow_custom_value=True
1417
+ )
1418
+ refresh_embeddings_models_btn = gr.Button("Refresh Embedding Models", variant="secondary")
1419
+ system_message = gr.Textbox(
1420
+ lines=5,
1421
+ label="System Message",
1422
+ value=os.getenv("SYSTEM_MESSAGE", "You are a helpful AI assistant.")
1423
+ )
1424
+ context_window = gr.Slider(
1425
+ label="Context Window",
1426
+ minimum=512,
1427
+ maximum=32768,
1428
+ step=512,
1429
+ value=int(os.getenv("CONTEXT_WINDOW", 4096))
1430
+ )
1431
+ temperature = gr.Slider(
1432
+ label="Temperature",
1433
+ minimum=0.0,
1434
+ maximum=2.0,
1435
+ step=0.1,
1436
+ value=float(settings['llm'].get('TEMPERATURE', 0.5))
1437
+ )
1438
+ max_tokens = gr.Slider(
1439
+ label="Max Tokens",
1440
+ minimum=1,
1441
+ maximum=8192,
1442
+ step=1,
1443
+ value=int(settings['llm'].get('MAX_TOKENS', 1024))
1444
+ )
1445
+ update_settings_btn = gr.Button("Update LLM Settings", variant="primary")
1446
+ llm_settings_status = gr.Textbox(label="Status", interactive=False)
1447
+
1448
+ llm_base_url.change(
1449
+ fn=update_model_choices,
1450
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1451
+ outputs=llm_model_dropdown
1452
+ )
1453
+ # Update Embeddings model choices when service type or base URL changes
1454
+ embeddings_service_type.change(
1455
+ fn=update_embeddings_model_choices,
1456
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type],
1457
+ outputs=embeddings_model_dropdown
1458
+ )
1459
+
1460
+ embeddings_base_url.change(
1461
+ fn=update_model_choices,
1462
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1463
+ outputs=embeddings_model_dropdown
1464
+ )
1465
+
1466
+ update_settings_btn.click(
1467
+ fn=update_llm_settings,
1468
+ inputs=[
1469
+ llm_model_dropdown,
1470
+ embeddings_model_dropdown,
1471
+ context_window,
1472
+ system_message,
1473
+ temperature,
1474
+ max_tokens,
1475
+ llm_base_url,
1476
+ llm_api_key,
1477
+ embeddings_base_url,
1478
+ embeddings_api_key,
1479
+ embeddings_service_type
1480
+ ],
1481
+ outputs=[llm_settings_status]
1482
+ )
1483
+
1484
+
1485
+ refresh_llm_models_btn.click(
1486
+ fn=update_model_choices,
1487
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1488
+ outputs=[llm_model_dropdown]
1489
+ ).then(
1490
+ fn=update_logs,
1491
+ outputs=[log_output]
1492
+ )
1493
+
1494
+ refresh_embeddings_models_btn.click(
1495
+ fn=update_model_choices,
1496
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1497
+ outputs=[embeddings_model_dropdown]
1498
+ ).then(
1499
+ fn=update_logs,
1500
+ outputs=[log_output]
1501
+ )
1502
+
1503
+ with gr.TabItem("YAML Settings"):
1504
+ settings = load_settings()
1505
+ with gr.Group():
1506
+ for key, value in settings.items():
1507
+ if key != 'llm':
1508
+ create_setting_component(key, value)
1509
+
1510
+ with gr.Group(elem_id="log-container"):
1511
+ gr.Markdown("### Logs")
1512
+ log_output = gr.TextArea(label="Logs", elem_id="log-output", interactive=False)
1513
+
1514
+ with gr.Column(scale=2, elem_id="right-column"):
1515
+ with gr.Group(elem_id="chat-container"):
1516
+ chatbot = gr.Chatbot(label="Chat History", elem_id="chatbot")
1517
+ with gr.Row(elem_id="chat-input-row"):
1518
+ with gr.Column(scale=1):
1519
+ query_input = gr.Textbox(
1520
+ label="Input",
1521
+ placeholder="Enter your query here...",
1522
+ elem_id="query-input"
1523
+ )
1524
+ query_btn = gr.Button("Send Query", variant="primary")
1525
+
1526
+ with gr.Accordion("Query Parameters", open=True):
1527
+ query_type = gr.Radio(
1528
+ ["global", "local", "direct"],
1529
+ label="Query Type",
1530
+ value="global",
1531
+ info="Global: community-based search, Local: entity-based search, Direct: LLM chat"
1532
+ )
1533
+ preset_dropdown = gr.Dropdown(
1534
+ label="Preset Query Options",
1535
+ choices=[
1536
+ "Default Global Search",
1537
+ "Default Local Search",
1538
+ "Detailed Global Analysis",
1539
+ "Detailed Local Analysis",
1540
+ "Quick Global Summary",
1541
+ "Quick Local Summary",
1542
+ "Global Bullet Points",
1543
+ "Local Bullet Points",
1544
+ "Comprehensive Global Report",
1545
+ "Comprehensive Local Report",
1546
+ "High-Level Global Overview",
1547
+ "High-Level Local Overview",
1548
+ "Focused Global Insight",
1549
+ "Focused Local Insight",
1550
+ "Custom Query"
1551
+ ],
1552
+ value="Default Global Search",
1553
+ info="Select a preset or choose 'Custom Query' for manual configuration"
1554
+ )
1555
+ selected_folder = gr.Dropdown(
1556
+ label="Select Index Folder to Chat With",
1557
+ choices=list_output_folders("./indexing"),
1558
+ value=None,
1559
+ interactive=True
1560
+ )
1561
+ refresh_folder_btn = gr.Button("Refresh Folders", variant="secondary")
1562
+ clear_chat_btn = gr.Button("Clear Chat", variant="secondary")
1563
+
1564
+ with gr.Group(visible=False) as custom_options:
1565
+ community_level = gr.Slider(
1566
+ label="Community Level",
1567
+ minimum=1,
1568
+ maximum=10,
1569
+ value=2,
1570
+ step=1,
1571
+ info="Higher values use reports on smaller communities"
1572
+ )
1573
+ response_type = gr.Dropdown(
1574
+ label="Response Type",
1575
+ choices=[
1576
+ "Multiple Paragraphs",
1577
+ "Single Paragraph",
1578
+ "Single Sentence",
1579
+ "List of 3-7 Points",
1580
+ "Single Page",
1581
+ "Multi-Page Report"
1582
+ ],
1583
+ value="Multiple Paragraphs",
1584
+ info="Specify the desired format of the response"
1585
+ )
1586
+ custom_cli_args = gr.Textbox(
1587
+ label="Custom CLI Arguments",
1588
+ placeholder="--arg1 value1 --arg2 value2",
1589
+ info="Additional CLI arguments for advanced users"
1590
+ )
1591
+
1592
+ def update_custom_options(preset):
1593
+ if preset == "Custom Query":
1594
+ return gr.update(visible=True)
1595
+ else:
1596
+ return gr.update(visible=False)
1597
+
1598
+ preset_dropdown.change(fn=update_custom_options, inputs=[preset_dropdown], outputs=[custom_options])
1599
+
1600
+
1601
+
1602
+
1603
+ with gr.Group(elem_id="visualization-container"):
1604
+ vis_output = gr.Plot(label="Graph Visualization", elem_id="visualization-plot")
1605
+ with gr.Row(elem_id="vis-controls-row"):
1606
+ vis_btn = gr.Button("Visualize Graph", variant="secondary")
1607
+
1608
+ # Add new controls for customization
1609
+ with gr.Accordion("Visualization Settings", open=False):
1610
+ layout_type = gr.Dropdown(["3D Spring", "2D Spring", "Circular"], label="Layout Type", value="3D Spring")
1611
+ node_size = gr.Slider(1, 20, 7, label="Node Size", step=1)
1612
+ edge_width = gr.Slider(0.1, 5, 0.5, label="Edge Width", step=0.1)
1613
+ node_color_attribute = gr.Dropdown(["Degree", "Random"], label="Node Color Attribute", value="Degree")
1614
+ color_scheme = gr.Dropdown(["Viridis", "Plasma", "Inferno", "Magma", "Cividis"], label="Color Scheme", value="Viridis")
1615
+ show_labels = gr.Checkbox(label="Show Node Labels", value=True)
1616
+ label_size = gr.Slider(5, 20, 10, label="Label Size", step=1)
1617
+
1618
+
1619
+ # Event handlers
1620
+ upload_btn.click(fn=upload_file, inputs=[file_upload], outputs=[upload_output, file_list, log_output])
1621
+ refresh_btn.click(fn=update_file_list, outputs=[file_list]).then(
1622
+ fn=update_logs,
1623
+ outputs=[log_output]
1624
+ )
1625
+ file_list.change(fn=update_file_content, inputs=[file_list], outputs=[file_content]).then(
1626
+ fn=update_logs,
1627
+ outputs=[log_output]
1628
+ )
1629
+ delete_btn.click(fn=delete_file, inputs=[file_list], outputs=[operation_status, file_list, log_output])
1630
+ save_btn.click(fn=save_file_content, inputs=[file_list, file_content], outputs=[operation_status, log_output])
1631
+
1632
+ refresh_folder_btn.click(
1633
+ fn=lambda: gr.update(choices=list_output_folders("./indexing")),
1634
+ outputs=[selected_folder]
1635
+ )
1636
+
1637
+ clear_chat_btn.click(
1638
+ fn=lambda: ([], ""),
1639
+ outputs=[chatbot, query_input]
1640
+ )
1641
+
1642
+ refresh_folder_btn.click(
1643
+ fn=update_output_folder_list,
1644
+ outputs=[output_folder_list]
1645
+ ).then(
1646
+ fn=update_logs,
1647
+ outputs=[log_output]
1648
+ )
1649
+
1650
+ output_folder_list.change(
1651
+ fn=update_folder_content_list,
1652
+ inputs=[output_folder_list],
1653
+ outputs=[folder_content_list]
1654
+ ).then(
1655
+ fn=update_logs,
1656
+ outputs=[log_output]
1657
+ )
1658
+
1659
+ folder_content_list.change(
1660
+ fn=handle_content_selection,
1661
+ inputs=[output_folder_list, folder_content_list],
1662
+ outputs=[folder_content_list, file_info, output_content]
1663
+ ).then(
1664
+ fn=update_logs,
1665
+ outputs=[log_output]
1666
+ )
1667
+
1668
+ initialize_folder_btn.click(
1669
+ fn=initialize_selected_folder,
1670
+ inputs=[output_folder_list],
1671
+ outputs=[initialization_status, folder_content_list]
1672
+ ).then(
1673
+ fn=update_logs,
1674
+ outputs=[log_output]
1675
+ )
1676
+
1677
+ vis_btn.click(
1678
+ fn=update_visualization,
1679
+ inputs=[
1680
+ output_folder_list,
1681
+ folder_content_list,
1682
+ layout_type,
1683
+ node_size,
1684
+ edge_width,
1685
+ node_color_attribute,
1686
+ color_scheme,
1687
+ show_labels,
1688
+ label_size
1689
+ ],
1690
+ outputs=[vis_output, gr.Textbox(label="Visualization Status")]
1691
+ )
1692
+
1693
+ query_btn.click(
1694
+ fn=send_message,
1695
+ inputs=[
1696
+ query_type,
1697
+ query_input,
1698
+ chatbot,
1699
+ system_message,
1700
+ temperature,
1701
+ max_tokens,
1702
+ preset_dropdown,
1703
+ community_level,
1704
+ response_type,
1705
+ custom_cli_args,
1706
+ selected_folder
1707
+ ],
1708
+ outputs=[chatbot, query_input, log_output]
1709
+ )
1710
+
1711
+ query_input.submit(
1712
+ fn=send_message,
1713
+ inputs=[
1714
+ query_type,
1715
+ query_input,
1716
+ chatbot,
1717
+ system_message,
1718
+ temperature,
1719
+ max_tokens,
1720
+ preset_dropdown,
1721
+ community_level,
1722
+ response_type,
1723
+ custom_cli_args,
1724
+ selected_folder
1725
+ ],
1726
+ outputs=[chatbot, query_input, log_output]
1727
+ )
1728
+ refresh_llm_models_btn.click(
1729
+ fn=update_model_choices,
1730
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1731
+ outputs=[llm_model_dropdown]
1732
+ )
1733
+
1734
+ # Update Embeddings model choices
1735
+ refresh_embeddings_models_btn.click(
1736
+ fn=update_model_choices,
1737
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1738
+ outputs=[embeddings_model_dropdown]
1739
+ )
1740
+
1741
+ # Add this JavaScript to enable Shift+Enter functionality
1742
+ demo.load(js="""
1743
+ function addShiftEnterListener() {
1744
+ const queryInput = document.getElementById('query-input');
1745
+ if (queryInput) {
1746
+ queryInput.addEventListener('keydown', function(event) {
1747
+ if (event.key === 'Enter' && event.shiftKey) {
1748
+ event.preventDefault();
1749
+ const submitButton = queryInput.closest('.gradio-container').querySelector('button.primary');
1750
+ if (submitButton) {
1751
+ submitButton.click();
1752
+ }
1753
+ }
1754
+ });
1755
+ }
1756
+ }
1757
+ document.addEventListener('DOMContentLoaded', addShiftEnterListener);
1758
+ """)
1759
+
1760
+ return demo.queue()
1761
+
1762
+ async def main():
1763
+ api_port = 8088
1764
+ gradio_port = 7860
1765
+
1766
+
1767
+ print(f"Starting API server on port {api_port}")
1768
+ start_api_server(api_port)
1769
+
1770
+ # Wait for the API server to start in a separate thread
1771
+ threading.Thread(target=wait_for_api_server, args=(api_port,)).start()
1772
+
1773
+ # Create the Gradio app
1774
+ demo = create_gradio_interface()
1775
+
1776
+ print(f"Starting Gradio app on port {gradio_port}")
1777
+ # Launch the Gradio app
1778
+ demo.launch(server_port=gradio_port, share=True)
1779
+
1780
+
1781
+ demo = create_gradio_interface()
1782
+ app = demo.app
1783
+
1784
+ if __name__ == "__main__":
1785
+ initialize_data()
1786
+ demo.launch(server_port=7860, share=True)
css ADDED
@@ -0,0 +1,242 @@
1
+ html, body {
2
+ margin: 0;
3
+ padding: 0;
4
+ height: 100vh;
5
+ overflow: hidden;
6
+ }
7
+
8
+ .gradio-container {
9
+ margin: 0 !important;
10
+ padding: 0 !important;
11
+ width: 100vw !important;
12
+ max-width: 100vw !important;
13
+ height: 100vh !important;
14
+ max-height: 100vh !important;
15
+ overflow: auto;
16
+ display: flex;
17
+ flex-direction: column;
18
+ }
19
+
20
+ #main-container {
21
+ flex: 1;
22
+ display: flex;
23
+ overflow: hidden;
24
+ }
25
+
26
+ #left-column, #right-column {
27
+ height: 100%;
28
+ overflow-y: auto;
29
+ padding: 10px;
30
+ }
31
+
32
+ #left-column {
33
+ flex: 1;
34
+ }
35
+
36
+ #right-column {
37
+ flex: 2;
38
+ display: flex;
39
+ flex-direction: column;
40
+ }
41
+
42
+ #chat-container {
43
+ flex: 0 0 auto; /* Don't allow this to grow */
44
+ height: 100%;
45
+ display: flex;
46
+ flex-direction: column;
47
+ overflow: hidden;
48
+ border: 1px solid var(--color-accent);
49
+ border-radius: 8px;
50
+ padding: 10px;
51
+ overflow-y: auto;
52
+ }
53
+
54
+ #chatbot {
55
+ overflow-y: hidden;
56
+ height: 100%;
57
+ }
58
+
59
+ #chat-input-row {
60
+ margin-top: 10px;
61
+ }
62
+
63
+ #visualization-plot {
64
+ width: 100%;
65
+ aspect-ratio: 1 / 1;
66
+ max-height: 600px; /* Adjust this value as needed */
67
+ }
68
+
69
+ #vis-controls-row {
70
+ display: flex;
71
+ justify-content: space-between;
72
+ align-items: center;
73
+ margin-top: 10px;
74
+ }
75
+
76
+ #vis-controls-row > * {
77
+ flex: 1;
78
+ margin: 0 5px;
79
+ }
80
+
81
+ #vis-status {
82
+ margin-top: 10px;
83
+ }
84
+
85
+ /* Chat input styling */
86
+ #chat-input-row {
87
+ display: flex;
88
+ flex-direction: column;
89
+ }
90
+
91
+ #chat-input-row > div {
92
+ width: 100% !important;
93
+ }
94
+
95
+ #chat-input-row input[type="text"] {
96
+ width: 100% !important;
97
+ }
98
+
99
+ /* Adjust padding for all containers */
100
+ .gr-box, .gr-form, .gr-panel {
101
+ padding: 10px !important;
102
+ }
103
+
104
+ /* Ensure all textboxes and textareas have full height */
105
+ .gr-textbox, .gr-textarea {
106
+ height: auto !important;
107
+ min-height: 100px !important;
108
+ }
109
+
110
+ /* Ensure all dropdowns have full width */
111
+ .gr-dropdown {
112
+ width: 100% !important;
113
+ }
114
+
115
+ :root {
116
+ --color-background: #2C3639;
117
+ --color-foreground: #3F4E4F;
118
+ --color-accent: #A27B5C;
119
+ --color-text: #DCD7C9;
120
+ }
121
+
122
+ body, .gradio-container {
123
+ background-color: var(--color-background);
124
+ color: var(--color-text);
125
+ }
126
+
127
+ .gr-button {
128
+ background-color: var(--color-accent);
129
+ color: var(--color-text);
130
+ }
131
+
132
+ .gr-input, .gr-textarea, .gr-dropdown {
133
+ background-color: var(--color-foreground);
134
+ color: var(--color-text);
135
+ border: 1px solid var(--color-accent);
136
+ }
137
+
138
+ .gr-panel {
139
+ background-color: var(--color-foreground);
140
+ border: 1px solid var(--color-accent);
141
+ }
142
+
143
+ .gr-box {
144
+ border-radius: 8px;
145
+ margin-bottom: 10px;
146
+ background-color: var(--color-foreground);
147
+ }
148
+
149
+ .gr-padded {
150
+ padding: 10px;
151
+ }
152
+
153
+ .gr-form {
154
+ background-color: var(--color-foreground);
155
+ }
156
+
157
+ .gr-input-label, .gr-radio-label {
158
+ color: var(--color-text);
159
+ }
160
+
161
+ .gr-checkbox-label {
162
+ color: var(--color-text);
163
+ }
164
+
165
+ .gr-markdown {
166
+ color: var(--color-text);
167
+ }
168
+
169
+ .gr-accordion {
170
+ background-color: var(--color-foreground);
171
+ border: 1px solid var(--color-accent);
172
+ }
173
+
174
+ .gr-accordion-header {
175
+ background-color: var(--color-accent);
176
+ color: var(--color-text);
177
+ }
178
+
179
+ #visualization-container {
180
+ display: flex;
181
+ flex-direction: column;
182
+ border: 2px solid var(--color-accent);
183
+ border-radius: 8px;
184
+ margin-top: 20px;
185
+ padding: 10px;
186
+ background-color: var(--color-foreground);
187
+ height: calc(100vh - 300px); /* Adjust this value as needed */
188
+ }
189
+
190
+ #visualization-plot {
191
+ width: 100%;
192
+ height: 100%;
193
+ }
194
+
195
+ #vis-controls-row {
196
+ display: flex;
197
+ justify-content: space-between;
198
+ align-items: center;
199
+ margin-top: 10px;
200
+ }
201
+
202
+ #vis-controls-row > * {
203
+ flex: 1;
204
+ margin: 0 5px;
205
+ }
206
+
207
+ #vis-status {
208
+ margin-top: 10px;
209
+ }
210
+
211
+ #log-container {
212
+ background-color: var(--color-foreground);
213
+ border: 1px solid var(--color-accent);
214
+ border-radius: 8px;
215
+ padding: 10px;
216
+ margin-top: 20px;
217
+ max-height: none;
218
+ overflow-y: auto;
219
+ }
220
+
221
+ .setting-accordion .label-wrap {
222
+ cursor: pointer;
223
+ }
224
+
225
+ .setting-accordion .icon {
226
+ transition: transform 0.3s ease;
227
+ }
228
+
229
+ .setting-accordion[open] .icon {
230
+ transform: rotate(90deg);
231
+ }
232
+
233
+ .gr-form.gr-box {
234
+ border: none !important;
235
+ background: none !important;
236
+ }
237
+
238
+ .model-params {
239
+ border-top: 1px solid var(--color-accent);
240
+ margin-top: 10px;
241
+ padding-top: 10px;
242
+ }
embedding_proxy.py ADDED
@@ -0,0 +1,62 @@
1
+ import json
2
+ from fastapi import FastAPI, HTTPException
3
+ import uvicorn
4
+ import httpx
5
+ from pydantic import BaseModel
6
+ from typing import List, Union
7
+
8
+ app = FastAPI()
9
+
10
+ OLLAMA_URL = "http://localhost:11434" # Default Ollama URL
11
+
12
+ class EmbeddingRequest(BaseModel):
13
+ input: Union[str, List[str]]
14
+ model: str
15
+
16
+ class EmbeddingResponse(BaseModel):
17
+ object: str
18
+ data: List[dict]
19
+ model: str
20
+ usage: dict
21
+
22
+ @app.post("/v1/embeddings")
23
+ async def create_embedding(request: EmbeddingRequest):
24
+ async with httpx.AsyncClient() as client:
25
+ if isinstance(request.input, str):
26
+ request.input = [request.input]
27
+
28
+ ollama_requests = [{"model": request.model, "prompt": text} for text in request.input]
29
+
30
+ embeddings = []
31
+
32
+
33
+ for i, ollama_request in enumerate(ollama_requests):
34
+ response = await client.post(f"{OLLAMA_URL}/api/embeddings", json=ollama_request)
35
+ if response.status_code != 200:
36
+ raise HTTPException(status_code=response.status_code, detail="Ollama API error")
37
+
38
+ result = response.json()
39
+ embeddings.append({
40
+ "object": "embedding",
41
+ "embedding": result["embedding"],
42
+ "index": i
43
+ })
44
+
45
+
46
+ return EmbeddingResponse(
47
+ object="list",
48
+ data=embeddings,
49
+ model=request.model,
50
+
51
+ )
52
+
53
+ if __name__ == "__main__":
54
+ import argparse
55
+ parser = argparse.ArgumentParser(description="Run the embedding proxy server")
56
+ parser.add_argument("--port", type=int, default=11435, help="Port to run the server on")
57
+ parser.add_argument("--host", type=str, default="http://localhost:11434", help="URL of the Ollama server")
58
+ parser.add_argument("--reload", action="store_true", help="Enable auto-reload for development")
59
+ args = parser.parse_args()
60
+
61
+ OLLAMA_URL = args.host
62
+ uvicorn.run("embedding_proxy:app", host="0.0.0.0", port=args.port, reload=args.reload)
env-example.txt ADDED
@@ -0,0 +1,19 @@
1
+ LLM_PROVIDER=openai
2
+ LLM_API_BASE=http://localhost:11434/v1
3
+ LLM_MODEL='mistral-large:123b-instruct-2407-q4_0'
4
+ LLM_API_KEY=12345
5
+
6
+ EMBEDDINGS_PROVIDER=openai
7
+ EMBEDDINGS_API_BASE=http://localhost:11434
8
+ EMBEDDINGS_MODEL='snowflake-arctic-embed:335m'
9
+ EMBEDDINGS_API_KEY=12345
10
+
11
+
12
+ GRAPHRAG_API_KEY=12345
13
+ ROOT_DIR=indexing
14
+ INPUT_DIR=${ROOT_DIR}/output/${timestamp}/artifacts
15
+ LLM_SERVICE_TYPE=openai_chat
16
+ EMBEDDINGS_SERVICE_TYPE=openai_embedding
17
+
18
+ API_URL=http://localhost:8012
19
+ API_PORT=8012
graphrag/.github/ISSUE_TEMPLATE.md ADDED
@@ -0,0 +1,69 @@
1
+ ### Description
2
+
3
+ <!-- A clear and concise description of the issue or feature request. -->
4
+
5
+ ### Environment
6
+
7
+ - GraphRAG version: <!-- Specify the GraphRAG version (e.g., v0.1.1) -->
8
+ - Python version: <!-- Specify the Python version (e.g., 3.8) -->
9
+ - Operating System: <!-- Specify the OS (e.g., Windows 10, Ubuntu 20.04) -->
10
+
11
+ ### Steps to Reproduce (for bugs)
12
+
13
+ <!-- Provide detailed steps to reproduce the issue. Include code snippets, configuration files, or any other relevant information. -->
14
+
15
+ 1. Step 1
16
+ 2. Step 2
17
+ 3. ...
18
+
19
+ ### Expected Behavior
20
+
21
+ <!-- Describe what you expected to happen. -->
22
+
23
+ ### Actual Behavior
24
+
25
+ <!-- Describe what actually happened. Include any error messages, stack traces, or unexpected behavior. -->
26
+
27
+ ### Screenshots / Logs (if applicable)
28
+
29
+ <!-- If relevant, include screenshots or logs that help illustrate the issue. -->
30
+
31
+ ### GraphRAG Configuration
32
+
33
+ <!-- Include the GraphRAG configuration used for this run. -->
34
+
35
+ ### Additional Information
36
+
37
+ <!-- Include any additional information that might be helpful, such as specific configurations, data samples, or context about the environment. -->
38
+
39
+ ### Possible Solution (if you have one)
40
+
41
+ <!-- If you have suggestions on how to address the issue, provide them here. -->
42
+
43
+ ### Is this a Bug or Feature Request?
44
+
45
+ <!-- Choose one: Bug | Feature Request -->
46
+
47
+ ### Any related issues?
48
+
49
+ <!-- If this is related to another issue, reference it here. -->
50
+
51
+ ### Any relevant discussions?
52
+
53
+ <!-- If there are any discussions or forum threads related to this issue, provide links. -->
54
+
55
+ ### Checklist
56
+
57
+ <!-- Please check the items that you have completed -->
58
+
59
+ - [ ] I have searched for similar issues and didn't find any duplicates.
60
+ - [ ] I have provided a clear and concise description of the issue.
61
+ - [ ] I have included the necessary environment details.
62
+ - [ ] I have outlined the steps to reproduce the issue.
63
+ - [ ] I have included any relevant logs or screenshots.
64
+ - [ ] I have included the GraphRAG configuration for this run.
65
+ - [ ] I have indicated whether this is a bug or a feature request.
66
+
67
+ ### Additional Comments
68
+
69
+ <!-- Any additional comments or context that you think would be helpful. -->
graphrag/.github/ISSUE_TEMPLATE/bug_report.yml ADDED
@@ -0,0 +1,57 @@
1
+ name: Bug Report
2
+ description: File a bug report
3
+ title: "[Bug]: <title>"
4
+ labels: ["bug", "triage"]
5
+
6
+ body:
7
+ - type: textarea
8
+ id: description
9
+ attributes:
10
+ label: Describe the bug
11
+ description: A clear and concise description of what the bug is.
12
+ placeholder: What went wrong?
13
+ - type: textarea
14
+ id: reproduce
15
+ attributes:
16
+ label: Steps to reproduce
17
+ description: |
18
+ Steps to reproduce the behavior:
19
+
20
+ 1. Step 1
21
+ 2. Step 2
22
+ 3. ...
23
+ 4. See error
24
+ placeholder: How can we replicate the issue?
25
+ - type: textarea
26
+ id: expected_behavior
27
+ attributes:
28
+ label: Expected Behavior
29
+ description: A clear and concise description of what you expected to happen.
30
+ placeholder: What should have happened?
31
+ - type: textarea
32
+ id: configused
33
+ attributes:
34
+ label: GraphRAG Config Used
35
+ description: The GraphRAG configuration used for the run.
36
+ placeholder: The settings.yaml content or GraphRAG configuration
37
+ - type: textarea
38
+ id: screenshotslogs
39
+ attributes:
40
+ label: Logs and screenshots
41
+ description: If applicable, add screenshots and logs to help explain your problem.
42
+ placeholder: Add logs and screenshots here
43
+ - type: textarea
44
+ id: additional_information
45
+ attributes:
46
+ label: Additional Information
47
+ description: |
48
+ - GraphRAG Version: e.g., v0.1.1
49
+ - Operating System: e.g., Windows 10, Ubuntu 20.04
50
+ - Python Version: e.g., 3.8
51
+ - Related Issues: e.g., #1
52
+ - Any other relevant information.
53
+ value: |
54
+ - GraphRAG Version:
55
+ - Operating System:
56
+ - Python Version:
57
+ - Related Issues:
graphrag/.github/ISSUE_TEMPLATE/config.yml ADDED
@@ -0,0 +1 @@
1
+ blank_issues_enabled: true
graphrag/.github/ISSUE_TEMPLATE/feature_request.yml ADDED
@@ -0,0 +1,26 @@
1
+ name: Feature Request
2
+ description: File a feature request
3
+ labels: ["enhancement"]
4
+ title: "[Feature Request]: <title>"
5
+
6
+ body:
7
+ - type: textarea
8
+ id: problem_description
9
+ attributes:
10
+ label: Is your feature request related to a problem? Please describe.
11
+ description: A clear and concise description of what the problem is.
12
+ placeholder: What problem are you trying to solve?
13
+
14
+ - type: textarea
15
+ id: solution_description
16
+ attributes:
17
+ label: Describe the solution you'd like
18
+ description: A clear and concise description of what you want to happen.
19
+ placeholder: How do you envision the solution?
20
+
21
+ - type: textarea
22
+ id: additional_context
23
+ attributes:
24
+ label: Additional context
25
+ description: Add any other context or screenshots about the feature request here.
26
+ placeholder: Any additional information
graphrag/.github/ISSUE_TEMPLATE/general_issue.yml ADDED
@@ -0,0 +1,51 @@
1
+ name: General Issue
2
+ description: File a general issue
3
+ title: "[Issue]: <title> "
4
+ labels: ["triage"]
5
+
6
+ body:
7
+ - type: textarea
8
+ id: description
9
+ attributes:
10
+ label: Describe the issue
11
+ description: A clear and concise description of what the issue is.
12
+ placeholder: What went wrong?
13
+ - type: textarea
14
+ id: reproduce
15
+ attributes:
16
+ label: Steps to reproduce
17
+ description: |
18
+ Steps to reproduce the behavior:
19
+
20
+ 1. Step 1
21
+ 2. Step 2
22
+ 3. ...
23
+ 4. See error
24
+ placeholder: How can we replicate the issue?
25
+ - type: textarea
26
+ id: configused
27
+ attributes:
28
+ label: GraphRAG Config Used
29
+ description: The GraphRAG configuration used for the run.
30
+ placeholder: The settings.yaml content or GraphRAG configuration
31
+ - type: textarea
32
+ id: screenshotslogs
33
+ attributes:
34
+ label: Logs and screenshots
35
+ description: If applicable, add screenshots and logs to help explain your problem.
36
+ placeholder: Add logs and screenshots here
37
+ - type: textarea
38
+ id: additional_information
39
+ attributes:
40
+ label: Additional Information
41
+ description: |
42
+ - GraphRAG Version: e.g., v0.1.1
43
+ - Operating System: e.g., Windows 10, Ubuntu 20.04
44
+ - Python Version: e.g., 3.8
45
+ - Related Issues: e.g., #1
46
+ - Any other relevant information.
47
+ value: |
48
+ - GraphRAG Version:
49
+ - Operating System:
50
+ - Python Version:
51
+ - Related Issues:
graphrag/.github/dependabot.yml ADDED
@@ -0,0 +1,19 @@
1
+ # To get started with Dependabot version updates, you'll need to specify which
2
+ # package ecosystems to update and where the package manifests are located.
3
+ # Please see the documentation for all configuration options:
4
+ # https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
5
+ version: 2
6
+ updates:
7
+ - package-ecosystem: "npm" # See documentation for possible values
8
+ directory: "docsite/" # Location of package manifests
9
+ schedule:
10
+ interval: "weekly"
11
+ - package-ecosystem: "pip" # See documentation for possible values
12
+ directory: "/" # Location of package manifests
13
+ schedule:
14
+ interval: "weekly"
15
+ - package-ecosystem: "github-actions"
16
+ # Workflow files stored in the default location of `.github/workflows`. (You don't need to specify `/.github/workflows` for `directory`. You can use `directory: "/"`.)
17
+ directory: "/"
18
+ schedule:
19
+ interval: "weekly"
graphrag/.github/pull_request_template.md ADDED
@@ -0,0 +1,36 @@
1
+ <!--
2
+ Thanks for contributing to GraphRAG!
3
+
4
+ Please do not make *Draft* pull requests, as they still notify anyone watching the repo.
5
+
6
+ Create a pull request when it is ready for review and feedback.
7
+
8
+ About this template
9
+
10
+ The following template aims to help contributors write a good description for their pull requests.
11
+ We'd like you to provide a description of the changes in your pull request (i.e. bugs fixed or features added), the motivation behind the changes, and complete the checklist below before opening a pull request.
12
+
13
+ Feel free to discard it if you need to (e.g. when you just fix a typo). -->
14
+
15
+ ## Description
16
+
17
+ [Provide a brief description of the changes made in this pull request.]
18
+
19
+ ## Related Issues
20
+
21
+ [Reference any related issues or tasks that this pull request addresses.]
22
+
23
+ ## Proposed Changes
24
+
25
+ [List the specific changes made in this pull request.]
26
+
27
+ ## Checklist
28
+
29
+ - [ ] I have tested these changes locally.
30
+ - [ ] I have reviewed the code changes.
31
+ - [ ] I have updated the documentation (if necessary).
32
+ - [ ] I have added appropriate unit tests (if applicable).
33
+
34
+ ## Additional Notes
35
+
36
+ [Add any additional notes or context that may be helpful for the reviewer(s).]
graphrag/.github/workflows/gh-pages.yml ADDED
@@ -0,0 +1,97 @@
1
+ name: gh-pages
2
+ on:
3
+ push:
4
+ branches: [main]
5
+
6
+ permissions:
7
+ contents: write
8
+
9
+ env:
10
+ POETRY_VERSION: 1.8.3
11
+ PYTHON_VERSION: "3.11"
12
+ NODE_VERSION: 18.x
13
+
14
+ jobs:
15
+ build:
16
+ runs-on: ubuntu-latest
17
+ env:
18
+ GH_PAGES: 1
19
+ DEBUG: 1
20
+ GRAPHRAG_LLM_TYPE: "azure_openai_chat"
21
+ GRAPHRAG_EMBEDDING_TYPE: "azure_openai_embedding"
22
+ GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
23
+ GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}
24
+ GRAPHRAG_API_VERSION: ${{ secrets.GRAPHRAG_API_VERSION }}
25
+ GRAPHRAG_LLM_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_LLM_DEPLOYMENT_NAME }}
26
+ GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME }}
27
+ GRAPHRAG_CACHE_TYPE: "blob"
28
+ GRAPHRAG_CACHE_CONNECTION_STRING: ${{ secrets.BLOB_STORAGE_CONNECTION_STRING }}
29
+ GRAPHRAG_CACHE_CONTAINER_NAME: "cicache"
30
+ GRAPHRAG_CACHE_BASE_DIR": "cache"
31
+ GRAPHRAG_LLM_MODEL: gpt-3.5-turbo-16k
32
+ GRAPHRAG_EMBEDDING_MODEL: text-embedding-ada-002
33
+ # We have Windows + Linux runners in 3.10 and 3.11, so we need to divide the rate limits by 4
34
+ GRAPHRAG_LLM_TPM: 45_000 # 180,000 / 4
35
+ GRAPHRAG_LLM_RPM: 270 # 1,080 / 4
36
+ GRAPHRAG_EMBEDDING_TPM: 87_500 # 350,000 / 4
37
+ GRAPHRAG_EMBEDDING_RPM: 525 # 2,100 / 4
38
+ GRAPHRAG_CHUNK_SIZE: 1200
39
+ GRAPHRAG_CHUNK_OVERLAP: 0
40
+ # Azure AI Search config
41
+ AZURE_AI_SEARCH_URL_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_URL_ENDPOINT }}
42
+ AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
43
+
44
+ steps:
45
+ - uses: actions/checkout@v4
46
+ with:
47
+ persist-credentials: false
48
+
49
+ - name: Set up Python ${{ env.PYTHON_VERSION }}
50
+ uses: actions/setup-python@v5
51
+ with:
52
+ python-version: ${{ env.PYTHON_VERSION }}
53
+
54
+ - name: Install Poetry ${{ env.POETRY_VERSION }}
55
+ uses: abatilo/[email protected]
56
+ with:
57
+ poetry-version: ${{ env.POETRY_VERSION }}
58
+
59
+ - name: Use Node ${{ env.NODE_VERSION }}
60
+ uses: actions/setup-node@v4
61
+ with:
62
+ node-version: ${{ env.NODE_VERSION }}
63
+
64
+ - name: Install Yarn dependencies
65
+ run: yarn install
66
+ working-directory: docsite
67
+
68
+ - name: Install Poetry dependencies
69
+ run: poetry install
70
+
71
+ - name: Install Azurite
72
+ id: azuright
73
+ uses: potatoqualitee/[email protected]
74
+
75
+ - name: Generate Indexer Outputs
76
+ run: |
77
+ poetry run poe test_smoke
78
+ zip -jrm docsite/data/operation_dulce/dataset.zip tests/fixtures/min-csv/output/*/artifacts/*.parquet
79
+
80
+ - name: Build Jupyter Notebooks
81
+ run: poetry run poe convert_docsite_notebooks
82
+
83
+ - name: Build docsite
84
+ run: yarn build
85
+ working-directory: docsite
86
+ env:
87
+ DOCSITE_BASE_URL: "graphrag"
88
+
89
+ - name: List docsite files
90
+ run: find docsite/_site
91
+
92
+ - name: Deploy to GitHub Pages
93
+ uses: JamesIves/[email protected]
94
+ with:
95
+ branch: gh-pages
96
+ folder: docsite/_site
97
+ clean: true
graphrag/.github/workflows/javascript-ci.yml ADDED
@@ -0,0 +1,30 @@
1
+ name: JavaScript CI
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ branches: [main]
7
+
8
+ env:
9
+ NODE_VERSION: 18.x
10
+
11
+ jobs:
12
+ javascript-ci:
13
+ runs-on: ubuntu-latest
14
+ strategy:
15
+ fail-fast: false
16
+ steps:
17
+ - name: Use Node ${{ env.NODE_VERSION }}
18
+ uses: actions/setup-node@v4
19
+ with:
20
+ node-version: ${{ env.NODE_VERSION }}
21
+
22
+ - uses: actions/checkout@v4
23
+
24
+ - run: yarn install
25
+ working-directory: docsite
26
+ name: Install Dependencies
27
+
28
+ - run: yarn build
29
+ working-directory: docsite
30
+ name: Build Docsite
graphrag/.github/workflows/python-ci.yml ADDED
@@ -0,0 +1,122 @@
1
+ name: Python CI
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ branches: [main]
7
+
8
+ permissions:
9
+ contents: read
10
+ pull-requests: read
11
+
12
+ concurrency:
13
+ group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
14
+ # Only run the for the latest commit
15
+ cancel-in-progress: true
16
+
17
+ env:
18
+ POETRY_VERSION: 1.8.3
19
+
20
+ jobs:
21
+ python-ci:
22
+ strategy:
23
+ matrix:
24
+ python-version: ["3.10", "3.11", "3.12"]
25
+ os: [ubuntu-latest, windows-latest]
26
+ env:
27
+ DEBUG: 1
28
+ GRAPHRAG_LLM_TYPE: "azure_openai_chat"
29
+ GRAPHRAG_EMBEDDING_TYPE: "azure_openai_embedding"
30
+ GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
31
+ GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}
32
+ GRAPHRAG_API_VERSION: ${{ secrets.GRAPHRAG_API_VERSION }}
33
+ GRAPHRAG_LLM_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_LLM_DEPLOYMENT_NAME }}
34
+ GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME }}
35
+ GRAPHRAG_CACHE_TYPE: "blob"
36
+ GRAPHRAG_CACHE_CONNECTION_STRING: ${{ secrets.BLOB_STORAGE_CONNECTION_STRING }}
37
+ GRAPHRAG_CACHE_CONTAINER_NAME: "cicache"
38
+ GRAPHRAG_CACHE_BASE_DIR": "cache"
39
+ GRAPHRAG_LLM_MODEL: gpt-3.5-turbo-16k
40
+ GRAPHRAG_EMBEDDING_MODEL: text-embedding-ada-002
41
+ # We have Windows + Linux runners in 3.10 and 3.11, so we need to divide the rate limits by 4
42
+ GRAPHRAG_LLM_TPM: 45_000 # 180,000 / 4
43
+ GRAPHRAG_LLM_RPM: 270 # 1,080 / 4
44
+ GRAPHRAG_EMBEDDING_TPM: 87_500 # 350,000 / 4
45
+ GRAPHRAG_EMBEDDING_RPM: 525 # 2,100 / 4
46
+ GRAPHRAG_CHUNK_SIZE: 1200
47
+ GRAPHRAG_CHUNK_OVERLAP: 0
48
+ # Azure AI Search config
49
+ AZURE_AI_SEARCH_URL_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_URL_ENDPOINT }}
50
+ AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
51
+
52
+ runs-on: ${{ matrix.os }}
53
+ steps:
54
+ - uses: actions/checkout@v4
55
+
56
+ - uses: dorny/paths-filter@v3
57
+ id: changes
58
+ with:
59
+ filters: |
60
+ python:
61
+ - 'graphrag/**/*'
62
+ - 'poetry.lock'
63
+ - 'pyproject.toml'
64
+ - '**/*.py'
65
+ - '**/*.toml'
66
+ - '**/*.ipynb'
67
+ - '.github/workflows/python*.yml'
68
+ - 'tests/smoke/*'
69
+
70
+ - name: Set up Python ${{ matrix.python-version }}
71
+ uses: actions/setup-python@v5
72
+ with:
73
+ python-version: ${{ matrix.python-version }}
74
+
75
+ - name: Install Poetry
76
+ uses: abatilo/[email protected]
77
+ with:
78
+ poetry-version: ${{ env.POETRY_VERSION }}
79
+
80
+ - name: Install dependencies
81
+ shell: bash
82
+ run: poetry self add setuptools && poetry run python -m pip install gensim && poetry install
83
+
84
+ - name: Check Semversioner
85
+ run: |
86
+ poetry run semversioner check
87
+
88
+ - name: Check
89
+ run: |
90
+ poetry run poe check
91
+
92
+ - name: Build
93
+ run: |
94
+ poetry build
95
+
96
+ - name: Install Azurite
97
+ id: azuright
98
+ uses: potatoqualitee/[email protected]
99
+
100
+ - name: Unit Test
101
+ run: |
102
+ poetry run poe test_unit
103
+
104
+ - name: Integration Test
105
+ run: |
106
+ poetry run poe test_integration
107
+
108
+ - name: Smoke Test
109
+ if: steps.changes.outputs.python == 'true'
110
+ run: |
111
+ poetry run poe test_smoke
112
+
113
+ - uses: actions/upload-artifact@v4
114
+ if: always()
115
+ with:
116
+ name: smoke-test-artifacts-${{ matrix.python-version }}-${{ matrix.poetry-version }}-${{ runner.os }}
117
+ path: tests/fixtures/*/output
118
+
119
+ - name: E2E Test
120
+ if: steps.changes.outputs.python == 'true'
121
+ run: |
122
+ ./scripts/e2e-test.sh
graphrag/.github/workflows/python-publish.yml ADDED
@@ -0,0 +1,52 @@
1
+ name: Python Publish
2
+ on:
3
+ release:
4
+ types: [created]
5
+ push:
6
+ branches: [main]
7
+
8
+ env:
9
+ POETRY_VERSION: "1.8.3"
10
+ PYTHON_VERSION: "3.10"
11
+
12
+ jobs:
13
+ publish:
14
+ name: Upload release to PyPI
15
+ if: github.ref == 'refs/heads/main'
16
+ runs-on: ubuntu-latest
17
+ environment:
18
+ name: pypi
19
+ url: https://pypi.org/p/graphrag
20
+ permissions:
21
+ id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
22
+
23
+ steps:
24
+ - uses: actions/checkout@v4
25
+ with:
26
+ fetch-depth: 0
27
+ fetch-tags: true
28
+
29
+ - name: Set up Python
30
+ uses: actions/setup-python@v5
31
+ with:
32
+ python-version: ${{ env.PYTHON_VERSION }}
33
+
34
+ - name: Install Poetry
35
+ uses: abatilo/[email protected]
36
+ with:
37
+ poetry-version: ${{ env.POETRY_VERSION }}
38
+
39
+ - name: Install dependencies
40
+ shell: bash
41
+ run: poetry install
42
+
43
+ - name: Build Distributable
44
+ shell: bash
45
+ run: poetry build
46
+
47
+ - name: Publish package distributions to PyPI
48
+ uses: pypa/gh-action-pypi-publish@release/v1
49
+ with:
50
+ packages-dir: dist
51
+ skip-existing: true
52
+ verbose: true
graphrag/.github/workflows/semver.yml ADDED
@@ -0,0 +1,15 @@
1
+ name: Semver Check
2
+ on:
3
+ pull_request:
4
+ branches: [main]
5
+
6
+ jobs:
7
+ semver:
8
+ runs-on: ubuntu-latest
9
+ steps:
10
+ - uses: actions/checkout@v4
11
+ with:
12
+ fetch-depth: 0
13
+
14
+ - name: Check Semver
15
+ run: ./scripts/semver-check.sh
graphrag/.github/workflows/spellcheck.yml ADDED
@@ -0,0 +1,15 @@
1
+ name: Spellcheck
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ paths:
7
+ - '**/*'
8
+ jobs:
9
+ spellcheck:
10
+ runs-on: ubuntu-latest
11
+ steps:
12
+ - uses: actions/checkout@v4
13
+
14
+ - name: Spellcheck
15
+ run: ./scripts/spellcheck.sh
graphrag/.gitignore ADDED
@@ -0,0 +1,68 @@
1
+ # Node Artifacts
2
+ */node_modules/
3
+ docsite/*/src/**/*.js
4
+ docsite/*/lib/
5
+ docsite/*/storybook-static/
6
+ docsite/*/docsTemp/
7
+ docsite/*/build/
8
+ .swc/
9
+ dist/
10
+ .idea
11
+ # https://yarnpkg.com/advanced/qa#which-files-should-be-gitignored
12
+ docsite/.yarn/*
13
+ !docsite/.yarn/patches
14
+ !docsite/.yarn/releases
15
+ !docsite/.yarn/plugins
16
+ !docsite/.yarn/sdks
17
+ !docsite/.yarn/versions
18
+ docsite/.pnp.*
19
+
20
+ .yarn/*
21
+ !.yarn/patches
22
+ !.yarn/releases
23
+ !.yarn/plugins
24
+ !.yarn/sdks
25
+ !.yarn/versions
26
+ .pnp.*
27
+
28
+ # Python Artifacts
29
+ python/*/lib/
30
+ # Test Output
31
+ .coverage
32
+ coverage/
33
+ licenses.txt
34
+ examples_notebooks/*/lancedb
35
+ examples_notebooks/*/data
36
+ tests/fixtures/cache
37
+ tests/fixtures/*/cache
38
+ tests/fixtures/*/output
39
+ lancedb/
40
+
41
+ # Random
42
+ .DS_Store
43
+ *.log*
44
+ .venv
45
+ .conda
46
+ .tmp
47
+
48
+
49
+ .env
50
+ build.zip
51
+
52
+ .turbo
53
+
54
+ __pycache__
55
+
56
+ .pipeline
57
+
58
+ # Azurite
59
+ temp_azurite/
60
+ __azurite*.json
61
+ __blobstorage*.json
62
+ __blobstorage__/
63
+
64
+ # Getting started example
65
+ ragtest/
66
+ .ragtest/
67
+ .pipelines
68
+ .pipeline
graphrag/.semversioner/0.1.0.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "changes": [
3
+ {
4
+ "description": "Initial Release",
5
+ "type": "minor"
6
+ }
7
+ ],
8
+ "created_at": "2024-07-01T21:48:50+00:00",
9
+ "version": "0.1.0"
10
+ }
graphrag/.semversioner/next-release/minor-20240710183748086411.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "minor",
3
+ "description": "Add dynamic community report rating to the prompt tuning engine"
4
+ }
graphrag/.semversioner/next-release/patch-20240701233152787373.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix docsite base url"
4
+ }
graphrag/.semversioner/next-release/patch-20240703152422358587.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add cli flag to overlay default values onto a provided config."
4
+ }
graphrag/.semversioner/next-release/patch-20240703182750529114.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix broken prompt tuning link on docs"
4
+ }
graphrag/.semversioner/next-release/patch-20240704181236015699.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix for --limit exceeding the dataframe lenght"
4
+ }
graphrag/.semversioner/next-release/patch-20240705184142723331.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add Minute-based Rate Limiting and fix rpm, tpm settings"
4
+ }
graphrag/.semversioner/next-release/patch-20240705235656897489.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add N parameter support"
4
+ }
graphrag/.semversioner/next-release/patch-20240707063053679262.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "fix community_report doesn't work in settings.yaml"
4
+ }
graphrag/.semversioner/next-release/patch-20240709225514193665.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add language support to prompt tuning"
4
+ }
graphrag/.semversioner/next-release/patch-20240710114442871595.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Modify defaults for CHUNK_SIZE, CHUNK_OVERLAP and GLEANINGS to reduce time and LLM calls"
4
+ }
graphrag/.semversioner/next-release/patch-20240710165603516866.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fixed an issue where base OpenAI embeddings can't work with Azure OpenAI LLM"
4
+ }
graphrag/.semversioner/next-release/patch-20240711004716103302.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix encoding model parameter on prompt tune"
4
+ }
graphrag/.semversioner/next-release/patch-20240711092703710242.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "support non-open ai model config to prompt tune"
4
+ }
graphrag/.semversioner/next-release/patch-20240711223132221685.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix delta none on query calls"
4
+ }
graphrag/.semversioner/next-release/patch-20240712035356859335.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "fix llm response content is None in query"
4
+ }
graphrag/.semversioner/next-release/patch-20240712210400518089.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add exception handling on file load"
4
+ }
graphrag/.semversioner/next-release/patch-20240712235357550877.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add llm params to local and global search"
4
+ }
graphrag/.semversioner/next-release/patch-20240716225953784804.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix for Ruff 0.5.2"
4
+ }
graphrag/.vsts-ci.yml ADDED
@@ -0,0 +1,41 @@
1
+ name: GraphRAG CI
2
+ pool:
3
+ vmImage: ubuntu-latest
4
+
5
+ trigger:
6
+ batch: true
7
+ branches:
8
+ include:
9
+ - main
10
+
11
+ variables:
12
+ isMain: $[eq(variables['Build.SourceBranch'], 'refs/heads/main')]
13
+ pythonVersion: "3.10"
14
+ poetryVersion: "1.6.1"
15
+ nodeVersion: "18.x"
16
+ artifactsFullFeedName: "Resilience/resilience_python"
17
+
18
+ stages:
19
+ - stage: Compliance
20
+ dependsOn: []
21
+ jobs:
22
+ - job: compliance
23
+ displayName: Compliance
24
+ pool:
25
+ vmImage: windows-latest
26
+ steps:
27
+ - task: CredScan@3
28
+ inputs:
29
+ outputFormat: sarif
30
+ debugMode: false
31
+
32
+ - task: ComponentGovernanceComponentDetection@0
33
+ inputs:
34
+ scanType: "Register"
35
+ verbosity: "Verbose"
36
+ alertWarningLevel: "High"
37
+
38
+ - task: PublishSecurityAnalysisLogs@3
39
+ inputs:
40
+ ArtifactName: "CodeAnalysisLogs"
41
+ ArtifactType: "Container"
graphrag/CODEOWNERS ADDED
@@ -0,0 +1,6 @@
1
+ # These owners will be the default owners for everything in
2
+ # the repo. Unless a later match takes precedence,
3
+ # @global-owner1 and @global-owner2 will be requested for
4
+ # review when someone opens a pull request.
5
+ * @microsoft/societal-resilience
6
+ * @microsoft/graphrag-core-team
graphrag/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,9 @@
1
+ # Microsoft Open Source Code of Conduct
2
+
3
+ This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4
+
5
+ Resources:
6
+
7
+ - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8
+ - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9
+ - Contact [[email protected]](mailto:[email protected]) with questions or concerns