sichaolong committed on
Commit
e331e72
1 Parent(s): 2d119be

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .DS_Store +0 -0
  2. API_README.md +170 -0
  3. EMBEDDING_PROXY_README.md +36 -0
  4. INDEX_APP_README.md +127 -0
  5. LICENSE +21 -0
  6. README.md +263 -7
  7. __pycache__/api.cpython-310.pyc +0 -0
  8. __pycache__/embedding_proxy.cpython-310.pyc +0 -0
  9. __pycache__/web.cpython-310.pyc +0 -0
  10. api.py +943 -0
  11. app.py +1786 -0
  12. css +242 -0
  13. embedding_proxy.py +62 -0
  14. env-example.txt +19 -0
  15. graphrag/.github/ISSUE_TEMPLATE.md +69 -0
  16. graphrag/.github/ISSUE_TEMPLATE/bug_report.yml +57 -0
  17. graphrag/.github/ISSUE_TEMPLATE/config.yml +1 -0
  18. graphrag/.github/ISSUE_TEMPLATE/feature_request.yml +26 -0
  19. graphrag/.github/ISSUE_TEMPLATE/general_issue.yml +51 -0
  20. graphrag/.github/dependabot.yml +19 -0
  21. graphrag/.github/pull_request_template.md +36 -0
  22. graphrag/.github/workflows/gh-pages.yml +97 -0
  23. graphrag/.github/workflows/javascript-ci.yml +30 -0
  24. graphrag/.github/workflows/python-ci.yml +122 -0
  25. graphrag/.github/workflows/python-publish.yml +52 -0
  26. graphrag/.github/workflows/semver.yml +15 -0
  27. graphrag/.github/workflows/spellcheck.yml +15 -0
  28. graphrag/.gitignore +68 -0
  29. graphrag/.semversioner/0.1.0.json +10 -0
  30. graphrag/.semversioner/next-release/minor-20240710183748086411.json +4 -0
  31. graphrag/.semversioner/next-release/patch-20240701233152787373.json +4 -0
  32. graphrag/.semversioner/next-release/patch-20240703152422358587.json +4 -0
  33. graphrag/.semversioner/next-release/patch-20240703182750529114.json +4 -0
  34. graphrag/.semversioner/next-release/patch-20240704181236015699.json +4 -0
  35. graphrag/.semversioner/next-release/patch-20240705184142723331.json +4 -0
  36. graphrag/.semversioner/next-release/patch-20240705235656897489.json +4 -0
  37. graphrag/.semversioner/next-release/patch-20240707063053679262.json +4 -0
  38. graphrag/.semversioner/next-release/patch-20240709225514193665.json +4 -0
  39. graphrag/.semversioner/next-release/patch-20240710114442871595.json +4 -0
  40. graphrag/.semversioner/next-release/patch-20240710165603516866.json +4 -0
  41. graphrag/.semversioner/next-release/patch-20240711004716103302.json +4 -0
  42. graphrag/.semversioner/next-release/patch-20240711092703710242.json +4 -0
  43. graphrag/.semversioner/next-release/patch-20240711223132221685.json +4 -0
  44. graphrag/.semversioner/next-release/patch-20240712035356859335.json +4 -0
  45. graphrag/.semversioner/next-release/patch-20240712210400518089.json +4 -0
  46. graphrag/.semversioner/next-release/patch-20240712235357550877.json +4 -0
  47. graphrag/.semversioner/next-release/patch-20240716225953784804.json +4 -0
  48. graphrag/.vsts-ci.yml +41 -0
  49. graphrag/CODEOWNERS +6 -0
  50. graphrag/CODE_OF_CONDUCT.md +9 -0
.DS_Store ADDED
Binary file (6.15 kB).
 
API_README.md ADDED
@@ -0,0 +1,170 @@
1
+ # GraphRAG API
2
+
3
+ This README provides a detailed guide on the `api.py` file, which serves as the API interface for the GraphRAG (Graph Retrieval-Augmented Generation) system. GraphRAG is a powerful tool that combines graph-based knowledge representation with retrieval-augmented generation techniques to provide context-aware responses to queries.
4
+
5
+ ## Table of Contents
6
+
7
+ 1. [Overview](#overview)
8
+ 2. [Setup](#setup)
9
+ 3. [API Endpoints](#api-endpoints)
10
+ 4. [Data Models](#data-models)
11
+ 5. [Core Functionality](#core-functionality)
12
+ 6. [Usage Examples](#usage-examples)
13
+ 7. [Configuration](#configuration)
14
+ 8. [Troubleshooting](#troubleshooting)
15
+
16
+ ## Overview
17
+
18
+ The `api.py` file implements a FastAPI-based server that provides various endpoints for interacting with the GraphRAG system. It supports different types of queries, including direct chat, GraphRAG-specific queries, DuckDuckGo searches, and a combined full-model search.
19
+
20
+ Key features:
21
+ - Multiple query types (local and global searches)
22
+ - Context caching for improved performance
23
+ - Background tasks for long-running operations
24
+ - Customizable settings through environment variables and config files
25
+ - Integration with external services (e.g., Ollama for LLM interactions)
26
+
27
+ ## Setup
28
+
29
+ 1. Install dependencies:
30
+ ```
31
+ pip install -r requirements.txt
32
+ ```
33
+
34
+ 2. Set up environment variables:
35
+ Create a `.env` file in the `indexing` directory with the following variables:
36
+ ```
37
+ LLM_API_BASE=<your_llm_api_base_url>
38
+ LLM_MODEL=<your_llm_model>
39
+ LLM_PROVIDER=<llm_provider>
40
+ EMBEDDINGS_API_BASE=<your_embeddings_api_base_url>
41
+ EMBEDDINGS_MODEL=<your_embeddings_model>
42
+ EMBEDDINGS_PROVIDER=<embeddings_provider>
43
+ INPUT_DIR=./indexing/output
44
+ ROOT_DIR=indexing
45
+ API_PORT=8012
46
+ ```
47
+
48
+ 3. Run the API server:
49
+ ```
50
+ python api.py --host 0.0.0.0 --port 8012
51
+ ```
52
+
53
+ ## API Endpoints
54
+
55
+ ### `/v1/chat/completions` (POST)
56
+ Main endpoint for chat completions. Supports different models:
57
+ - `direct-chat`: Direct interaction with the LLM
58
+ - `graphrag-local-search:latest`: Local search using GraphRAG
59
+ - `graphrag-global-search:latest`: Global search using GraphRAG
60
+ - `duckduckgo-search:latest`: Web search using DuckDuckGo
61
+ - `full-model:latest`: Combined search using all available models
62
+
63
+ ### `/v1/prompt_tune` (POST)
64
+ Initiates prompt tuning process in the background.
65
+
66
+ ### `/v1/prompt_tune_status` (GET)
67
+ Retrieves the status and logs of the prompt tuning process.
68
+
69
+ ### `/v1/index` (POST)
70
+ Starts the indexing process for GraphRAG in the background.
71
+
72
+ ### `/v1/index_status` (GET)
73
+ Retrieves the status and logs of the indexing process.
74
+
75
+ ### `/health` (GET)
76
+ Health check endpoint.
77
+
78
+ ### `/v1/models` (GET)
79
+ Lists available models.
80
+
81
+ ## Data Models
82
+
83
+ The API uses several Pydantic models for request and response handling (the query-related models are excerpted after this list):
84
+
85
+ - `Message`: Represents a chat message with role and content.
86
+ - `QueryOptions`: Options for GraphRAG queries, including query type, preset, and community level.
87
+ - `ChatCompletionRequest`: Request model for chat completions.
88
+ - `ChatCompletionResponse`: Response model for chat completions.
89
+ - `PromptTuneRequest`: Request model for prompt tuning.
90
+ - `IndexingRequest`: Request model for indexing.
91
+
92
+ ## Core Functionality
93
+
94
+ ### Context Loading
95
+ The `load_context` function loads necessary data for GraphRAG queries, including entities, relationships, reports, text units, and covariates.
96
+
97
+ ### Search Engine Setup
98
+ `setup_search_engines` initializes both local and global search engines using the loaded context data.
99
+
100
+ ### Query Execution
101
+ Different query types are handled by separate functions:
102
+ - `run_direct_chat`: Sends queries directly to the LLM.
103
+ - `run_graphrag_query`: Executes GraphRAG queries (local or global).
104
+ - `run_duckduckgo_search`: Performs web searches using DuckDuckGo.
105
+ - `run_full_model_search`: Combines results from all search types.
106
+
107
+ ### Background Tasks
108
+ Long-running tasks like prompt tuning and indexing are executed as background tasks to prevent blocking the API.
109
+
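For example, a client can start prompt tuning and then poll its status endpoint until the background task finishes. This is a minimal sketch: the exact request body accepted by `/v1/prompt_tune` is defined by the `PromptTuneRequest` model in `api.py`, so the `root` field and the status strings below are assumptions used to illustrate the flow.

```python
import time
import requests

BASE_URL = "http://localhost:8012"

# Kick off prompt tuning as a background task.
# The "root" field is an assumption; check PromptTuneRequest in api.py for the exact schema.
resp = requests.post(f"{BASE_URL}/v1/prompt_tune", json={"root": "./indexing"})
print(resp.json())

# Poll the status endpoint; it returns the status and logs of the tuning process.
while True:
    status = requests.get(f"{BASE_URL}/v1/prompt_tune_status").json()
    print(status.get("status"))
    # Terminal status strings are illustrative; inspect the actual response for your build.
    if status.get("status") not in ("running", "in_progress"):
        break
    time.sleep(5)

for line in status.get("logs", []):
    print(line)
```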
110
+ ## Usage Examples
111
+
112
+ ### Sending a GraphRAG Query
113
+ ```python
114
+ import requests
115
+
116
+ url = "http://localhost:8012/v1/chat/completions"
117
+ payload = {
118
+ "model": "graphrag-local-search:latest",
119
+ "messages": [{"role": "user", "content": "What is GraphRAG?"}],
120
+ "query_options": {
121
+ "query_type": "local-search",
122
+ "selected_folder": "your_indexed_folder",
123
+ "community_level": 2,
124
+ "response_type": "Multiple Paragraphs"
125
+ }
126
+ }
127
+ response = requests.post(url, json=payload)
128
+ print(response.json())
129
+ ```
130
+
131
+ ### Starting Indexing Process
132
+ ```python
133
+ import requests
134
+
135
+ url = "http://localhost:8012/v1/index"
136
+ payload = {
137
+ "llm_model": "your_llm_model",
138
+ "embed_model": "your_embed_model",
139
+ "root": "./indexing",
140
+ "verbose": True,
141
+ "emit": ["parquet", "csv"]
142
+ }
143
+ response = requests.post(url, json=payload)
144
+ print(response.json())
145
+ ```
146
+
147
+ ## Configuration
148
+
149
+ The API can be configured through:
150
+ 1. Environment variables
151
+ 2. A `config.yaml` file (path specified by `GRAPHRAG_CONFIG` environment variable)
152
+ 3. Command-line arguments when starting the server
153
+
154
+ Key configuration options (a sample `config.yaml` sketch follows this list):
155
+ - `llm_model`: The language model to use
156
+ - `embedding_model`: The embedding model for vector representations
157
+ - `community_level`: Depth of community analysis in GraphRAG
158
+ - `token_limit`: Maximum tokens for context
159
+ - `api_key`: API key for LLM service
160
+ - `api_base`: Base URL for LLM API
161
+ - `api_type`: Type of API (e.g., "openai")
162
+
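A minimal `config.yaml` using these keys might look like the sketch below; the values are placeholders, and any of them can be overridden by the corresponding environment variables:

```yaml
# Sketch of a config.yaml for the GraphRAG API (values are placeholders)
llm_model: mistral:latest
embedding_model: nomic-embed-text:latest
community_level: 2
token_limit: 4096
api_key: your-api-key
api_base: http://localhost:11434
api_type: openai
```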
163
+ ## Troubleshooting
164
+
165
+ 1. If you encounter connection errors with Ollama, ensure the service is running and accessible.
166
+ 2. For "context loading failed" errors, check that the indexed data is present in the specified output folder.
167
+ 3. If prompt tuning or indexing processes fail, review the logs using the respective status endpoints.
168
+ 4. For performance issues, consider adjusting the `community_level` and `token_limit` settings.
169
+
170
+ For more detailed information on GraphRAG's indexing and querying processes, refer to the official GraphRAG documentation.
EMBEDDING_PROXY_README.md ADDED
@@ -0,0 +1,36 @@
1
+ # Using Ollama Embeddings with GraphRAG: A Quick Guide
2
+
3
+ ## Problem
4
+
5
+ GraphRAG is designed to work with OpenAI-compatible APIs for both language models and embeddings, but Ollama currently exposes its own, incompatible embeddings API.
6
+
7
+ ## Solution: Embeddings Proxy
8
+
9
+ To bridge this gap, we use an embeddings proxy. The proxy acts as middleware between GraphRAG and Ollama, translating Ollama's embedding responses into the format that GraphRAG expects.
10
+
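Conceptually, the proxy only reshapes payloads: GraphRAG sends an OpenAI-style `/v1/embeddings` request, the proxy forwards each input to Ollama's `/api/embeddings` endpoint, and wraps the returned vectors in an OpenAI-style response. The sketch below illustrates that translation; use the `embedding_proxy.py` shipped with this repo rather than this snippet.

```python
# Illustrative sketch only -- the bundled embedding_proxy.py is the supported implementation.
import httpx
from fastapi import FastAPI

app = FastAPI()
OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama on its default port

@app.post("/v1/embeddings")
async def embeddings(request: dict):
    inputs = request["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    data = []
    async with httpx.AsyncClient() as client:
        for i, text in enumerate(inputs):
            # Ollama's native endpoint takes {"model", "prompt"} and returns {"embedding": [...]}.
            r = await client.post(
                f"{OLLAMA_URL}/api/embeddings",
                json={"model": request["model"], "prompt": text},
            )
            data.append({"object": "embedding", "index": i, "embedding": r.json()["embedding"]})
    # OpenAI-compatible shape that GraphRAG expects.
    return {"object": "list", "data": data, "model": request["model"]}
```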
11
+ ## Use the Embeddings Proxy
12
+
13
+ 1. **Set up the proxy:**
14
+ - Save the provided `embedding_proxy.py` script to your project directory.
15
+ - Install required dependencies (not needed if you've already done this in the normal setup): `pip install fastapi uvicorn httpx`
16
+
17
+ 2. **Run the proxy:**
18
+ ```bash
19
+ python embedding_proxy.py --port 11435 --host http://localhost:11434
20
+ ```
21
+ This starts the proxy on port 11435, connecting to Ollama at localhost:11434.
22
+
23
+ 3. **Configure GraphRAG:**
24
+ Update your `settings.yaml` file to use the proxy for embeddings:
25
+
26
+ ```yaml
27
+ embeddings:
28
+ llm:
29
+ api_key: ${GRAPHRAG_API_KEY}
30
+ type: openai_embedding
31
+ model: nomic-embed-text:latest
32
+ api_base: http://localhost:11435 # Point to your proxy
33
+ ```
34
+
35
+ 4. **Run GraphRAG:**
36
+ With the proxy running and the configuration updated, you can now run GraphRAG as usual. It will use Ollama for embeddings through the proxy.
INDEX_APP_README.md ADDED
@@ -0,0 +1,127 @@
1
+ # GraphRAG Indexer Application
2
+
3
+ ## Table of Contents
4
+ 1. [Introduction](#introduction)
5
+ 2. [Setup](#setup)
6
+ 3. [Application Structure](#application-structure)
7
+ 4. [Indexing](#indexing)
8
+ 5. [Prompt Tuning](#prompt-tuning)
9
+ 6. [Data Management](#data-management)
10
+ 7. [Configuration](#configuration)
11
+ 8. [API Integration](#api-integration)
12
+ 9. [Troubleshooting](#troubleshooting)
13
+
14
+ ## Introduction
15
+
16
+ The GraphRAG Indexer Application is a Gradio-based user interface for managing the indexing and prompt tuning processes of the GraphRAG (Graph Retrieval-Augmented Generation) system. This application provides an intuitive way to configure, run, and monitor indexing and prompt tuning tasks, as well as manage related data files.
17
+
18
+ ## Setup
19
+
20
+ 1. Ensure you have Python 3.7+ installed.
21
+ 2. Install required dependencies:
22
+ ```
23
+ pip install gradio requests pydantic python-dotenv pyyaml pandas lancedb
24
+ ```
25
+ 3. Set up environment variables in `indexing/.env`:
26
+ ```
27
+ API_BASE_URL=http://localhost:8012
28
+ LLM_API_BASE=http://localhost:11434
29
+ EMBEDDINGS_API_BASE=http://localhost:11434
30
+ ROOT_DIR=indexing
31
+ ```
32
+ 4. Run the application:
33
+ ```
34
+ python index_app.py
35
+ ```
36
+
37
+ ## Application Structure
38
+
39
+ The application is divided into three main tabs:
40
+ 1. Indexing
41
+ 2. Prompt Tuning
42
+ 3. Data Management
43
+
44
+ Each tab provides specific functionality related to its purpose.
45
+
46
+ ## Indexing
47
+
48
+ The Indexing tab allows users to configure and run the GraphRAG indexing process.
49
+
50
+ ### Features:
51
+ - Select LLM and Embedding models
52
+ - Set root directory for indexing
53
+ - Configure verbose and cache options
54
+ - Advanced options for resuming, reporting, and output formats
55
+ - Run indexing and check status
56
+
57
+ ### Usage:
58
+ 1. Select the desired LLM and Embedding models from the dropdowns.
59
+ 2. Set the root directory for indexing.
60
+ 3. Configure additional options as needed.
61
+ 4. Click "Run Indexing" to start the process.
62
+ 5. Use "Check Indexing Status" to monitor progress.
63
+
64
+ ## Prompt Tuning
65
+
66
+ The Prompt Tuning tab enables users to configure and run prompt tuning for GraphRAG.
67
+
68
+ ### Features:
69
+ - Set root directory and domain
70
+ - Choose tuning method (random, top, all)
71
+ - Configure limit, language, max tokens, and chunk size
72
+ - Option to exclude entity types
73
+ - Run prompt tuning and check status
74
+
75
+ ### Usage:
76
+ 1. Set the root directory and optional domain.
77
+ 2. Choose the tuning method and configure parameters.
78
+ 3. Click "Run Prompt Tuning" to start the process.
79
+ 4. Use "Check Prompt Tuning Status" to monitor progress.
80
+
81
+ ## Data Management
82
+
83
+ The Data Management tab provides tools for managing input files and viewing output folders.
84
+
85
+ ### Features:
86
+ - File upload functionality
87
+ - File list management (view, refresh, delete)
88
+ - Output folder exploration
89
+ - File content viewing and editing
90
+
91
+ ### Usage:
92
+ 1. Use the File Upload section to add new input files.
93
+ 2. Manage existing files in the File Management section.
94
+ 3. Explore output folders and their contents in the Output Folders section.
95
+
96
+ ## Configuration
97
+
98
+ The application uses a combination of environment variables and a `config.yaml` file for configuration. Key settings include:
99
+
100
+ - LLM and Embedding models
101
+ - API endpoints
102
+ - Community level for GraphRAG
103
+ - Token limits
104
+ - API keys and types
105
+
106
+ To modify these settings, edit the `.env` file or create a `config.yaml` file in the root directory.
107
+
108
+ ## API Integration
109
+
110
+ The application integrates with a backend API for executing indexing and prompt tuning tasks. Key API endpoints used:
111
+
112
+ - `/v1/index`: Start indexing process
113
+ - `/v1/index_status`: Check indexing status
114
+ - `/v1/prompt_tune`: Start prompt tuning process
115
+ - `/v1/prompt_tune_status`: Check prompt tuning status
116
+
117
+ These endpoints are called using the `requests` library, with appropriate error handling and logging.
118
+
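For example, starting an indexing run and checking on it from Python looks roughly like this (the payload fields mirror the indexing example in `API_README.md`; the `status` and `logs` fields in the status response are assumptions based on the endpoint descriptions above):

```python
import requests

API_BASE_URL = "http://localhost:8012"

# Start indexing as a background task on the API server.
start = requests.post(f"{API_BASE_URL}/v1/index", json={
    "llm_model": "your_llm_model",
    "embed_model": "your_embed_model",
    "root": "./indexing",
    "verbose": True,
})
print(start.json())

# Later, check how the run is going and print the last few log lines.
status = requests.get(f"{API_BASE_URL}/v1/index_status").json()
print(status.get("status"))
for line in status.get("logs", [])[-10:]:
    print(line)
```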
119
+ ## Troubleshooting
120
+
121
+ Common issues and solutions:
122
+
123
+ 1. **Model loading fails**: Ensure the LLM_API_BASE is correctly set and the API is accessible.
124
+ 2. **Indexing or Prompt Tuning doesn't start**: Check API connectivity and verify that all required fields are filled.
125
+ 3. **File management issues**: Ensure proper read/write permissions in the ROOT_DIR.
126
+
127
+ For any persistent issues, check the application logs (visible in the console) for detailed error messages.
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Beckett
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,268 @@
1
  ---
2
- title: Graph Rag Local Ui Scl
3
- emoji: 📊
4
- colorFrom: yellow
5
- colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 4.41.0
8
- app_file: app.py
9
- pinned: false
10
  ---
 
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: graph-rag-local-ui-scl
3
+ app_file: index_app.py
4
  sdk: gradio
5
  sdk_version: 4.41.0
6
  ---
7
+ # 🕸️ GraphRAG Local
8
 
9
+ Welcome to **GraphRAG Local with Index/Prompt-Tuning and Querying/Chat UIs**! This project is an adaptation of Microsoft's [GraphRAG](https://github.com/microsoft/graphrag), tailored to support local models and featuring a comprehensive interactive user interface ecosystem.
10
+
11
+ ## 📄 Research Paper
12
+
13
+ For more details on the original GraphRAG implementation, please refer to the [GraphRAG paper](https://arxiv.org/pdf/2404.16130).
14
+
15
+ ## 🌟 Features
16
+
17
+ - **API-Centric Architecture:** A robust FastAPI-based server (`api.py`) serving as the core of the GraphRAG operations.
18
+ - **Dedicated Indexing and Prompt Tuning UI:** A separate Gradio-based interface (`index_app.py`) for managing indexing and prompt tuning processes.
19
+ - **Local Model Support:** Leverage local models for LLM and embeddings, including compatibility with Ollama and OpenAI-compatible APIs.
20
+ - **Cost-Effective:** Eliminate dependency on costly cloud-based models by using your own local models.
21
+ - **Interactive UI:** User-friendly interface for managing data, running queries, and visualizing results (main app).
22
+ - **Real-time Graph Visualization:** Visualize your knowledge graph in 2D or 3D using Plotly (main app).
23
+ - **File Management:** Upload, view, edit, and delete input files directly from the UI.
24
+ - **Settings Management:** Easily update and manage your GraphRAG settings through the UI.
25
+ - **Output Exploration:** Browse and view indexing outputs and artifacts.
26
+ - **Logging:** Real-time logging for better debugging and monitoring.
27
+ - **Flexible Querying:** Support for global, local, and direct chat queries with customizable parameters (main app).
28
+ - **Customizable Visualization:** Adjust graph layout, node sizes, colors, and more to suit your preferences (main app).
29
+
30
+ ![GraphRAG UI](uiv3.png)
31
+
32
+ ## 🗺️ Roadmap
33
+
34
+ ### **Important Note:** *Updates have been slow due to my day job and limited free time, but I am working on errors and issues in the background whenever I can. Please feel free to contribute or open a PR if you want to help out and have a good solution to an open issue.*
35
+ **The GraphRAG Local UI ecosystem is currently undergoing a major transition. While the main app remains functional, I am actively developing separate applications for Indexing/Prompt Tuning and Querying/Chat, all built around a robust central API. Users should expect some changes and potential instability during this transition period.**
36
+
37
+ *While it is currently functional, it has so far been tested primarily on a Mac Studio M2.*
38
+
39
+ My vision is for the GraphRAG Local UI ecosystem to become the ultimate set of tools for working with GraphRAG and local LLMs, incorporating as many cool features and knowledge graph tools as possible. I am continuously working on improvements and new features.
40
+
41
+ ### Recent Updates
42
+ - [x] New API-centric architecture (`api.py`)
43
+ - [x] Dedicated Indexing and Prompt Tuning UI (`index_app.py`)
44
+ - [x] Improved file management and output exploration
45
+ - [x] Background task handling for long-running operations
46
+ - [x] Enhanced configuration options through environment variables and YAML files
47
+
48
+ ### Upcoming Features
49
+ - [ ] Dedicated Querying/Chat UI that interacts with the API
50
+ - [ ] Dockerfile for easier deployment
51
+ - [ ] Launch your own GraphRAG API server for use in external applications
52
+ - [ ] Experimental: Mixture of Agents for Indexing/Query of knowledge graph
53
+ - [ ] Support for more file formats (CSV, PDF, etc.)
54
+ - [ ] Web search/Scraping capabilities
55
+ - [ ] Advanced graph analysis tools
56
+ - [ ] Integration with popular knowledge management tools
57
+ - [ ] Collaborative features for team-based knowledge graph building
58
+
59
+ I am committed to making the GraphRAG Local UI ecosystem the most comprehensive and user-friendly toolset for working with knowledge graphs and LLMs. Your feedback and suggestions are much needed in shaping the future of this project.
60
+
61
+ Feel free to open an Issue if you run into an error, and I will try to address it as soon as possible to minimize any downtime you might experience.
62
+
63
+ ---
64
+
65
+ ## 📦 Installation and Setup
66
+
67
+ Follow these steps to set up and run the GraphRAG Local UI ecosystem:
68
+
69
+ 1. **Create and activate a new conda environment:**
70
+ ```bash
71
+ conda create -n graphrag-local -y
72
+ conda activate graphrag-local
73
+ ```
74
+
75
+ 2. **Install the required packages:**
76
+
77
+ First, install the GraphRAG package from the `graphrag` directory in this repo (it has changes not present in the Microsoft repo):
78
+
79
+ ```bash
80
+ pip install -e ./graphrag
81
+ ```
82
+
83
+ Then install the rest of the dependencies:
84
+
85
+ ```bash
86
+ pip install -r requirements.txt
87
+ ```
88
+
89
+ 3. **Launch the API server:**
90
+ ```bash
91
+ python api.py --host 0.0.0.0 --port 8012 --reload
92
+ ```
93
+
94
+ 4. **If using Ollama for embeddings, launch the embedding proxy:**
95
+ ```bash
96
+ python embedding_proxy.py --port 11435 --host http://localhost:11434
97
+ ```
98
+ Note: For detailed instructions on using Ollama embeddings with GraphRAG, refer to the EMBEDDING_PROXY_README.md file.
99
+
100
+ 5. **Launch the Indexing and Prompt Tuning UI:**
101
+ ```bash
102
+ gradio index_app.py
103
+ ```
104
+
105
+ 6. **Launch the main interactive UI (legacy app):**
106
+ ```bash
107
+ gradio app.py
108
+ ```
109
+ or
110
+ ```bash
111
+ python app.py
112
+ ```
113
+
114
+ 7. **Access the UIs:**
115
+ - Indexing and Prompt Tuning UI: Open your web browser and navigate to `http://localhost:7861`
116
+ - Main UI (legacy): Open your web browser and navigate to `http://localhost:7860`
117
+
118
+ ---
119
+
120
+ ## 🚀 Getting Started with GraphRAG Local
121
+
122
+ GraphRAG is designed for flexibility, allowing you to quickly create and initialize your own indexing directory. Follow these steps to set up your environment:
123
+
124
+ ### 1. Create the Indexing Directory
125
+
126
+ This repo comes with a pre-made indexing folder, but you may want to create your own. First, create the required directory structure for your input data and indexing results:
127
+
128
+ ```bash
129
+ mkdir -p ./indexing/input
130
+ ```
131
+
132
+ This directory will store:
133
+ - Input .txt files for indexing
134
+ - Output results
135
+ - Prompts for Prompt Tuning
136
+
137
+ ### 2. Add Sample Data (Optional)
138
+
139
+ If you want to start with sample data, copy it to your new input directory:
140
+
141
+ ```bash
142
+ cp input/* ./indexing/input
143
+ ```
144
+
145
+ You can also add your own .txt files to this directory for indexing.
146
+
147
+ ### 3. Initialize the Indexing Folder
148
+
149
+ Run the following command to initialize the ./indexing folder with the required files:
150
+
151
+ ```bash
152
+ python -m graphrag.index --init --root ./indexing
153
+ ```
154
+
155
+ ### 4. Configure Settings
156
+
157
+ Move the pre-configured `settings.yaml` file to your indexing directory:
158
+
159
+ ```bash
160
+ mv settings.yaml ./indexing
161
+ ```
162
+
163
+ This file contains the main configuration, pre-set for use with local models.
164
+
165
+ ### 5. Customization
166
+
167
+ You can customize your setup by modifying the following environment variables:
168
+ - `ROOT_DIR`: Points to your main indexing directory
169
+ - `INPUT_DIR`: Specifies the location of your input files
170
+
171
+ ### 📚 Additional Resources
172
+
173
+ For more detailed information and advanced usage, refer to the [official GraphRAG documentation](https://microsoft.github.io/graphrag/posts/get_started/).
174
+
175
+ ---
176
+
177
+ ## 🖥️ GraphRAG Application Ecosystem
178
+
179
+ The GraphRAG Local UI ecosystem consists of three main components, each serving a specific purpose in the knowledge graph creation and querying process:
180
+
181
+ ### 1. Core API (`api.py`)
182
+
183
+ The `api.py` file serves as the backbone of the GraphRAG system, providing a robust FastAPI-based server that handles all core operations.
184
+
185
+ Key features:
186
+ - Manages indexing and prompt tuning processes
187
+ - Handles various query types (local, global, and direct chat)
188
+ - Integrates with local LLM and embedding models
189
+ - Provides endpoints for file management and system configuration
190
+
191
+ Usage:
192
+ ```bash
193
+ python api.py --host 0.0.0.0 --port 8012 --reload
194
+ ```
195
+
196
+ Note: If using Ollama for embeddings, make sure to run the embedding proxy (`embedding_proxy.py`) alongside `api.py`. Refer to the EMBEDDING_PROXY_README.md for detailed instructions.
197
+
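Once the server is running, a quick way to confirm it is reachable is the `/health` endpoint, for example:

```python
import requests

# Prints {"status": "ok"} if the API server is up on port 8012.
print(requests.get("http://localhost:8012/health").json())
```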
209
+ ### 2. Indexing and Prompt Tuning UI (`index_app.py`)
210
+
211
+ The `index_app.py` file provides a user-friendly Gradio interface for managing the indexing and prompt tuning processes.
212
+
213
+ Key features:
214
+ - Configure and run indexing tasks
215
+ - Set up and execute prompt tuning
216
+ - Manage input files and explore output data
217
+ - Adjust LLM and embedding settings
218
+
219
+ Usage:
220
+ ```bash
221
+ python index_app.py
222
+ ```
223
+ Access the UI at `http://localhost:7861`
224
+
225
+ ### 3. Main Interactive UI (Legacy App) (`app.py`)
226
+
227
+ The `app.py` file is the pre-existing main application, which is being phased out but still provides useful functionality.
228
+
229
+ Key features:
230
+ - Visualize knowledge graphs in 2D or 3D
231
+ - Run queries and view results
232
+ - Manage GraphRAG settings
233
+ - Explore indexed data
234
+
235
+ Usage:
236
+ ```bash
237
+ python app.py
238
+ ```
239
+ or
240
+ ```bash
241
+ gradio app.py
242
+ ```
243
+ Access the UI at `http://localhost:7860`
244
+
245
+ ### Workflow Integration
246
+
247
+ 1. Start the Core API (`api.py`) to enable backend functionality.
+ 2. If using Ollama for embeddings, start the embedding proxy (`embedding_proxy.py`).
+ 3. Use the Indexing and Prompt Tuning UI (`index_app.py`) to prepare your data and fine-tune the system.
+ 4. (Optional) Use the Main Interactive UI (`app.py`) for visualization and legacy features.
250
+
251
+ This modular approach allows for greater flexibility and easier maintenance of the GraphRAG system. As development continues, the functionality of `app.py` will be gradually integrated into new, specialized interfaces that interact with the core API.
252
+
253
+ ---
254
+
255
+ ## 📚 Citations
256
+
257
+ - Original GraphRAG repository by Microsoft: [GraphRAG](https://github.com/microsoft/graphrag)
258
+ - This project was inspired by the GraphRAG4OpenWebUI repository by win4r (https://github.com/win4r/GraphRAG4OpenWebUI), which served as a starting point for the API implementation.
259
+
260
+ ---
261
+
262
+ ## Troubleshooting
263
+
264
+ - If you encounter any issues with the new API or Indexing UI, please check the console logs for detailed error messages.
265
+ - For the main app, if you can't run `gradio app.py`, try running `pip install --upgrade gradio`, then close your terminal and open a new one. The app should then load and launch properly as a Gradio app.
266
+ - On Windows, if you run into an encoding/UTF error, you can set the correct encoding in the YAML Settings menu.
267
+
268
+ For any issues or feature requests, please open an issue on the GitHub repository. Happy knowledge graphing!
__pycache__/api.cpython-310.pyc ADDED
Binary file (27.4 kB).
 
__pycache__/embedding_proxy.cpython-310.pyc ADDED
Binary file (2.39 kB).
 
__pycache__/web.cpython-310.pyc ADDED
Binary file (4.84 kB).
 
api.py ADDED
@@ -0,0 +1,943 @@
1
+ from dotenv import load_dotenv
2
+ import os
3
+ import asyncio
4
+ import tempfile
5
+ from collections import deque
6
+ import time
7
+ import uuid
8
+ import json
9
+ import re
10
+ import pandas as pd
11
+ import tiktoken
12
+ import logging
13
+ import yaml
14
+ import shutil
15
+ from fastapi import Body
16
+ from fastapi import FastAPI, HTTPException, Request, BackgroundTasks, Depends
17
+ from fastapi.responses import JSONResponse, StreamingResponse
18
+ from pydantic import BaseModel, Field
19
+ from typing import List, Optional, Dict, Any, Union
20
+ from contextlib import asynccontextmanager
21
+ from web import DuckDuckGoSearchAPIWrapper
22
+ from functools import lru_cache
23
+ import requests
24
+ import subprocess
25
+ import argparse
26
+
27
+ # GraphRAG related imports
28
+ from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
29
+ from graphrag.query.indexer_adapters import (
30
+ read_indexer_covariates,
31
+ read_indexer_entities,
32
+ read_indexer_relationships,
33
+ read_indexer_reports,
34
+ read_indexer_text_units,
35
+ )
36
+ from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
37
+ from graphrag.query.llm.oai.chat_openai import ChatOpenAI
38
+ from graphrag.query.llm.oai.embedding import OpenAIEmbedding
39
+ from graphrag.query.llm.oai.typing import OpenaiApiType
40
+ from graphrag.query.question_gen.local_gen import LocalQuestionGen
41
+ from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
42
+ from graphrag.query.structured_search.local_search.search import LocalSearch
43
+ from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
44
+ from graphrag.query.structured_search.global_search.search import GlobalSearch
45
+ from graphrag.vector_stores.lancedb import LanceDBVectorStore
46
+
47
+ # Set up logging
48
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
49
+ logger = logging.getLogger(__name__)
50
+
51
+ # Load environment variables
52
+ load_dotenv('indexing/.env')
53
+ LLM_API_BASE = os.getenv('LLM_API_BASE', '')
54
+ LLM_MODEL = os.getenv('LLM_MODEL')
55
+ LLM_PROVIDER = os.getenv('LLM_PROVIDER', 'openai').lower()
56
+ EMBEDDINGS_API_BASE = os.getenv('EMBEDDINGS_API_BASE', '')
57
+ EMBEDDINGS_MODEL = os.getenv('EMBEDDINGS_MODEL')
58
+ EMBEDDINGS_PROVIDER = os.getenv('EMBEDDINGS_PROVIDER', 'openai').lower()
59
+ INPUT_DIR = os.getenv('INPUT_DIR', './indexing/output')
60
+ ROOT_DIR = os.getenv('ROOT_DIR', 'indexing')
61
+ PORT = int(os.getenv('API_PORT', 8012))
62
+ LANCEDB_URI = f"{INPUT_DIR}/lancedb"
63
+ COMMUNITY_REPORT_TABLE = "create_final_community_reports"
64
+ ENTITY_TABLE = "create_final_nodes"
65
+ ENTITY_EMBEDDING_TABLE = "create_final_entities"
66
+ RELATIONSHIP_TABLE = "create_final_relationships"
67
+ COVARIATE_TABLE = "create_final_covariates"
68
+ TEXT_UNIT_TABLE = "create_final_text_units"
69
+ COMMUNITY_LEVEL = 2
70
+
71
+ # Global variables for storing search engines and question generator
72
+ local_search_engine = None
73
+ global_search_engine = None
74
+ question_generator = None
75
+
76
+ # Data models
77
+ class Message(BaseModel):
78
+ role: str
79
+ content: str
80
+
81
+ class QueryOptions(BaseModel):
82
+ query_type: str
83
+ preset: Optional[str] = None
84
+ community_level: Optional[int] = None
85
+ response_type: Optional[str] = None
86
+ custom_cli_args: Optional[str] = None
87
+ selected_folder: Optional[str] = None
88
+
89
+ class ChatCompletionRequest(BaseModel):
90
+ model: str
91
+ messages: List[Message]
92
+ temperature: Optional[float] = 0.7
93
+ max_tokens: Optional[int] = None
94
+ stream: Optional[bool] = False
95
+ query_options: Optional[QueryOptions] = None
96
+
97
+ class ChatCompletionResponseChoice(BaseModel):
98
+ index: int
99
+ message: Message
100
+ finish_reason: Optional[str] = None
101
+
102
+ class Usage(BaseModel):
103
+ prompt_tokens: int
104
+ completion_tokens: int
105
+ total_tokens: int
106
+
107
+ class ChatCompletionResponse(BaseModel):
108
+ id: str = Field(default_factory=lambda: f"chatcmpl-{uuid.uuid4().hex}")
109
+ object: str = "chat.completion"
110
+ created: int = Field(default_factory=lambda: int(time.time()))
111
+ model: str
112
+ choices: List[ChatCompletionResponseChoice]
113
+ usage: Usage
114
+ system_fingerprint: Optional[str] = None
115
+
116
+ def list_output_folders():
117
+ return [f for f in os.listdir(INPUT_DIR) if os.path.isdir(os.path.join(INPUT_DIR, f))]
118
+
119
+ def list_folder_contents(folder_name):
120
+ folder_path = os.path.join(INPUT_DIR, folder_name, "artifacts")
121
+ if not os.path.exists(folder_path):
122
+ return []
123
+ return [item for item in os.listdir(folder_path) if item.endswith('.parquet')]
124
+
125
+ def normalize_api_base(api_base: str) -> str:
126
+ """Normalize the API base URL by removing trailing slashes and /v1 or /api suffixes."""
127
+ api_base = api_base.rstrip('/')
128
+ if api_base.endswith('/v1') or api_base.endswith('/api'):
129
+ api_base = api_base[:-3]
130
+ return api_base
131
+
132
+ def get_models_endpoint(api_base: str, api_type: str) -> str:
133
+ """Get the appropriate models endpoint based on the API type."""
134
+ normalized_base = normalize_api_base(api_base)
135
+ if api_type.lower() == 'openai':
136
+ return f"{normalized_base}/v1/models"
137
+ elif api_type.lower() == 'azure':
138
+ return f"{normalized_base}/openai/deployments?api-version=2022-12-01"
139
+ else: # For other API types (e.g., local LLMs)
140
+ return f"{normalized_base}/models"
141
+
142
+ async def fetch_available_models(settings: Dict[str, Any]) -> List[str]:
143
+ """Fetch available models from the API."""
144
+ api_base = settings['api_base']
145
+ api_type = settings['api_type']
146
+ api_key = settings['api_key']
147
+
148
+ models_endpoint = get_models_endpoint(api_base, api_type)
149
+ headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
150
+
151
+ try:
152
+ response = requests.get(models_endpoint, headers=headers, timeout=10)
153
+ response.raise_for_status()
154
+ data = response.json()
155
+
156
+ if api_type.lower() == 'openai':
157
+ return [model['id'] for model in data['data']]
158
+ elif api_type.lower() == 'azure':
159
+ return [model['id'] for model in data['value']]
160
+ else:
161
+ # Adjust this based on the actual response format of your local LLM API
162
+ return [model['name'] for model in data['models']]
163
+ except requests.exceptions.RequestException as e:
164
+ logger.error(f"Error fetching models: {str(e)}")
165
+ return []
166
+
167
+ def load_settings():
168
+ config_path = os.getenv('GRAPHRAG_CONFIG', 'config.yaml')
169
+ if os.path.exists(config_path):
170
+ with open(config_path, 'r') as config_file:
171
+ config = yaml.safe_load(config_file)
172
+ else:
173
+ config = {}
174
+
175
+ settings = {
176
+ 'llm_model': os.getenv('LLM_MODEL', config.get('llm_model')),
177
+ 'embedding_model': os.getenv('EMBEDDINGS_MODEL', config.get('embedding_model')),
178
+ 'community_level': int(os.getenv('COMMUNITY_LEVEL', config.get('community_level', 2))),
179
+ 'token_limit': int(os.getenv('TOKEN_LIMIT', config.get('token_limit', 4096))),
180
+ 'api_key': os.getenv('GRAPHRAG_API_KEY', config.get('api_key')),
181
+ 'api_base': os.getenv('LLM_API_BASE', config.get('api_base')),
182
+ 'embeddings_api_base': os.getenv('EMBEDDINGS_API_BASE', config.get('embeddings_api_base')),
183
+ 'api_type': os.getenv('API_TYPE', config.get('api_type', 'openai')),
184
+ }
185
+
186
+ return settings
187
+
189
+
190
+ async def setup_llm_and_embedder(settings):
191
+ logger.info("Setting up LLM and embedder")
192
+ try:
193
+ llm = ChatOpenAI(
194
+ api_key=settings['api_key'],
195
+ api_base=f"{settings['api_base']}/v1",
196
+ model=settings['llm_model'],
197
+ api_type=OpenaiApiType[settings['api_type'].capitalize()],
198
+ max_retries=20,
199
+ )
200
+
201
+ token_encoder = tiktoken.get_encoding("cl100k_base")
202
+
203
+ text_embedder = OpenAIEmbedding(
204
+ api_key=settings['api_key'],
205
+ api_base=f"{settings['embeddings_api_base']}/v1",
206
+ api_type=OpenaiApiType[settings['api_type'].capitalize()],
207
+ model=settings['embedding_model'],
208
+ deployment_name=settings['embedding_model'],
209
+ max_retries=20,
210
+ )
211
+
212
+ logger.info("LLM and embedder setup complete")
213
+ return llm, token_encoder, text_embedder
214
+ except Exception as e:
215
+ logger.error(f"Error setting up LLM and embedder: {str(e)}")
216
+ raise HTTPException(status_code=500, detail=f"Failed to set up LLM and embedder: {str(e)}")
217
+
218
+ async def load_context(selected_folder, settings):
219
+ """
220
+ Load context data including entities, relationships, reports, text units, and covariates
221
+ """
222
+ logger.info("Loading context data")
223
+ try:
224
+ input_dir = os.path.join(INPUT_DIR, selected_folder, "artifacts")
225
+ entity_df = pd.read_parquet(f"{input_dir}/{ENTITY_TABLE}.parquet")
226
+ entity_embedding_df = pd.read_parquet(f"{input_dir}/{ENTITY_EMBEDDING_TABLE}.parquet")
227
+ entities = read_indexer_entities(entity_df, entity_embedding_df, settings['community_level'])
228
+
229
+ description_embedding_store = LanceDBVectorStore(collection_name="entity_description_embeddings")
230
+ description_embedding_store.connect(db_uri=LANCEDB_URI)
231
+ store_entity_semantic_embeddings(entities=entities, vectorstore=description_embedding_store)
232
+
233
+ relationship_df = pd.read_parquet(f"{input_dir}/{RELATIONSHIP_TABLE}.parquet")
234
+ relationships = read_indexer_relationships(relationship_df)
235
+
236
+ report_df = pd.read_parquet(f"{input_dir}/{COMMUNITY_REPORT_TABLE}.parquet")
237
+ reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
238
+
239
+ text_unit_df = pd.read_parquet(f"{input_dir}/{TEXT_UNIT_TABLE}.parquet")
240
+ text_units = read_indexer_text_units(text_unit_df)
241
+
242
+ covariate_df = pd.read_parquet(f"{input_dir}/{COVARIATE_TABLE}.parquet")
243
+ claims = read_indexer_covariates(covariate_df)
244
+ logger.info(f"Number of claim records: {len(claims)}")
245
+ covariates = {"claims": claims}
246
+
247
+ logger.info("Context data loading complete")
248
+ return entities, relationships, reports, text_units, description_embedding_store, covariates
249
+ except Exception as e:
250
+ logger.error(f"Error loading context data: {str(e)}")
251
+ raise
252
+
253
+ async def setup_search_engines(llm, token_encoder, text_embedder, entities, relationships, reports, text_units,
254
+ description_embedding_store, covariates):
255
+ """
256
+ Set up local and global search engines
257
+ """
258
+ logger.info("Setting up search engines")
259
+
260
+ # Set up local search engine
261
+ local_context_builder = LocalSearchMixedContext(
262
+ community_reports=reports,
263
+ text_units=text_units,
264
+ entities=entities,
265
+ relationships=relationships,
266
+ covariates=covariates,
267
+ entity_text_embeddings=description_embedding_store,
268
+ embedding_vectorstore_key=EntityVectorStoreKey.ID,
269
+ text_embedder=text_embedder,
270
+ token_encoder=token_encoder,
271
+ )
272
+
273
+ local_context_params = {
274
+ "text_unit_prop": 0.5,
275
+ "community_prop": 0.1,
276
+ "conversation_history_max_turns": 5,
277
+ "conversation_history_user_turns_only": True,
278
+ "top_k_mapped_entities": 10,
279
+ "top_k_relationships": 10,
280
+ "include_entity_rank": True,
281
+ "include_relationship_weight": True,
282
+ "include_community_rank": False,
283
+ "return_candidate_context": False,
284
+ "embedding_vectorstore_key": EntityVectorStoreKey.ID,
285
+ "max_tokens": 12_000,
286
+ }
287
+
288
+ local_llm_params = {
289
+ "max_tokens": 2_000,
290
+ "temperature": 0.0,
291
+ }
292
+
293
+ local_search_engine = LocalSearch(
294
+ llm=llm,
295
+ context_builder=local_context_builder,
296
+ token_encoder=token_encoder,
297
+ llm_params=local_llm_params,
298
+ context_builder_params=local_context_params,
299
+ response_type="multiple paragraphs",
300
+ )
301
+
302
+ # Set up global search engine
303
+ global_context_builder = GlobalCommunityContext(
304
+ community_reports=reports,
305
+ entities=entities,
306
+ token_encoder=token_encoder,
307
+ )
308
+
309
+ global_context_builder_params = {
310
+ "use_community_summary": False,
311
+ "shuffle_data": True,
312
+ "include_community_rank": True,
313
+ "min_community_rank": 0,
314
+ "community_rank_name": "rank",
315
+ "include_community_weight": True,
316
+ "community_weight_name": "occurrence weight",
317
+ "normalize_community_weight": True,
318
+ "max_tokens": 12_000,
319
+ "context_name": "Reports",
320
+ }
321
+
322
+ map_llm_params = {
323
+ "max_tokens": 1000,
324
+ "temperature": 0.0,
325
+ "response_format": {"type": "json_object"},
326
+ }
327
+
328
+ reduce_llm_params = {
329
+ "max_tokens": 2000,
330
+ "temperature": 0.0,
331
+ }
332
+
333
+ global_search_engine = GlobalSearch(
334
+ llm=llm,
335
+ context_builder=global_context_builder,
336
+ token_encoder=token_encoder,
337
+ max_data_tokens=12_000,
338
+ map_llm_params=map_llm_params,
339
+ reduce_llm_params=reduce_llm_params,
340
+ allow_general_knowledge=False,
341
+ json_mode=True,
342
+ context_builder_params=global_context_builder_params,
343
+ concurrent_coroutines=32,
344
+ response_type="multiple paragraphs",
345
+ )
346
+
347
+ logger.info("Search engines setup complete")
348
+ return local_search_engine, global_search_engine, local_context_builder, local_llm_params, local_context_params
349
+
350
+ def format_response(response):
351
+ """
352
+ Format the response by adding appropriate line breaks and paragraph separations.
353
+ """
354
+ paragraphs = re.split(r'\n{2,}', response)
355
+
356
+ formatted_paragraphs = []
357
+ for para in paragraphs:
358
+ if '```' in para:
359
+ parts = para.split('```')
360
+ for i, part in enumerate(parts):
361
+ if i % 2 == 1: # This is a code block
362
+ parts[i] = f"\n```\n{part.strip()}\n```\n"
363
+ para = ''.join(parts)
364
+ else:
365
+ para = para.replace('. ', '.\n')
366
+
367
+ formatted_paragraphs.append(para.strip())
368
+
369
+ return '\n\n'.join(formatted_paragraphs)
370
+
371
+ @asynccontextmanager
372
+ async def lifespan(app: FastAPI):
373
+ global settings
374
+ try:
375
+ logger.info("Loading settings...")
376
+ settings = load_settings()
377
+ logger.info("Settings loaded successfully.")
378
+ except Exception as e:
379
+ logger.error(f"Error loading settings: {str(e)}")
380
+ raise
381
+
382
+ yield
383
+
384
+ logger.info("Shutting down...")
385
+
386
+ app = FastAPI(lifespan=lifespan)
387
+
388
+ # Create a cache for loaded contexts
389
+ context_cache = {}
390
+
391
+ @lru_cache()
392
+ def get_settings():
393
+ return load_settings()
394
+
395
+ async def get_context(selected_folder: str, settings: dict = Depends(get_settings)):
396
+ if selected_folder not in context_cache:
397
+ try:
398
+ llm, token_encoder, text_embedder = await setup_llm_and_embedder(settings)
399
+ entities, relationships, reports, text_units, description_embedding_store, covariates = await load_context(selected_folder, settings)
400
+ local_search_engine, global_search_engine, local_context_builder, local_llm_params, local_context_params = await setup_search_engines(
401
+ llm, token_encoder, text_embedder, entities, relationships, reports, text_units,
402
+ description_embedding_store, covariates
403
+ )
404
+ question_generator = LocalQuestionGen(
405
+ llm=llm,
406
+ context_builder=local_context_builder,
407
+ token_encoder=token_encoder,
408
+ llm_params=local_llm_params,
409
+ context_builder_params=local_context_params,
410
+ )
411
+ context_cache[selected_folder] = {
412
+ "local_search_engine": local_search_engine,
413
+ "global_search_engine": global_search_engine,
414
+ "question_generator": question_generator
415
+ }
416
+ except Exception as e:
417
+ logger.error(f"Error loading context for folder {selected_folder}: {str(e)}")
418
+ raise HTTPException(status_code=500, detail=f"Failed to load context for folder {selected_folder}")
419
+
420
+ return context_cache[selected_folder]
421
+
422
+ @app.post("/v1/chat/completions")
423
+ async def chat_completions(request: ChatCompletionRequest):
424
+ try:
425
+ logger.info(f"Received request for model: {request.model}")
426
+ if request.model == "direct-chat":
427
+ logger.info("Routing to direct chat")
428
+ return await run_direct_chat(request)
429
+ elif request.model.startswith("graphrag-"):
430
+ logger.info("Routing to GraphRAG query")
431
+ if not request.query_options or not request.query_options.selected_folder:
432
+ raise HTTPException(status_code=400, detail="Selected folder is required for GraphRAG queries")
433
+ return await run_graphrag_query(request)
434
+ elif request.model == "duckduckgo-search:latest":
435
+ logger.info("Routing to DuckDuckGo search")
436
+ return await run_duckduckgo_search(request)
437
+ elif request.model == "full-model:latest":
438
+ logger.info("Routing to full model search")
439
+ return await run_full_model_search(request)
440
+ else:
441
+ raise HTTPException(status_code=400, detail=f"Invalid model specified: {request.model}")
442
+ except HTTPException as he:
443
+ logger.error(f"HTTP Exception: {str(he)}")
444
+ raise he
445
+ except Exception as e:
446
+ logger.error(f"Error in chat completion: {str(e)}", exc_info=True)
447
+ raise HTTPException(status_code=500, detail=str(e))
448
+
449
+ async def run_direct_chat(request: ChatCompletionRequest) -> ChatCompletionResponse:
450
+ try:
451
+ if not LLM_API_BASE:
452
+ raise ValueError("LLM_API_BASE environment variable is not set")
453
+
454
+ headers = {"Content-Type": "application/json"}
455
+
456
+ payload = {
457
+ "model": LLM_MODEL,
458
+ "messages": [{"role": msg.role, "content": msg.content} for msg in request.messages],
459
+ "stream": False
460
+ }
461
+
462
+ # Optional parameters
463
+ if request.temperature is not None:
464
+ payload["temperature"] = request.temperature
465
+ if request.max_tokens is not None:
466
+ payload["max_tokens"] = request.max_tokens
467
+
468
+ full_url = f"{normalize_api_base(LLM_API_BASE)}/v1/chat/completions"
469
+
470
+ logger.info(f"Sending request to: {full_url}")
471
+ logger.info(f"Payload: {payload}")
472
+
473
+ try:
474
+ response = requests.post(full_url, json=payload, headers=headers, timeout=10)
475
+ response.raise_for_status()
476
+ except requests.exceptions.RequestException as req_ex:
477
+ logger.error(f"Request to LLM API failed: {str(req_ex)}")
478
+ if isinstance(req_ex, requests.exceptions.ConnectionError):
479
+ raise HTTPException(status_code=503, detail="Unable to connect to LLM API. Please check your API settings.")
480
+ elif isinstance(req_ex, requests.exceptions.Timeout):
481
+ raise HTTPException(status_code=504, detail="Request to LLM API timed out")
482
+ else:
483
+ raise HTTPException(status_code=500, detail=f"Request to LLM API failed: {str(req_ex)}")
484
+
485
+ result = response.json()
486
+ logger.info(f"Received response: {result}")
487
+
488
+ content = result['choices'][0]['message']['content']
489
+
490
+ return ChatCompletionResponse(
491
+ model=LLM_MODEL,
492
+ choices=[
493
+ ChatCompletionResponseChoice(
494
+ index=0,
495
+ message=Message(
496
+ role="assistant",
497
+ content=content
498
+ ),
499
+ finish_reason=None
500
+ )
501
+ ],
502
+ usage=None
503
+ )
504
+ except HTTPException as he:
505
+ logger.error(f"HTTP Exception in direct chat: {str(he)}")
506
+ raise he
507
+ except Exception as e:
508
+ logger.error(f"Unexpected error in direct chat: {str(e)}")
509
+ raise HTTPException(status_code=500, detail=f"An unexpected error occurred during the direct chat: {str(e)}")
510
+
511
+ def get_embeddings(text: str) -> List[float]:
512
+ settings = load_settings()
513
+ embeddings_api_base = settings['embeddings_api_base']
514
+
515
+ headers = {"Content-Type": "application/json"}
516
+
517
+ if EMBEDDINGS_PROVIDER == 'ollama':
518
+ payload = {
519
+ "model": EMBEDDINGS_MODEL,
520
+ "prompt": text
521
+ }
522
+ full_url = f"{embeddings_api_base}/api/embeddings"
523
+ else: # OpenAI-compatible API
524
+ payload = {
525
+ "model": EMBEDDINGS_MODEL,
526
+ "input": text
527
+ }
528
+ full_url = f"{embeddings_api_base}/v1/embeddings"
529
+
530
+ try:
531
+ response = requests.post(full_url, json=payload, headers=headers)
532
+ response.raise_for_status()
533
+ except requests.exceptions.RequestException as req_ex:
534
+ logger.error(f"Request to Embeddings API failed: {str(req_ex)}")
535
+ raise HTTPException(status_code=500, detail=f"Failed to get embeddings: {str(req_ex)}")
536
+
537
+ result = response.json()
538
+
539
+ if EMBEDDINGS_PROVIDER == 'ollama':
540
+ return result['embedding']
541
+ else:
542
+ return result['data'][0]['embedding']
543
+
544
+
545
+ async def run_graphrag_query(request: ChatCompletionRequest) -> ChatCompletionResponse:
546
+ try:
547
+ query_options = request.query_options
548
+ query = request.messages[-1].content # Get the last user message as the query
549
+
550
+ cmd = ["python", "-m", "graphrag.query"]
551
+ cmd.extend(["--data", f"./indexing/output/{query_options.selected_folder}/artifacts"])
552
+ cmd.extend(["--method", query_options.query_type.split('-')[1]]) # 'global' or 'local'
553
+
554
+ if query_options.community_level:
555
+ cmd.extend(["--community_level", str(query_options.community_level)])
556
+ if query_options.response_type:
557
+ cmd.extend(["--response_type", query_options.response_type])
558
+
559
+ # Handle preset CLI args
560
+ if query_options.preset and query_options.preset != "Custom Query":
561
+ preset_args = get_preset_args(query_options.preset)
562
+ cmd.extend(preset_args)
563
+
564
+ # Handle custom CLI args
565
+ if query_options.custom_cli_args:
566
+ cmd.extend(query_options.custom_cli_args.split())
567
+
568
+ cmd.append(query)
569
+
570
+ logger.info(f"Executing GraphRAG query: {' '.join(cmd)}")
571
+
572
+ result = subprocess.run(cmd, capture_output=True, text=True)
573
+ if result.returncode != 0:
574
+ raise Exception(f"GraphRAG query failed: {result.stderr}")
575
+
576
+ return ChatCompletionResponse(
577
+ model=request.model,
578
+ choices=[
579
+ ChatCompletionResponseChoice(
580
+ index=0,
581
+ message=Message(
582
+ role="assistant",
583
+ content=result.stdout
584
+ ),
585
+ finish_reason="stop"
586
+ )
587
+ ],
588
+ usage=Usage(
589
+ prompt_tokens=0,
590
+ completion_tokens=0,
591
+ total_tokens=0
592
+ )
593
+ )
594
+ except Exception as e:
595
+ logger.error(f"Error in GraphRAG query: {str(e)}")
596
+ raise HTTPException(status_code=500, detail=f"An error occurred during the GraphRAG query: {str(e)}")
597
+
598
+
599
+ def get_preset_args(preset: str) -> List[str]:
600
+ preset_args = {
601
+ "Default Global Search": ["--community_level", "2", "--response_type", "Multiple Paragraphs"],
602
+ "Default Local Search": ["--community_level", "2", "--response_type", "Multiple Paragraphs"],
603
+ "Detailed Global Analysis": ["--community_level", "3", "--response_type", "Multi-Page Report"],
604
+ "Detailed Local Analysis": ["--community_level", "3", "--response_type", "Multi-Page Report"],
605
+ "Quick Global Summary": ["--community_level", "1", "--response_type", "Single Paragraph"],
606
+ "Quick Local Summary": ["--community_level", "1", "--response_type", "Single Paragraph"],
607
+ "Global Bullet Points": ["--community_level", "2", "--response_type", "List of 3-7 Points"],
608
+ "Local Bullet Points": ["--community_level", "2", "--response_type", "List of 3-7 Points"],
609
+ "Comprehensive Global Report": ["--community_level", "4", "--response_type", "Multi-Page Report"],
610
+ "Comprehensive Local Report": ["--community_level", "4", "--response_type", "Multi-Page Report"],
611
+ "High-Level Global Overview": ["--community_level", "1", "--response_type", "Single Page"],
612
+ "High-Level Local Overview": ["--community_level", "1", "--response_type", "Single Page"],
613
+ "Focused Global Insight": ["--community_level", "3", "--response_type", "Single Paragraph"],
614
+ "Focused Local Insight": ["--community_level", "3", "--response_type", "Single Paragraph"],
615
+ }
616
+ return preset_args.get(preset, [])
617
+
618
+ ddg_search = DuckDuckGoSearchAPIWrapper(max_results=5)
619
+
620
+ async def run_duckduckgo_search(request: ChatCompletionRequest) -> ChatCompletionResponse:
621
+ query = request.messages[-1].content
622
+ results = ddg_search.results(query, max_results=5)
623
+
624
+ if not results:
625
+ content = "No results found for the given query."
626
+ else:
627
+ content = "DuckDuckGo Search Results:\n\n"
628
+ for result in results:
629
+ content += f"Title: {result['title']}\n"
630
+ content += f"Snippet: {result['snippet']}\n"
631
+ content += f"Link: {result['link']}\n"
632
+ if 'date' in result:
633
+ content += f"Date: {result['date']}\n"
634
+ if 'source' in result:
635
+ content += f"Source: {result['source']}\n"
636
+ content += "\n"
637
+
638
+ return ChatCompletionResponse(
639
+ model=request.model,
640
+ choices=[
641
+ ChatCompletionResponseChoice(
642
+ index=0,
643
+ message=Message(
644
+ role="assistant",
645
+ content=content
646
+ ),
647
+ finish_reason="stop"
648
+ )
649
+ ],
650
+ usage=Usage(
651
+ prompt_tokens=0,
652
+ completion_tokens=0,
653
+ total_tokens=0
654
+ )
655
+ )
656
+
657
+ async def run_full_model_search(request: ChatCompletionRequest) -> ChatCompletionResponse:
658
+ query = request.messages[-1].content
659
+
660
+ # Run all search types
661
+ graphrag_global = await run_graphrag_query(ChatCompletionRequest(model="graphrag-global-search:latest", messages=request.messages, query_options=request.query_options))
662
+ graphrag_local = await run_graphrag_query(ChatCompletionRequest(model="graphrag-local-search:latest", messages=request.messages, query_options=request.query_options))
663
+ duckduckgo = await run_duckduckgo_search(request)
664
+
665
+ # Combine results
666
+ combined_content = f"""Full Model Search Results:
667
+
668
+ Global Search:
669
+ {graphrag_global.choices[0].message.content}
670
+
671
+ Local Search:
672
+ {graphrag_local.choices[0].message.content}
673
+
674
+ DuckDuckGo Search:
675
+ {duckduckgo.choices[0].message.content}
676
+ """
677
+
678
+ return ChatCompletionResponse(
679
+ model=request.model,
680
+ choices=[
681
+ ChatCompletionResponseChoice(
682
+ index=0,
683
+ message=Message(
684
+ role="assistant",
685
+ content=combined_content
686
+ ),
687
+ finish_reason="stop"
688
+ )
689
+ ],
690
+ usage=Usage(
691
+ prompt_tokens=0,
692
+ completion_tokens=0,
693
+ total_tokens=0
694
+ )
695
+ )
696
+
697
+ @app.get("/health")
698
+ async def health_check():
699
+ return {"status": "ok"}
700
+
701
+ @app.get("/v1/models")
702
+ async def list_models():
703
+ settings = load_settings()
704
+ try:
705
+ api_models = await fetch_available_models(settings)
706
+ except Exception as e:
707
+ logger.error(f"Error fetching API models: {str(e)}")
708
+ api_models = []
709
+
710
+ # Include the hardcoded models
711
+ hardcoded_models = [
712
+ {"id": "graphrag-local-search:latest", "object": "model", "owned_by": "graphrag"},
713
+ {"id": "graphrag-global-search:latest", "object": "model", "owned_by": "graphrag"},
714
+ {"id": "duckduckgo-search:latest", "object": "model", "owned_by": "duckduckgo"},
715
+ {"id": "full-model:latest", "object": "model", "owned_by": "combined"},
716
+ ]
717
+
718
+ # Combine API models with hardcoded models
719
+ all_models = [{"id": model, "object": "model", "owned_by": "api"} for model in api_models] + hardcoded_models
720
+
721
+ return JSONResponse(content={"data": all_models})
722
+
723
+ class PromptTuneRequest(BaseModel):
724
+ root: str = "./{ROOT_DIR}"
725
+ domain: Optional[str] = None
726
+ method: str = "random"
727
+ limit: int = 15
728
+ language: Optional[str] = None
729
+ max_tokens: int = 2000
730
+ chunk_size: int = 200
731
+ no_entity_types: bool = False
732
+ output: str = "./{ROOT_DIR}/prompts"
733
+
734
+ class PromptTuneResponse(BaseModel):
735
+ status: str
736
+ message: str
737
+
738
+ # Global variable to store the latest logs
739
+ prompt_tune_logs = deque(maxlen=100)
740
+
741
+ async def run_prompt_tuning(request: PromptTuneRequest):
742
+ cmd = ["python", "-m", "graphrag.prompt_tune"]
743
+
744
+ # Create a temporary directory for output
745
+ with tempfile.TemporaryDirectory() as temp_output:
746
+ # Expand environment variables in the root path
747
+ root_path = os.path.expandvars(request.root)
748
+
749
+ cmd.extend(["--root", root_path])
750
+ cmd.extend(["--method", request.method])
751
+ cmd.extend(["--limit", str(request.limit)])
752
+
753
+ if request.domain:
754
+ cmd.extend(["--domain", request.domain])
755
+
756
+ if request.language:
757
+ cmd.extend(["--language", request.language])
758
+
759
+ cmd.extend(["--max-tokens", str(request.max_tokens)])
760
+ cmd.extend(["--chunk-size", str(request.chunk_size)])
761
+
762
+ if request.no_entity_types:
763
+ cmd.append("--no-entity-types")
764
+
765
+ # Use the temporary directory for output
766
+ cmd.extend(["--output", temp_output])
767
+
768
+ logger.info(f"Executing prompt tuning command: {' '.join(cmd)}")
769
+
770
+ try:
771
+ process = await asyncio.create_subprocess_exec(
772
+ *cmd,
773
+ stdout=asyncio.subprocess.PIPE,
774
+ stderr=asyncio.subprocess.PIPE
775
+ )
776
+
777
+ async def read_stream(stream):
778
+ while True:
779
+ line = await stream.readline()
780
+ if not line:
781
+ break
782
+ line = line.decode().strip()
783
+ prompt_tune_logs.append(line)
784
+ logger.info(line)
785
+
786
+ await asyncio.gather(
787
+ read_stream(process.stdout),
788
+ read_stream(process.stderr)
789
+ )
790
+
791
+ await process.wait()
792
+
793
+ if process.returncode == 0:
794
+ logger.info("Prompt tuning completed successfully")
795
+
796
+ # Replace the existing template files with the newly generated prompts
797
+ dest_dir = os.path.join(ROOT_DIR, "prompts")
798
+
799
+ for filename in os.listdir(temp_output):
800
+ if filename.endswith(".txt"):
801
+ source_file = os.path.join(temp_output, filename)
802
+ dest_file = os.path.join(dest_dir, filename)
803
+ shutil.move(source_file, dest_file)
804
+ logger.info(f"Replaced {filename} in {dest_file}")
805
+
806
+ return PromptTuneResponse(status="success", message="Prompt tuning completed successfully. Existing prompts have been replaced.")
807
+ else:
808
+ logger.error("Prompt tuning failed")
809
+ return PromptTuneResponse(status="error", message="Prompt tuning failed. Check logs for details.")
810
+ except Exception as e:
811
+ logger.error(f"Prompt tuning failed: {str(e)}")
812
+ return PromptTuneResponse(status="error", message=f"Prompt tuning failed: {str(e)}")
813
+
814
+ @app.post("/v1/prompt_tune")
815
+ async def prompt_tune(request: PromptTuneRequest, background_tasks: BackgroundTasks):
816
+ background_tasks.add_task(run_prompt_tuning, request)
817
+ return {"status": "started", "message": "Prompt tuning process has been started in the background"}
818
+
819
+ @app.get("/v1/prompt_tune_status")
820
+ async def prompt_tune_status():
821
+ return {
822
+ "status": "running" if prompt_tune_logs else "idle",
823
+ "logs": list(prompt_tune_logs)
824
+ }
825
+
826
+ class IndexingRequest(BaseModel):
827
+ llm_model: str
828
+ embed_model: str
829
+ llm_api_base: str
830
+ embed_api_base: str
831
+ root: str
832
+ verbose: bool = False
833
+ nocache: bool = False
834
+ resume: Optional[str] = None
835
+ reporter: str = "rich"
836
+ emit: List[str] = ["parquet"]
837
+ custom_args: Optional[str] = None
838
+ llm_params: Dict[str, Any] = Field(default_factory=dict)
839
+ embed_params: Dict[str, Any] = Field(default_factory=dict)
840
+
841
+ # Global variable to store the latest indexing logs
842
+ indexing_logs = deque(maxlen=100)
843
+
844
+ async def run_indexing(request: IndexingRequest):
845
+ cmd = ["python", "-m", "graphrag.index"]
846
+
847
+ cmd.extend(["--root", request.root])
848
+
849
+ if request.verbose:
850
+ cmd.append("--verbose")
851
+
852
+ if request.nocache:
853
+ cmd.append("--nocache")
854
+
855
+ if request.resume:
856
+ cmd.extend(["--resume", request.resume])
857
+
858
+ cmd.extend(["--reporter", request.reporter])
859
+ cmd.extend(["--emit", ",".join(request.emit)])
860
+
861
+ # Set environment variables for LLM and embedding models
862
+ env: Dict[str, Any] = os.environ.copy()
863
+ env["GRAPHRAG_LLM_MODEL"] = request.llm_model
864
+ env["GRAPHRAG_EMBED_MODEL"] = request.embed_model
865
+ env["GRAPHRAG_LLM_API_BASE"] = LLM_API_BASE
866
+ env["GRAPHRAG_EMBED_API_BASE"] = EMBEDDINGS_API_BASE
867
+
868
+ # Set environment variables for LLM parameters
869
+ for key, value in request.llm_params.items():
870
+ env[f"GRAPHRAG_LLM_{key.upper()}"] = str(value)
871
+
872
+ # Set environment variables for embedding parameters
873
+ for key, value in request.embed_params.items():
874
+ env[f"GRAPHRAG_EMBED_{key.upper()}"] = str(value)
875
+
876
+ # Add custom CLI arguments
877
+ if request.custom_args:
878
+ cmd.extend(request.custom_args.split())
879
+
880
+ logger.info(f"Executing indexing command: {' '.join(cmd)}")
881
+ logger.info(f"Environment variables: {env}")
882
+
883
+ try:
884
+ process = await asyncio.create_subprocess_exec(
885
+ *cmd,
886
+ stdout=asyncio.subprocess.PIPE,
887
+ stderr=asyncio.subprocess.PIPE,
888
+ env=env
889
+ )
890
+
891
+ async def read_stream(stream):
892
+ while True:
893
+ line = await stream.readline()
894
+ if not line:
895
+ break
896
+ line = line.decode().strip()
897
+ indexing_logs.append(line)
898
+ logger.info(line)
899
+
900
+ await asyncio.gather(
901
+ read_stream(process.stdout),
902
+ read_stream(process.stderr)
903
+ )
904
+
905
+ await process.wait()
906
+
907
+ if process.returncode == 0:
908
+ logger.info("Indexing completed successfully")
909
+ return {"status": "success", "message": "Indexing completed successfully"}
910
+ else:
911
+ logger.error("Indexing failed")
912
+ return {"status": "error", "message": "Indexing failed. Check logs for details."}
913
+ except Exception as e:
914
+ logger.error(f"Indexing failed: {str(e)}")
915
+ return {"status": "error", "message": f"Indexing failed: {str(e)}"}
916
+
917
+
918
+ @app.post("/v1/index")
919
+ async def start_indexing(request: IndexingRequest, background_tasks: BackgroundTasks):
920
+ background_tasks.add_task(run_indexing, request)
921
+ return {"status": "started", "message": "Indexing process has been started in the background"}
922
+
923
+ @app.get("/v1/index_status")
924
+ async def indexing_status():
925
+ return {
926
+ "status": "running" if indexing_logs else "idle",
927
+ "logs": list(indexing_logs)
928
+ }
929
+
930
+ if __name__ == "__main__":
931
+ parser = argparse.ArgumentParser(description="Launch the GraphRAG API server")
932
+ parser.add_argument("--host", type=str, default="127.0.0.1", help="Host to bind the server to")
933
+ parser.add_argument("--port", type=int, default=PORT, help="Port to bind the server to")
934
+ parser.add_argument("--reload", action="store_true", help="Enable auto-reload mode")
935
+ args = parser.parse_args()
936
+
937
+ import uvicorn
938
+ uvicorn.run(
939
+ "api:app",
940
+ host=args.host,
941
+ port=args.port,
942
+ reload=args.reload
943
+ )
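For reference, a minimal sketch of how a client might drive the endpoints defined above. The port, the `./indexing` root path, the model names, and the API base URLs are assumptions for illustration, not values taken from this repository's configuration.

```python
# Sketch of a client for the GraphRAG API; all concrete values below are placeholders.
import requests

BASE = "http://localhost:8012"  # substitute the PORT the server was actually started on

# Liveness check and model listing
print(requests.get(f"{BASE}/health").json())
print(requests.get(f"{BASE}/v1/models").json())

# Start prompt tuning in the background, then poll the rolling log buffer
requests.post(f"{BASE}/v1/prompt_tune", json={"root": "./indexing", "limit": 5})
print(requests.get(f"{BASE}/v1/prompt_tune_status").json())

# Start indexing (model names and API bases are placeholders), then poll its status
requests.post(f"{BASE}/v1/index", json={
    "llm_model": "my-llm-model",
    "embed_model": "my-embedding-model",
    "llm_api_base": "http://localhost:11434",
    "embed_api_base": "http://localhost:11434",
    "root": "./indexing",
})
print(requests.get(f"{BASE}/v1/index_status").json())
```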
app.py ADDED
@@ -0,0 +1,1786 @@
1
+ import gradio as gr
2
+ from gradio.helpers import Progress
3
+ import asyncio
4
+ import subprocess
5
+ import yaml
6
+ import os
7
+ import networkx as nx
8
+ import plotly.graph_objects as go
9
+ import numpy as np
10
+ import plotly.io as pio
11
+ import lancedb
12
+ import random
13
+ import io
14
+ import shutil
15
+ import logging
16
+ import queue
17
+ import threading
18
+ import time
19
+ from collections import deque
20
+ import re
21
+ import glob
22
+ from datetime import datetime
23
+ import json
24
+ import requests
25
+ import aiohttp
26
+ from openai import OpenAI
27
+ from openai import AsyncOpenAI
28
+ import pyarrow.parquet as pq
29
+ import pandas as pd
30
+ import sys
31
+ import colorsys
32
+ from dotenv import load_dotenv, set_key
33
+ import argparse
34
+ import socket
35
+ import tiktoken
36
+ from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
37
+ from graphrag.query.indexer_adapters import (
38
+ read_indexer_covariates,
39
+ read_indexer_entities,
40
+ read_indexer_relationships,
41
+ read_indexer_reports,
42
+ read_indexer_text_units,
43
+ )
44
+ from graphrag.llm.openai import create_openai_chat_llm
45
+ from graphrag.llm.openai.factories import create_openai_embedding_llm
46
+ from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
47
+ from graphrag.query.llm.oai.chat_openai import ChatOpenAI
48
+ from graphrag.llm.openai.openai_configuration import OpenAIConfiguration
49
+ from graphrag.llm.openai.openai_embeddings_llm import OpenAIEmbeddingsLLM
50
+ from graphrag.query.llm.oai.typing import OpenaiApiType
51
+ from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
52
+ from graphrag.query.structured_search.local_search.search import LocalSearch
53
+ from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
54
+ from graphrag.query.structured_search.global_search.search import GlobalSearch
55
+ from graphrag.vector_stores.lancedb import LanceDBVectorStore
56
+ import textwrap
57
+
58
+
59
+
60
+ # Suppress warnings
61
+ import warnings
62
+ warnings.filterwarnings("ignore", category=UserWarning, module="gradio_client.documentation")
63
+
64
+
65
+ load_dotenv('indexing/.env')
66
+
67
+ # Set default values for API-related environment variables
68
+ os.environ.setdefault("LLM_API_BASE", os.getenv("LLM_API_BASE"))
69
+ os.environ.setdefault("LLM_API_KEY", os.getenv("LLM_API_KEY"))
70
+ os.environ.setdefault("LLM_MODEL", os.getenv("LLM_MODEL"))
71
+ os.environ.setdefault("EMBEDDINGS_API_BASE", os.getenv("EMBEDDINGS_API_BASE"))
72
+ os.environ.setdefault("EMBEDDINGS_API_KEY", os.getenv("EMBEDDINGS_API_KEY"))
73
+ os.environ.setdefault("EMBEDDINGS_MODEL", os.getenv("EMBEDDINGS_MODEL"))
74
+
75
+ # Add the project root to the Python path
76
+ project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
77
+ sys.path.insert(0, project_root)
78
+
79
+
80
+ # Set up logging
81
+ log_queue = queue.Queue()
82
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
83
+
84
+
85
+ llm = None
86
+ text_embedder = None
87
+
88
+ class QueueHandler(logging.Handler):
89
+ def __init__(self, log_queue):
90
+ super().__init__()
91
+ self.log_queue = log_queue
92
+
93
+ def emit(self, record):
94
+ self.log_queue.put(self.format(record))
95
+ queue_handler = QueueHandler(log_queue)
96
+ logging.getLogger().addHandler(queue_handler)
97
+
98
+
99
+
100
+ def initialize_models():
101
+ global llm, text_embedder
102
+
103
+ llm_api_base = os.getenv("LLM_API_BASE")
104
+ llm_api_key = os.getenv("LLM_API_KEY")
105
+ embeddings_api_base = os.getenv("EMBEDDINGS_API_BASE")
106
+ embeddings_api_key = os.getenv("EMBEDDINGS_API_KEY")
107
+
108
+ llm_service_type = os.getenv("LLM_SERVICE_TYPE", "openai_chat").lower() # Provide a default and lower it
109
+ embeddings_service_type = os.getenv("EMBEDDINGS_SERVICE_TYPE", "openai").lower() # Provide a default and lower it
110
+
111
+ llm_model = os.getenv("LLM_MODEL")
112
+ embeddings_model = os.getenv("EMBEDDINGS_MODEL")
113
+
114
+ logging.info("Fetching models...")
115
+ models = fetch_models(llm_api_base, llm_api_key, llm_service_type)
116
+
117
+ # Use the same models list for both LLM and embeddings
118
+ llm_models = models
119
+ embeddings_models = models
120
+
121
+ # Initialize LLM
122
+ if llm_service_type == "openai_chat":
123
+ llm = ChatOpenAI(
124
+ api_key=llm_api_key,
125
+ api_base=f"{llm_api_base}/v1",
126
+ model=llm_model,
127
+ api_type=OpenaiApiType.OpenAI,
128
+ max_retries=20,
129
+ )
130
+ # Initialize OpenAI client for embeddings
131
+ openai_client = OpenAI(
132
+ api_key=embeddings_api_key or "dummy_key",
133
+ base_url=f"{embeddings_api_base}/v1"
134
+ )
135
+
136
+ # Initialize text embedder using OpenAIEmbeddingsLLM
137
+ text_embedder = OpenAIEmbeddingsLLM(
138
+ client=openai_client,
139
+ configuration={
140
+ "model": embeddings_model,
141
+ "api_type": "open_ai",
142
+ "api_base": embeddings_api_base,
143
+ "api_key": embeddings_api_key or None,
144
+ "provider": embeddings_service_type
145
+ }
146
+ )
147
+
148
+ return llm_models, embeddings_models, llm_service_type, embeddings_service_type, llm_api_base, embeddings_api_base, text_embedder
149
+
150
+ def find_latest_output_folder():
151
+ root_dir = "./indexing/output"
152
+ folders = [f for f in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, f))]
153
+
154
+ if not folders:
155
+ raise ValueError("No output folders found")
156
+
157
+ # Sort folders by creation time, most recent first
158
+ sorted_folders = sorted(folders, key=lambda x: os.path.getctime(os.path.join(root_dir, x)), reverse=True)
159
+
160
+ latest_folder = None
161
+ timestamp = None
162
+
163
+ for folder in sorted_folders:
164
+ try:
165
+ # Try to parse the folder name as a timestamp
166
+ timestamp = datetime.strptime(folder, "%Y%m%d-%H%M%S")
167
+ latest_folder = folder
168
+ break
169
+ except ValueError:
170
+ # If the folder name is not a valid timestamp, skip it
171
+ continue
172
+
173
+ if latest_folder is None:
174
+ raise ValueError("No valid timestamp folders found")
175
+
176
+ latest_path = os.path.join(root_dir, latest_folder)
177
+ artifacts_path = os.path.join(latest_path, "artifacts")
178
+
179
+ if not os.path.exists(artifacts_path):
180
+ raise ValueError(f"Artifacts folder not found in {latest_path}")
181
+
182
+ return latest_path, latest_folder
183
+
184
+ def initialize_data():
185
+ global entity_df, relationship_df, text_unit_df, report_df, covariate_df
186
+
187
+ tables = {
188
+ "entity_df": "create_final_nodes",
189
+ "relationship_df": "create_final_edges",
190
+ "text_unit_df": "create_final_text_units",
191
+ "report_df": "create_final_reports",
192
+ "covariate_df": "create_final_covariates"
193
+ }
194
+
195
+ timestamp = None # Initialize timestamp to None
196
+
197
+ try:
198
+ latest_output_folder, timestamp = find_latest_output_folder()
199
+ artifacts_folder = os.path.join(latest_output_folder, "artifacts")
200
+
201
+ for df_name, file_prefix in tables.items():
202
+ file_pattern = os.path.join(artifacts_folder, f"{file_prefix}*.parquet")
203
+ matching_files = glob.glob(file_pattern)
204
+
205
+ if matching_files:
206
+ latest_file = max(matching_files, key=os.path.getctime)
207
+ df = pd.read_parquet(latest_file)
208
+ globals()[df_name] = df
209
+ logging.info(f"Successfully loaded {df_name} from {latest_file}")
210
+ else:
211
+ logging.warning(f"No matching file found for {df_name} in {artifacts_folder}. Initializing as an empty DataFrame.")
212
+ globals()[df_name] = pd.DataFrame()
213
+
214
+ except Exception as e:
215
+ logging.error(f"Error initializing data: {str(e)}")
216
+ for df_name in tables.keys():
217
+ globals()[df_name] = pd.DataFrame()
218
+
219
+ return timestamp
220
+
221
+ # Call initialize_data and store the timestamp
222
+ current_timestamp = initialize_data()
223
+
224
+
225
+ def find_available_port(start_port, max_attempts=100):
226
+ for port in range(start_port, start_port + max_attempts):
227
+ with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
228
+ try:
229
+ s.bind(('', port))
230
+ return port
231
+ except OSError:
232
+ continue
233
+ raise IOError("No free ports found")
234
+
235
+ def start_api_server(port):
236
+ subprocess.Popen([sys.executable, "api_server.py", "--port", str(port)])
237
+
238
+ def wait_for_api_server(port):
239
+ max_retries = 30
240
+ for _ in range(max_retries):
241
+ try:
242
+ response = requests.get(f"http://localhost:{port}")
243
+ if response.status_code == 200:
244
+ print(f"API server is up and running on port {port}")
245
+ return
246
+ else:
247
+ print(f"Unexpected response from API server: {response.status_code}")
248
+ except requests.ConnectionError:
249
+ time.sleep(1)
250
+ print("Failed to connect to API server")
251
+
252
+ def load_settings():
253
+ try:
254
+ with open("indexing/settings.yaml", "r") as f:
255
+ return yaml.safe_load(f) or {}
256
+ except FileNotFoundError:
257
+ return {}
258
+
259
+ def update_setting(key, value):
260
+ settings = load_settings()
261
+ try:
262
+ settings[key] = json.loads(value)
263
+ except json.JSONDecodeError:
264
+ settings[key] = value
265
+
266
+ try:
267
+ with open("indexing/settings.yaml", "w") as f:
268
+ yaml.dump(settings, f, default_flow_style=False)
269
+ return f"Setting '{key}' updated successfully"
270
+ except Exception as e:
271
+ return f"Error updating setting '{key}': {str(e)}"
272
+
273
+ def create_setting_component(key, value):
274
+ with gr.Accordion(key, open=False):
275
+ if isinstance(value, (dict, list)):
276
+ value_str = json.dumps(value, indent=2)
277
+ lines = value_str.count('\n') + 1
278
+ else:
279
+ value_str = str(value)
280
+ lines = 1
281
+
282
+ text_area = gr.TextArea(value=value_str, label="Value", lines=lines, max_lines=20)
283
+ update_btn = gr.Button("Update", variant="primary")
284
+ status = gr.Textbox(label="Status", visible=False)
285
+
286
+ update_btn.click(
287
+ fn=update_setting,
288
+ inputs=[gr.Textbox(value=key, visible=False), text_area],
289
+ outputs=[status]
290
+ ).then(
291
+ fn=lambda: gr.update(visible=True),
292
+ outputs=[status]
293
+ )
294
+
295
+
296
+
297
+ def get_openai_client():
298
+ return OpenAI(
299
+ base_url=os.getenv("LLM_API_BASE"),
300
+ api_key=os.getenv("LLM_API_KEY"),
301
+ llm_model = os.getenv("LLM_MODEL")
302
+ )
303
+
304
+ async def chat_with_openai(messages, model, temperature, max_tokens, api_base):
305
+ client = AsyncOpenAI(
306
+ base_url=api_base,
307
+ api_key=os.getenv("LLM_API_KEY")
308
+ )
309
+
310
+ try:
311
+ response = await client.chat.completions.create(
312
+ model=model,
313
+ messages=messages,
314
+ temperature=temperature,
315
+ max_tokens=max_tokens
316
+ )
317
+ return response.choices[0].message.content
318
+ except Exception as e:
319
+ logging.error(f"Error in chat_with_openai: {str(e)}")
320
+ return f"An error occurred: {str(e)}"
321
+ return f"Error: {str(e)}"
322
+
323
+ def chat_with_llm(query, history, system_message, temperature, max_tokens, model, api_base):
324
+ try:
325
+ messages = [{"role": "system", "content": system_message}]
326
+ for item in history:
327
+ if isinstance(item, tuple) and len(item) == 2:
328
+ human, ai = item
329
+ messages.append({"role": "user", "content": human})
330
+ messages.append({"role": "assistant", "content": ai})
331
+ messages.append({"role": "user", "content": query})
332
+
333
+ logging.info(f"Sending chat request to {api_base} with model {model}")
334
+ client = OpenAI(base_url=api_base, api_key=os.getenv("LLM_API_KEY", "dummy-key"))
335
+ response = client.chat.completions.create(
336
+ model=model,
337
+ messages=messages,
338
+ temperature=temperature,
339
+ max_tokens=max_tokens
340
+ )
341
+ return response.choices[0].message.content
342
+ except Exception as e:
343
+ logging.error(f"Error in chat_with_llm: {str(e)}")
344
+ logging.error(f"Attempted with model: {model}, api_base: {api_base}")
345
+ raise RuntimeError(f"Chat request failed: {str(e)}")
346
+
347
+ def run_graphrag_query(cli_args):
348
+ try:
349
+ command = ' '.join(cli_args)
350
+ logging.info(f"Executing command: {command}")
351
+ result = subprocess.run(cli_args, capture_output=True, text=True, check=True)
352
+ return result.stdout.strip()
353
+ except subprocess.CalledProcessError as e:
354
+ logging.error(f"Error running GraphRAG query: {e}")
355
+ logging.error(f"Command output (stdout): {e.stdout}")
356
+ logging.error(f"Command output (stderr): {e.stderr}")
357
+ raise RuntimeError(f"GraphRAG query failed: {e.stderr}")
358
+
359
+ def parse_query_response(response: str):
360
+ try:
361
+ # Split the response into metadata and content
362
+ parts = response.split("\n\n", 1)
363
+ if len(parts) < 2:
364
+ return response # Return original response if it doesn't contain metadata
365
+
366
+ metadata_str, content = parts
367
+ metadata = json.loads(metadata_str)
368
+
369
+ # Extract relevant information from metadata
370
+ query_type = metadata.get("query_type", "Unknown")
371
+ execution_time = metadata.get("execution_time", "N/A")
372
+ tokens_used = metadata.get("tokens_used", "N/A")
373
+
374
+ # Remove unwanted lines from the content
375
+ content_lines = content.split('\n')
376
+ filtered_content = '\n'.join([line for line in content_lines if not line.startswith("INFO:") and not line.startswith("creating llm client")])
377
+
378
+ # Format the parsed response
379
+ parsed_response = f"""
380
+ Query Type: {query_type}
381
+ Execution Time: {execution_time} seconds
382
+ Tokens Used: {tokens_used}
383
+
384
+ {filtered_content.strip()}
385
+ """
386
+ return parsed_response
387
+ except Exception as e:
388
+ print(f"Error parsing query response: {str(e)}")
389
+ return response
390
+
391
+ def send_message(query_type, query, history, system_message, temperature, max_tokens, preset, community_level, response_type, custom_cli_args, selected_folder):
392
+ try:
393
+ if query_type in ["global", "local"]:
394
+ cli_args = construct_cli_args(query_type, preset, community_level, response_type, custom_cli_args, query, selected_folder)
395
+ logging.info(f"Executing {query_type} search with command: {' '.join(cli_args)}")
396
+ result = run_graphrag_query(cli_args)
397
+ parsed_result = parse_query_response(result)
398
+ logging.info(f"Parsed query result: {parsed_result}")
399
+ else: # Direct chat
400
+ llm_model = os.getenv("LLM_MODEL")
401
+ api_base = os.getenv("LLM_API_BASE")
402
+ logging.info(f"Executing direct chat with model: {llm_model}")
403
+
404
+ try:
405
+ result = chat_with_llm(query, history, system_message, temperature, max_tokens, llm_model, api_base)
406
+ parsed_result = result # No parsing needed for direct chat
407
+ logging.info(f"Direct chat result: {parsed_result[:100]}...") # Log first 100 chars of result
408
+ except Exception as chat_error:
409
+ logging.error(f"Error in chat_with_llm: {str(chat_error)}")
410
+ raise RuntimeError(f"Direct chat failed: {str(chat_error)}")
411
+
412
+ history.append((query, parsed_result))
413
+ except Exception as e:
414
+ error_message = f"An error occurred: {str(e)}"
415
+ logging.error(error_message)
416
+ logging.exception("Exception details:")
417
+ history.append((query, error_message))
418
+
419
+ return history, gr.update(value=""), update_logs()
420
+
421
+ def construct_cli_args(query_type, preset, community_level, response_type, custom_cli_args, query, selected_folder):
422
+ if not selected_folder:
423
+ raise ValueError("No folder selected. Please select an output folder before querying.")
424
+
425
+ artifacts_folder = os.path.join("./indexing/output", selected_folder, "artifacts")
426
+ if not os.path.exists(artifacts_folder):
427
+ raise ValueError(f"Artifacts folder not found in {artifacts_folder}")
428
+
429
+ base_args = [
430
+ "python", "-m", "graphrag.query",
431
+ "--data", artifacts_folder,
432
+ "--method", query_type,
433
+ ]
434
+
435
+ # Apply preset configurations
436
+ if preset.startswith("Default"):
437
+ base_args.extend(["--community_level", "2", "--response_type", "Multiple Paragraphs"])
438
+ elif preset.startswith("Detailed"):
439
+ base_args.extend(["--community_level", "4", "--response_type", "Multi-Page Report"])
440
+ elif preset.startswith("Quick"):
441
+ base_args.extend(["--community_level", "1", "--response_type", "Single Paragraph"])
442
+ elif preset.startswith("Bullet"):
443
+ base_args.extend(["--community_level", "2", "--response_type", "List of 3-7 Points"])
444
+ elif preset.startswith("Comprehensive"):
445
+ base_args.extend(["--community_level", "5", "--response_type", "Multi-Page Report"])
446
+ elif preset.startswith("High-Level"):
447
+ base_args.extend(["--community_level", "1", "--response_type", "Single Page"])
448
+ elif preset.startswith("Focused"):
449
+ base_args.extend(["--community_level", "3", "--response_type", "Multiple Paragraphs"])
450
+ elif preset == "Custom Query":
451
+ base_args.extend([
452
+ "--community_level", str(community_level),
453
+ "--response_type", f'"{response_type}"',
454
+ ])
455
+ if custom_cli_args:
456
+ base_args.extend(custom_cli_args.split())
457
+
458
+ # Add the query at the end
459
+ base_args.append(query)
460
+
461
+ return base_args
462
+
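+ # Illustrative result of construct_cli_args("local", "Default Local Search", 2,
+ # "Multiple Paragraphs", "", "my question", "20240716-120000") -- the folder name is hypothetical:
+ # ["python", "-m", "graphrag.query", "--data", "./indexing/output/20240716-120000/artifacts",
+ # "--method", "local", "--community_level", "2", "--response_type", "Multiple Paragraphs", "my question"]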
463
+
464
+
465
+
466
+
467
+
468
+ def upload_file(file):
469
+ if file is not None:
470
+ input_dir = os.path.join("indexing", "input")
471
+ os.makedirs(input_dir, exist_ok=True)
472
+
473
+ # Get the original filename from the uploaded file
474
+ original_filename = file.name
475
+
476
+ # Create the destination path
477
+ destination_path = os.path.join(input_dir, os.path.basename(original_filename))
478
+
479
+ # Move the uploaded file to the destination path
480
+ shutil.move(file.name, destination_path)
481
+
482
+ logging.info(f"File uploaded and moved to: {destination_path}")
483
+ status = f"File uploaded: {os.path.basename(original_filename)}"
484
+ else:
485
+ status = "No file uploaded"
486
+
487
+ # Get the updated file list
488
+ updated_file_list = [f["path"] for f in list_input_files()]
489
+
490
+ return status, gr.update(choices=updated_file_list), update_logs()
491
+
492
+ def list_input_files():
493
+ input_dir = os.path.join("indexing", "input")
494
+ files = []
495
+ if os.path.exists(input_dir):
496
+ files = os.listdir(input_dir)
497
+ return [{"name": f, "path": os.path.join(input_dir, f)} for f in files]
498
+
499
+ def delete_file(file_path):
500
+ try:
501
+ os.remove(file_path)
502
+ logging.info(f"File deleted: {file_path}")
503
+ status = f"File deleted: {os.path.basename(file_path)}"
504
+ except Exception as e:
505
+ logging.error(f"Error deleting file: {str(e)}")
506
+ status = f"Error deleting file: {str(e)}"
507
+
508
+ # Get the updated file list
509
+ updated_file_list = [f["path"] for f in list_input_files()]
510
+
511
+ return status, gr.update(choices=updated_file_list), update_logs()
512
+
513
+ def read_file_content(file_path):
514
+ try:
515
+ if file_path.endswith('.parquet'):
516
+ df = pd.read_parquet(file_path)
517
+
518
+ # Get basic information about the DataFrame
519
+ info = f"Parquet File: {os.path.basename(file_path)}\n"
520
+ info += f"Rows: {len(df)}, Columns: {len(df.columns)}\n\n"
521
+ info += "Column Names:\n" + "\n".join(df.columns) + "\n\n"
522
+
523
+ # Display first few rows
524
+ info += "First 5 rows:\n"
525
+ info += df.head().to_string() + "\n\n"
526
+
527
+ # Display basic statistics
528
+ info += "Basic Statistics:\n"
529
+ info += df.describe().to_string()
530
+
531
+ return info
532
+ else:
533
+ with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
534
+ content = file.read()
535
+ return content
536
+ except Exception as e:
537
+ logging.error(f"Error reading file: {str(e)}")
538
+ return f"Error reading file: {str(e)}"
539
+
540
+ def save_file_content(file_path, content):
541
+ try:
542
+ with open(file_path, 'w') as file:
543
+ file.write(content)
544
+ logging.info(f"File saved: {file_path}")
545
+ status = f"File saved: {os.path.basename(file_path)}"
546
+ except Exception as e:
547
+ logging.error(f"Error saving file: {str(e)}")
548
+ status = f"Error saving file: {str(e)}"
549
+ return status, update_logs()
550
+
551
+ def manage_data():
552
+ db = lancedb.connect("./indexing/lancedb")
553
+ tables = db.table_names()
554
+ table_info = ""
555
+ if tables:
556
+ table = db[tables[0]]
557
+ table_info = f"Table: {tables[0]}\nSchema: {table.schema}"
558
+
559
+ input_files = list_input_files()
560
+
561
+ return {
562
+ "database_info": f"Tables: {', '.join(tables)}\n\n{table_info}",
563
+ "input_files": input_files
564
+ }
565
+
566
+
567
+ def find_latest_graph_file(root_dir):
568
+ pattern = os.path.join(root_dir, "output", "*", "artifacts", "*.graphml")
569
+ graph_files = glob.glob(pattern)
570
+ if not graph_files:
571
+ # If no files found, try excluding .DS_Store
572
+ output_dir = os.path.join(root_dir, "output")
573
+ run_dirs = [d for d in os.listdir(output_dir) if os.path.isdir(os.path.join(output_dir, d)) and d != ".DS_Store"]
574
+ if run_dirs:
575
+ latest_run = max(run_dirs)
576
+ pattern = os.path.join(root_dir, "output", latest_run, "artifacts", "*.graphml")
577
+ graph_files = glob.glob(pattern)
578
+
579
+ if not graph_files:
580
+ return None
581
+
582
+ # Sort files by modification time, most recent first
583
+ latest_file = max(graph_files, key=os.path.getmtime)
584
+ return latest_file
585
+
586
+ def update_visualization(folder_name, file_name, layout_type, node_size, edge_width, node_color_attribute, color_scheme, show_labels, label_size):
587
+ root_dir = "./indexing"
588
+ if not folder_name or not file_name:
589
+ return None, "Please select a folder and a GraphML file."
590
+ file_name = file_name.split("] ")[1] if "]" in file_name else file_name # Remove file type prefix
591
+ graph_path = os.path.join(root_dir, "output", folder_name, "artifacts", file_name)
592
+ if not graph_path.endswith('.graphml'):
593
+ return None, "Please select a GraphML file for visualization."
594
+ try:
595
+ # Load the GraphML file
596
+ graph = nx.read_graphml(graph_path)
597
+
598
+ # Create layout based on user selection
599
+ if layout_type == "3D Spring":
600
+ pos = nx.spring_layout(graph, dim=3, seed=42, k=0.5)
601
+ elif layout_type == "2D Spring":
602
+ pos = nx.spring_layout(graph, dim=2, seed=42, k=0.5)
603
+ else: # Circular
604
+ pos = nx.circular_layout(graph)
605
+
606
+ # Extract node positions
607
+ if layout_type == "3D Spring":
608
+ x_nodes = [pos[node][0] for node in graph.nodes()]
609
+ y_nodes = [pos[node][1] for node in graph.nodes()]
610
+ z_nodes = [pos[node][2] for node in graph.nodes()]
611
+ else:
612
+ x_nodes = [pos[node][0] for node in graph.nodes()]
613
+ y_nodes = [pos[node][1] for node in graph.nodes()]
614
+ z_nodes = [0] * len(graph.nodes()) # Set all z-coordinates to 0 for 2D layouts
615
+
616
+ # Extract edge positions
617
+ x_edges, y_edges, z_edges = [], [], []
618
+ for edge in graph.edges():
619
+ x_edges.extend([pos[edge[0]][0], pos[edge[1]][0], None])
620
+ y_edges.extend([pos[edge[0]][1], pos[edge[1]][1], None])
621
+ if layout_type == "3D Spring":
622
+ z_edges.extend([pos[edge[0]][2], pos[edge[1]][2], None])
623
+ else:
624
+ z_edges.extend([0, 0, None])
625
+
626
+ # Generate node colors based on user selection
627
+ if node_color_attribute == "Degree":
628
+ node_colors = [graph.degree(node) for node in graph.nodes()]
629
+ else: # Random
630
+ node_colors = [random.random() for _ in graph.nodes()]
631
+ node_colors = np.array(node_colors)
632
+ node_colors = (node_colors - node_colors.min()) / (node_colors.max() - node_colors.min())
633
+
634
+ # Create the trace for edges
635
+ edge_trace = go.Scatter3d(
636
+ x=x_edges, y=y_edges, z=z_edges,
637
+ mode='lines',
638
+ line=dict(color='lightgray', width=edge_width),
639
+ hoverinfo='none'
640
+ )
641
+
642
+ # Create the trace for nodes
643
+ node_trace = go.Scatter3d(
644
+ x=x_nodes, y=y_nodes, z=z_nodes,
645
+ mode='markers+text' if show_labels else 'markers',
646
+ marker=dict(
647
+ size=node_size,
648
+ color=node_colors,
649
+ colorscale=color_scheme,
650
+ colorbar=dict(
651
+ title='Node Degree' if node_color_attribute == "Degree" else "Random Value",
652
+ thickness=10,
653
+ x=1.1,
654
+ tickvals=[0, 1],
655
+ ticktext=['Low', 'High']
656
+ ),
657
+ line=dict(width=1)
658
+ ),
659
+ text=[node for node in graph.nodes()],
660
+ textposition="top center",
661
+ textfont=dict(size=label_size, color='black'),
662
+ hoverinfo='text'
663
+ )
664
+
665
+ # Create the plot
666
+ fig = go.Figure(data=[edge_trace, node_trace])
667
+
668
+ # Update layout for better visualization
669
+ fig.update_layout(
670
+ title=f'{layout_type} Graph Visualization: {os.path.basename(graph_path)}',
671
+ showlegend=False,
672
+ scene=dict(
673
+ xaxis=dict(showbackground=False, showticklabels=False, title=''),
674
+ yaxis=dict(showbackground=False, showticklabels=False, title=''),
675
+ zaxis=dict(showbackground=False, showticklabels=False, title='')
676
+ ),
677
+ margin=dict(l=0, r=0, b=0, t=40),
678
+ annotations=[
679
+ dict(
680
+ showarrow=False,
681
+ text=f"Interactive {layout_type} visualization of GraphML data",
682
+ xref="paper",
683
+ yref="paper",
684
+ x=0,
685
+ y=0
686
+ )
687
+ ],
688
+ autosize=True
689
+ )
690
+
691
+ fig.update_layout(autosize=True)
692
+ fig.update_layout(height=600) # Set a fixed height
693
+ return fig, f"Graph visualization generated successfully. Using file: {graph_path}"
694
+ except Exception as e:
695
+ return go.Figure(), f"Error visualizing graph: {str(e)}"
696
+
697
+
698
+
699
+
700
+
701
+ def update_logs():
702
+ logs = []
703
+ while not log_queue.empty():
704
+ logs.append(log_queue.get())
705
+ return "\n".join(logs)
706
+
707
+
708
+
709
+ def fetch_models(base_url, api_key, service_type):
710
+ try:
711
+ if service_type.lower() == "ollama":
712
+ response = requests.get(f"{base_url}/tags", timeout=10)
713
+ else: # OpenAI Compatible
714
+ headers = {
715
+ "Authorization": f"Bearer {api_key}",
716
+ "Content-Type": "application/json"
717
+ }
718
+ response = requests.get(f"{base_url}/models", headers=headers, timeout=10)
719
+
720
+ logging.info(f"Raw API response: {response.text}")
721
+
722
+ if response.status_code == 200:
723
+ data = response.json()
724
+ if service_type.lower() == "ollama":
725
+ models = [model.get('name', '') for model in data.get('models', data) if isinstance(model, dict)]
726
+ else: # OpenAI Compatible
727
+ models = [model.get('id', '') for model in data.get('data', []) if isinstance(model, dict)]
728
+
729
+ models = [model for model in models if model] # Remove empty strings
730
+
731
+ if not models:
732
+ logging.warning(f"No models found in {service_type} API response")
733
+ return ["No models available"]
734
+
735
+ logging.info(f"Successfully fetched {service_type} models: {models}")
736
+ return models
737
+ else:
738
+ logging.error(f"Error fetching {service_type} models. Status code: {response.status_code}, Response: {response.text}")
739
+ return ["Error fetching models"]
740
+ except requests.RequestException as e:
741
+ logging.error(f"Exception while fetching {service_type} models: {str(e)}")
742
+ return ["Error: Connection failed"]
743
+ except Exception as e:
744
+ logging.error(f"Unexpected error in fetch_models: {str(e)}")
745
+ return ["Error: Unexpected issue"]
746
+
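+ # Example payloads fetch_models parses (illustrative; model names are placeholders):
+ # Ollama GET {base_url}/tags -> {"models": [{"name": "llama3:latest", ...}, ...]}
+ # OpenAI-compatible GET {base_url}/models -> {"data": [{"id": "gpt-4o-mini", ...}, ...]}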
747
+ def update_model_choices(base_url, api_key, service_type, settings_key):
748
+ models = fetch_models(base_url, api_key, service_type)
749
+
750
+ if not models:
751
+ logging.warning(f"No models fetched for {service_type}.")
752
+
753
+ # Get the current model from settings ('llm' stores the model at the top level; 'embeddings' nests it under 'llm')
754
+ current_model = (settings.get('llm', {}) if settings_key == 'llm' else settings.get(settings_key, {}).get('llm', {})).get('model')
755
+
756
+ # If the current model is not in the list, add it
757
+ if current_model and current_model not in models:
758
+ models.append(current_model)
759
+
760
+ return gr.update(choices=models, value=current_model if current_model in models else (models[0] if models else None))
761
+
762
+ def update_llm_model_choices(base_url, api_key, service_type):
763
+ return update_model_choices(base_url, api_key, service_type, 'llm')
764
+
765
+ def update_embeddings_model_choices(base_url, api_key, service_type):
766
+ return update_model_choices(base_url, api_key, service_type, 'embeddings')
767
+
768
+
769
+
770
+
771
+ def update_llm_settings(llm_model, embeddings_model, context_window, system_message, temperature, max_tokens,
772
+ llm_api_base, llm_api_key,
773
+ embeddings_api_base, embeddings_api_key, embeddings_service_type):
774
+ try:
775
+ # Update settings.yaml
776
+ settings = load_settings()
777
+ settings['llm'].update({
778
+ "type": "openai", # Always set to "openai" since we removed the radio button
779
+ "model": llm_model,
780
+ "api_base": llm_api_base,
781
+ "api_key": "${GRAPHRAG_API_KEY}",
782
+ "temperature": temperature,
783
+ "max_tokens": max_tokens,
784
+ "provider": "openai_chat" # Always set to "openai_chat"
785
+ })
786
+ settings['embeddings']['llm'].update({
787
+ "type": "openai_embedding", # Always use OpenAIEmbeddingsLLM
788
+ "model": embeddings_model,
789
+ "api_base": embeddings_api_base,
790
+ "api_key": "${GRAPHRAG_API_KEY}",
791
+ "provider": embeddings_service_type
792
+ })
793
+
794
+ with open("indexing/settings.yaml", 'w') as f:
795
+ yaml.dump(settings, f, default_flow_style=False)
796
+
797
+ # Update .env file
798
+ update_env_file("LLM_API_BASE", llm_api_base)
799
+ update_env_file("LLM_API_KEY", llm_api_key)
800
+ update_env_file("LLM_MODEL", llm_model)
801
+ update_env_file("EMBEDDINGS_API_BASE", embeddings_api_base)
802
+ update_env_file("EMBEDDINGS_API_KEY", embeddings_api_key)
803
+ update_env_file("EMBEDDINGS_MODEL", embeddings_model)
804
+ update_env_file("CONTEXT_WINDOW", str(context_window))
805
+ update_env_file("SYSTEM_MESSAGE", system_message)
806
+ update_env_file("TEMPERATURE", str(temperature))
807
+ update_env_file("MAX_TOKENS", str(max_tokens))
808
+ update_env_file("LLM_SERVICE_TYPE", "openai_chat")
809
+ update_env_file("EMBEDDINGS_SERVICE_TYPE", embeddings_service_type)
810
+
811
+ # Reload environment variables
812
+ load_dotenv(override=True)
813
+
814
+ return "LLM and embeddings settings updated successfully in both settings.yaml and .env files."
815
+ except Exception as e:
816
+ return f"Error updating LLM and embeddings settings: {str(e)}"
817
+
818
+ def update_env_file(key, value):
819
+ env_path = 'indexing/.env'
820
+ with open(env_path, 'r') as file:
821
+ lines = file.readlines()
822
+
823
+ updated = False
824
+ for i, line in enumerate(lines):
825
+ if line.startswith(f"{key}="):
826
+ lines[i] = f"{key}={value}\n"
827
+ updated = True
828
+ break
829
+
830
+ if not updated:
831
+ lines.append(f"{key}={value}\n")
832
+
833
+ with open(env_path, 'w') as file:
834
+ file.writelines(lines)
835
+
836
+ custom_css = """
837
+ html, body {
838
+ margin: 0;
839
+ padding: 0;
840
+ height: 100vh;
841
+ overflow: hidden;
842
+ }
843
+
844
+ .gradio-container {
845
+ margin: 0 !important;
846
+ padding: 0 !important;
847
+ width: 100vw !important;
848
+ max-width: 100vw !important;
849
+ height: 100vh !important;
850
+ max-height: 100vh !important;
851
+ overflow: auto;
852
+ display: flex;
853
+ flex-direction: column;
854
+ }
855
+
856
+ #main-container {
857
+ flex: 1;
858
+ display: flex;
859
+ overflow: hidden;
860
+ }
861
+
862
+ #left-column, #right-column {
863
+ height: 100%;
864
+ overflow-y: auto;
865
+ padding: 10px;
866
+ }
867
+
868
+ #left-column {
869
+ flex: 1;
870
+ }
871
+
872
+ #right-column {
873
+ flex: 2;
874
+ display: flex;
875
+ flex-direction: column;
876
+ }
877
+
878
+ #chat-container {
879
+ flex: 0 0 auto; /* Don't allow this to grow */
880
+ height: 100%;
881
+ display: flex;
882
+ flex-direction: column;
883
+ overflow: hidden;
884
+ border: 1px solid var(--color-accent);
885
+ border-radius: 8px;
886
+ padding: 10px;
887
+ overflow-y: auto;
888
+ }
889
+
890
+ #chatbot {
891
+ overflow-y: hidden;
892
+ height: 100%;
893
+ }
894
+
895
+ #chat-input-row {
896
+ margin-top: 10px;
897
+ }
898
+
899
+ #visualization-plot {
900
+ width: 100%;
901
+ aspect-ratio: 1 / 1;
902
+ max-height: 600px; /* Adjust this value as needed */
903
+ }
904
+
905
+ #vis-controls-row {
906
+ display: flex;
907
+ justify-content: space-between;
908
+ align-items: center;
909
+ margin-top: 10px;
910
+ }
911
+
912
+ #vis-controls-row > * {
913
+ flex: 1;
914
+ margin: 0 5px;
915
+ }
916
+
917
+ #vis-status {
918
+ margin-top: 10px;
919
+ }
920
+
921
+ /* Chat input styling */
922
+ #chat-input-row {
923
+ display: flex;
924
+ flex-direction: column;
925
+ }
926
+
927
+ #chat-input-row > div {
928
+ width: 100% !important;
929
+ }
930
+
931
+ #chat-input-row input[type="text"] {
932
+ width: 100% !important;
933
+ }
934
+
935
+ /* Adjust padding for all containers */
936
+ .gr-box, .gr-form, .gr-panel {
937
+ padding: 10px !important;
938
+ }
939
+
940
+ /* Ensure all textboxes and textareas have full height */
941
+ .gr-textbox, .gr-textarea {
942
+ height: auto !important;
943
+ min-height: 100px !important;
944
+ }
945
+
946
+ /* Ensure all dropdowns have full width */
947
+ .gr-dropdown {
948
+ width: 100% !important;
949
+ }
950
+
951
+ :root {
952
+ --color-background: #2C3639;
953
+ --color-foreground: #3F4E4F;
954
+ --color-accent: #A27B5C;
955
+ --color-text: #DCD7C9;
956
+ }
957
+
958
+ body, .gradio-container {
959
+ background-color: var(--color-background);
960
+ color: var(--color-text);
961
+ }
962
+
963
+ .gr-button {
964
+ background-color: var(--color-accent);
965
+ color: var(--color-text);
966
+ }
967
+
968
+ .gr-input, .gr-textarea, .gr-dropdown {
969
+ background-color: var(--color-foreground);
970
+ color: var(--color-text);
971
+ border: 1px solid var(--color-accent);
972
+ }
973
+
974
+ .gr-panel {
975
+ background-color: var(--color-foreground);
976
+ border: 1px solid var(--color-accent);
977
+ }
978
+
979
+ .gr-box {
980
+ border-radius: 8px;
981
+ margin-bottom: 10px;
982
+ background-color: var(--color-foreground);
983
+ }
984
+
985
+ .gr-padded {
986
+ padding: 10px;
987
+ }
988
+
989
+ .gr-form {
990
+ background-color: var(--color-foreground);
991
+ }
992
+
993
+ .gr-input-label, .gr-radio-label {
994
+ color: var(--color-text);
995
+ }
996
+
997
+ .gr-checkbox-label {
998
+ color: var(--color-text);
999
+ }
1000
+
1001
+ .gr-markdown {
1002
+ color: var(--color-text);
1003
+ }
1004
+
1005
+ .gr-accordion {
1006
+ background-color: var(--color-foreground);
1007
+ border: 1px solid var(--color-accent);
1008
+ }
1009
+
1010
+ .gr-accordion-header {
1011
+ background-color: var(--color-accent);
1012
+ color: var(--color-text);
1013
+ }
1014
+
1015
+ #visualization-container {
1016
+ display: flex;
1017
+ flex-direction: column;
1018
+ border: 2px solid var(--color-accent);
1019
+ border-radius: 8px;
1020
+ margin-top: 20px;
1021
+ padding: 10px;
1022
+ background-color: var(--color-foreground);
1023
+ height: calc(100vh - 300px); /* Adjust this value as needed */
1024
+ }
1025
+
1026
+ #visualization-plot {
1027
+ width: 100%;
1028
+ height: 100%;
1029
+ }
1030
+
1031
+ #vis-controls-row {
1032
+ display: flex;
1033
+ justify-content: space-between;
1034
+ align-items: center;
1035
+ margin-top: 10px;
1036
+ }
1037
+
1038
+ #vis-controls-row > * {
1039
+ flex: 1;
1040
+ margin: 0 5px;
1041
+ }
1042
+
1043
+ #vis-status {
1044
+ margin-top: 10px;
1045
+ }
1046
+
1047
+ #log-container {
1048
+ background-color: var(--color-foreground);
1049
+ border: 1px solid var(--color-accent);
1050
+ border-radius: 8px;
1051
+ padding: 10px;
1052
+ margin-top: 20px;
1053
+ max-height: none;
1054
+ overflow-y: auto;
1055
+ }
1056
+
1057
+ .setting-accordion .label-wrap {
1058
+ cursor: pointer;
1059
+ }
1060
+
1061
+ .setting-accordion .icon {
1062
+ transition: transform 0.3s ease;
1063
+ }
1064
+
1065
+ .setting-accordion[open] .icon {
1066
+ transform: rotate(90deg);
1067
+ }
1068
+
1069
+ .gr-form.gr-box {
1070
+ border: none !important;
1071
+ background: none !important;
1072
+ }
1073
+
1074
+ .model-params {
1075
+ border-top: 1px solid var(--color-accent);
1076
+ margin-top: 10px;
1077
+ padding-top: 10px;
1078
+ }
1079
+ """
1080
+
1081
+ def list_output_files(root_dir):
1082
+ output_dir = os.path.join(root_dir, "output")
1083
+ files = []
1084
+ for root, _, filenames in os.walk(output_dir):
1085
+ for filename in filenames:
1086
+ files.append(os.path.join(root, filename))
1087
+ return files
1088
+
1089
+ def update_file_list():
1090
+ files = list_input_files()
1091
+ return gr.update(choices=[f["path"] for f in files])
1092
+
1093
+ def update_file_content(file_path):
1094
+ if not file_path:
1095
+ return ""
1096
+ try:
1097
+ with open(file_path, 'r', encoding='utf-8') as file:
1098
+ content = file.read()
1099
+ return content
1100
+ except Exception as e:
1101
+ logging.error(f"Error reading file: {str(e)}")
1102
+ return f"Error reading file: {str(e)}"
1103
+
1104
+ def list_output_folders(root_dir):
1105
+ output_dir = os.path.join(root_dir, "output")
1106
+ folders = [f for f in os.listdir(output_dir) if os.path.isdir(os.path.join(output_dir, f))]
1107
+ return sorted(folders, reverse=True)
1108
+
1109
+ def list_folder_contents(folder_path):
1110
+ contents = []
1111
+ for item in os.listdir(folder_path):
1112
+ item_path = os.path.join(folder_path, item)
1113
+ if os.path.isdir(item_path):
1114
+ contents.append(f"[DIR] {item}")
1115
+ else:
1116
+ _, ext = os.path.splitext(item)
1117
+ contents.append(f"[{ext[1:].upper()}] {item}")
1118
+ return contents
1119
+
1120
+ def update_output_folder_list():
1121
+ root_dir = "./"
1122
+ folders = list_output_folders(root_dir)
1123
+ return gr.update(choices=folders, value=folders[0] if folders else None)
1124
+
1125
+ def update_folder_content_list(folder_name):
1126
+ root_dir = "./"
1127
+ if not folder_name:
1128
+ return gr.update(choices=[])
1129
+ contents = list_folder_contents(os.path.join(root_dir, "output", folder_name, "artifacts"))
1130
+ return gr.update(choices=contents)
1131
+
1132
+ def handle_content_selection(folder_name, selected_item):
1133
+ root_dir = "./"
1134
+ if isinstance(selected_item, list) and selected_item:
1135
+ selected_item = selected_item[0] # Take the first item if it's a list
1136
+
1137
+ if isinstance(selected_item, str) and selected_item.startswith("[DIR]"):
1138
+ dir_name = selected_item[6:] # Remove "[DIR] " prefix
1139
+ sub_contents = list_folder_contents(os.path.join(root_dir, "output", folder_name, dir_name))
1140
+ return gr.update(choices=sub_contents), "", ""
1141
+ elif isinstance(selected_item, str):
1142
+ file_name = selected_item.split("] ")[1] if "]" in selected_item else selected_item # Remove file type prefix if present
1143
+ file_path = os.path.join(root_dir, "output", folder_name, "artifacts", file_name)
1144
+ file_size = os.path.getsize(file_path)
1145
+ file_type = os.path.splitext(file_name)[1]
1146
+ file_info = f"File: {file_name}\nSize: {file_size} bytes\nType: {file_type}"
1147
+ content = read_file_content(file_path)
1148
+ return gr.update(), file_info, content
1149
+ else:
1150
+ return gr.update(), "", ""
1151
+
1152
+ def initialize_selected_folder(folder_name):
1153
+ root_dir = "./"
1154
+ if not folder_name:
1155
+ return "Please select a folder first.", gr.update(choices=[])
1156
+ folder_path = os.path.join(root_dir, "output", folder_name, "artifacts")
1157
+ if not os.path.exists(folder_path):
1158
+ return f"Artifacts folder not found in '{folder_name}'.", gr.update(choices=[])
1159
+ contents = list_folder_contents(folder_path)
1160
+ return f"Folder '{folder_name}/artifacts' initialized with {len(contents)} items.", gr.update(choices=contents)
1161
+
1162
+
1163
+ settings = load_settings()
1164
+ default_model = settings['llm']['model']
1165
+ cli_args = gr.State({})
1166
+ stop_indexing = threading.Event()
1167
+ indexing_thread = None
1168
+
1169
+ def start_indexing(*args):
1170
+ global indexing_thread, stop_indexing
1171
+ stop_indexing = threading.Event() # Reset the stop_indexing event
1172
+ indexing_thread = threading.Thread(target=run_indexing, args=args)
1173
+ indexing_thread.start()
1174
+ return gr.update(interactive=False), gr.update(interactive=True), gr.update(interactive=False)
1175
+
1176
+ def stop_indexing_process():
1177
+ global indexing_thread
1178
+ logging.info("Stop indexing requested")
1179
+ stop_indexing.set()
1180
+ if indexing_thread and indexing_thread.is_alive():
1181
+ logging.info("Waiting for indexing thread to finish")
1182
+ indexing_thread.join(timeout=10)
1183
+ logging.info("Indexing thread finished" if not indexing_thread.is_alive() else "Indexing thread did not finish within timeout")
1184
+ indexing_thread = None # Reset the thread
1185
+ return gr.update(interactive=True), gr.update(interactive=False), gr.update(interactive=True)
1186
+
1187
+ def refresh_indexing():
1188
+ global indexing_thread, stop_indexing
1189
+ if indexing_thread and indexing_thread.is_alive():
1190
+ logging.info("Cannot refresh: Indexing is still running")
1191
+ return gr.update(interactive=False), gr.update(interactive=True), gr.update(interactive=False), "Cannot refresh: Indexing is still running"
1192
+ else:
1193
+ stop_indexing = threading.Event() # Reset the stop_indexing event
1194
+ indexing_thread = None # Reset the thread
1195
+ return gr.update(interactive=True), gr.update(interactive=False), gr.update(interactive=True), "Indexing process refreshed. You can start indexing again."
1196
+
1197
+
1198
+
1199
+ def run_indexing(root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_args):
1200
+ cmd = ["python", "-m", "graphrag.index", "--root", "./indexing"]
1201
+
1202
+ # Add custom CLI arguments
1203
+ if custom_args:
1204
+ cmd.extend(custom_args.split())
1205
+
1206
+ logging.info(f"Executing command: {' '.join(cmd)}")
1207
+
1208
+ process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, encoding='utf-8', universal_newlines=True)
1209
+
1210
+
1211
+ output = []
1212
+ progress_value = 0
1213
+ iterations_completed = 0
1214
+
1215
+ while True:
1216
+ if stop_indexing.is_set():
1217
+ process.terminate()
1218
+ process.wait(timeout=5)
1219
+ if process.poll() is None:
1220
+ process.kill()
1221
+ return ("\n".join(output + ["Indexing stopped by user."]),
1222
+ "Indexing stopped.",
1223
+ 100,
1224
+ gr.update(interactive=True),
1225
+ gr.update(interactive=False),
1226
+ gr.update(interactive=True),
1227
+ str(iterations_completed))
1228
+
1229
+ try:
1230
+ line = process.stdout.readline()
1231
+ if not line and process.poll() is not None:
1232
+ break
1233
+
1234
+ if line:
1235
+ line = line.strip()
1236
+ output.append(line)
1237
+
1238
+ if "Processing file" in line:
1239
+ progress_value += 1
1240
+ iterations_completed += 1
1241
+ elif "Indexing completed" in line:
1242
+ progress_value = 100
1243
+ elif "ERROR" in line:
1244
+ line = f"🚨 ERROR: {line}"
1245
+
1246
+ yield ("\n".join(output),
1247
+ line,
1248
+ progress_value,
1249
+ gr.update(interactive=False),
1250
+ gr.update(interactive=True),
1251
+ gr.update(interactive=False),
1252
+ str(iterations_completed))
1253
+ except Exception as e:
1254
+ logging.error(f"Error during indexing: {str(e)}")
1255
+ return ("\n".join(output + [f"Error: {str(e)}"]),
1256
+ "Error occurred during indexing.",
1257
+ 100,
1258
+ gr.update(interactive=True),
1259
+ gr.update(interactive=False),
1260
+ gr.update(interactive=True),
1261
+ str(iterations_completed))
1262
+
1263
+ if process.returncode != 0 and not stop_indexing.is_set():
1264
+ final_output = "\n".join(output + [f"Error: Process exited with return code {process.returncode}"])
1265
+ final_progress = "Indexing failed. Check output for details."
1266
+ else:
1267
+ final_output = "\n".join(output)
1268
+ final_progress = "Indexing completed successfully!"
1269
+
1270
+ return (final_output,
1271
+ final_progress,
1272
+ 100,
1273
+ gr.update(interactive=True),
1274
+ gr.update(interactive=False),
1275
+ gr.update(interactive=True),
1276
+ str(iterations_completed))
1277
+
1278
+ global_vector_store_wrapper = None
1279
+
1280
+ def create_gradio_interface():
1281
+ global global_vector_store_wrapper
1282
+ llm_models, embeddings_models, llm_service_type, embeddings_service_type, llm_api_base, embeddings_api_base, text_embedder = initialize_models()
1283
+ settings = load_settings()
1284
+
1285
+
1286
+ log_output = gr.TextArea(label="Logs", elem_id="log-output", interactive=False, visible=False)
1287
+
1288
+ with gr.Blocks(css=custom_css, theme=gr.themes.Base()) as demo:
1289
+ gr.Markdown("# GraphRAG Local UI", elem_id="title")
1290
+
1291
+ with gr.Row(elem_id="main-container"):
1292
+ with gr.Column(scale=1, elem_id="left-column"):
1293
+ with gr.Tabs():
1294
+ with gr.TabItem("Data Management"):
1295
+ with gr.Accordion("File Upload (.txt)", open=True):
1296
+ file_upload = gr.File(label="Upload .txt File", file_types=[".txt"])
1297
+ upload_btn = gr.Button("Upload File", variant="primary")
1298
+ upload_output = gr.Textbox(label="Upload Status", visible=False)
1299
+
1300
+ with gr.Accordion("File Management", open=True):
1301
+ file_list = gr.Dropdown(label="Select File", choices=[], interactive=True)
1302
+ refresh_btn = gr.Button("Refresh File List", variant="secondary")
1303
+
1304
+ file_content = gr.TextArea(label="File Content", lines=10)
1305
+
1306
+ with gr.Row():
1307
+ delete_btn = gr.Button("Delete Selected File", variant="stop")
1308
+ save_btn = gr.Button("Save Changes", variant="primary")
1309
+
1310
+ operation_status = gr.Textbox(label="Operation Status", visible=False)
1311
+
1312
+
1313
+
1314
+ with gr.TabItem("Indexing"):
1315
+ root_dir = gr.Textbox(label="Root Directory", value="./")
1316
+ config_file = gr.File(label="Config File (optional)")
1317
+ with gr.Row():
1318
+ verbose = gr.Checkbox(label="Verbose", value=True)
1319
+ nocache = gr.Checkbox(label="No Cache", value=True)
1320
+ with gr.Row():
1321
+ resume = gr.Textbox(label="Resume Timestamp (optional)")
1322
+ reporter = gr.Dropdown(label="Reporter", choices=["rich", "print", "none"], value=None)
1323
+ with gr.Row():
1324
+ emit_formats = gr.CheckboxGroup(label="Emit Formats", choices=["json", "csv", "parquet"], value=None)
1325
+ with gr.Row():
1326
+ run_index_button = gr.Button("Run Indexing")
1327
+ stop_index_button = gr.Button("Stop Indexing", variant="stop")
1328
+ refresh_index_button = gr.Button("Refresh Indexing", variant="secondary")
1329
+
1330
+ with gr.Accordion("Custom CLI Arguments", open=True):
1331
+ custom_cli_args = gr.Textbox(
1332
+ label="Custom CLI Arguments",
1333
+ placeholder="--arg1 value1 --arg2 value2",
1334
+ lines=3
1335
+ )
1336
+ cli_guide = gr.Markdown(
1337
+ textwrap.dedent("""
1338
+ ### CLI Argument Key Guide:
1339
+ - `--root <path>`: Set the root directory for the project
1340
+ - `--config <path>`: Specify a custom configuration file
1341
+ - `--verbose`: Enable verbose output
1342
+ - `--nocache`: Disable caching
1343
+ - `--resume <timestamp>`: Resume from a specific timestamp
1344
+ - `--reporter <type>`: Set the reporter type (rich, print, none)
1345
+ - `--emit <formats>`: Specify output formats (json, csv, parquet)
1346
+
1347
+ Example: `--verbose --nocache --emit json,csv`
1348
+ """)
1349
+ )
1350
+
1351
+ index_output = gr.Textbox(label="Indexing Output", lines=20, max_lines=30)
1352
+ index_progress = gr.Textbox(label="Indexing Progress", lines=3)
1353
+ iterations_completed = gr.Textbox(label="Iterations Completed", value="0")
1354
+ refresh_status = gr.Textbox(label="Refresh Status", visible=True)
1355
+
1356
+ run_index_button.click(
1357
+ fn=start_indexing,
1358
+ inputs=[root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_cli_args],
1359
+ outputs=[run_index_button, stop_index_button, refresh_index_button]
1360
+ ).then(
1361
+ fn=run_indexing,
1362
+ inputs=[root_dir, config_file, verbose, nocache, resume, reporter, emit_formats, custom_cli_args],
1363
+ outputs=[index_output, index_progress, run_index_button, stop_index_button, refresh_index_button, iterations_completed]
1364
+ )
1365
+
1366
+ stop_index_button.click(
1367
+ fn=stop_indexing_process,
1368
+ outputs=[run_index_button, stop_index_button, refresh_index_button]
1369
+ )
1370
+
1371
+ refresh_index_button.click(
1372
+ fn=refresh_indexing,
1373
+ outputs=[run_index_button, stop_index_button, refresh_index_button, refresh_status]
1374
+ )
1375
+
1376
+ with gr.TabItem("Indexing Outputs/Visuals"):
1377
+ output_folder_list = gr.Dropdown(label="Select Output Folder (Select GraphML File to Visualize)", choices=list_output_folders("./indexing"), interactive=True)
1378
+ refresh_folder_btn = gr.Button("Refresh Folder List", variant="secondary")
1379
+ initialize_folder_btn = gr.Button("Initialize Selected Folder", variant="primary")
1380
+ folder_content_list = gr.Dropdown(label="Select File or Directory", choices=[], interactive=True)
1381
+ file_info = gr.Textbox(label="File Information", interactive=False)
1382
+ output_content = gr.TextArea(label="File Content", lines=20, interactive=False)
1383
+ initialization_status = gr.Textbox(label="Initialization Status")
1384
+
1385
+ with gr.TabItem("LLM Settings"):
1386
+ llm_base_url = gr.Textbox(label="LLM API Base URL", value=os.getenv("LLM_API_BASE"))
1387
+ llm_api_key = gr.Textbox(label="LLM API Key", value=os.getenv("LLM_API_KEY"), type="password")
1388
+ llm_service_type = gr.Radio(
1389
+ label="LLM Service Type",
1390
+ choices=["openai", "ollama"],
1391
+ value="openai",
1392
+ visible=False # Hide this if you want to always use OpenAI
1393
+ )
1394
+
1395
+ llm_model_dropdown = gr.Dropdown(
1396
+ label="LLM Model",
1397
+ choices=[], # Start with an empty list
1398
+ value=settings['llm'].get('model'),
1399
+ allow_custom_value=True
1400
+ )
1401
+ refresh_llm_models_btn = gr.Button("Refresh LLM Models", variant="secondary")
1402
+
1403
+ embeddings_base_url = gr.Textbox(label="Embeddings API Base URL", value=os.getenv("EMBEDDINGS_API_BASE"))
1404
+ embeddings_api_key = gr.Textbox(label="Embeddings API Key", value=os.getenv("EMBEDDINGS_API_KEY"), type="password")
1405
+ embeddings_service_type = gr.Radio(
1406
+ label="Embeddings Service Type",
1407
+ choices=["openai", "ollama"],
1408
+ value=settings.get('embeddings', {}).get('llm', {}).get('type', 'openai'),
1409
+ visible=False,
1410
+ )
1411
+
1412
+ embeddings_model_dropdown = gr.Dropdown(
1413
+ label="Embeddings Model",
1414
+ choices=[],
1415
+ value=settings.get('embeddings', {}).get('llm', {}).get('model'),
1416
+ allow_custom_value=True
1417
+ )
1418
+ refresh_embeddings_models_btn = gr.Button("Refresh Embedding Models", variant="secondary")
1419
+ system_message = gr.Textbox(
1420
+ lines=5,
1421
+ label="System Message",
1422
+ value=os.getenv("SYSTEM_MESSAGE", "You are a helpful AI assistant.")
1423
+ )
1424
+ context_window = gr.Slider(
1425
+ label="Context Window",
1426
+ minimum=512,
1427
+ maximum=32768,
1428
+ step=512,
1429
+ value=int(os.getenv("CONTEXT_WINDOW", 4096))
1430
+ )
1431
+ temperature = gr.Slider(
1432
+ label="Temperature",
1433
+ minimum=0.0,
1434
+ maximum=2.0,
1435
+ step=0.1,
1436
+ value=float(settings['llm'].get('TEMPERATURE', 0.5))
1437
+ )
1438
+ max_tokens = gr.Slider(
1439
+ label="Max Tokens",
1440
+ minimum=1,
1441
+ maximum=8192,
1442
+ step=1,
1443
+ value=int(settings['llm'].get('MAX_TOKENS', 1024))
1444
+ )
1445
+ update_settings_btn = gr.Button("Update LLM Settings", variant="primary")
1446
+ llm_settings_status = gr.Textbox(label="Status", interactive=False)
1447
+
1448
+ llm_base_url.change(
1449
+ fn=update_model_choices,
1450
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1451
+ outputs=llm_model_dropdown
1452
+ )
1453
+ # Update Embeddings model choices when service type or base URL changes
1454
+ embeddings_service_type.change(
1455
+ fn=update_embeddings_model_choices,
1456
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type],
1457
+ outputs=embeddings_model_dropdown
1458
+ )
1459
+
1460
+ embeddings_base_url.change(
1461
+ fn=update_model_choices,
1462
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1463
+ outputs=embeddings_model_dropdown
1464
+ )
1465
+
1466
+ update_settings_btn.click(
1467
+ fn=update_llm_settings,
1468
+ inputs=[
1469
+ llm_model_dropdown,
1470
+ embeddings_model_dropdown,
1471
+ context_window,
1472
+ system_message,
1473
+ temperature,
1474
+ max_tokens,
1475
+ llm_base_url,
1476
+ llm_api_key,
1477
+ embeddings_base_url,
1478
+ embeddings_api_key,
1479
+ embeddings_service_type
1480
+ ],
1481
+ outputs=[llm_settings_status]
1482
+ )
1483
+
1484
+
1485
+ refresh_llm_models_btn.click(
1486
+ fn=update_model_choices,
1487
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1488
+ outputs=[llm_model_dropdown]
1489
+ ).then(
1490
+ fn=update_logs,
1491
+ outputs=[log_output]
1492
+ )
1493
+
1494
+ refresh_embeddings_models_btn.click(
1495
+ fn=update_model_choices,
1496
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1497
+ outputs=[embeddings_model_dropdown]
1498
+ ).then(
1499
+ fn=update_logs,
1500
+ outputs=[log_output]
1501
+ )
1502
+
1503
+ with gr.TabItem("YAML Settings"):
1504
+ settings = load_settings()
1505
+ with gr.Group():
1506
+ for key, value in settings.items():
1507
+ if key != 'llm':
1508
+ create_setting_component(key, value)
1509
+
1510
+ with gr.Group(elem_id="log-container"):
1511
+ gr.Markdown("### Logs")
1512
+ log_output = gr.TextArea(label="Logs", elem_id="log-output", interactive=False)
1513
+
1514
+ with gr.Column(scale=2, elem_id="right-column"):
1515
+ with gr.Group(elem_id="chat-container"):
1516
+ chatbot = gr.Chatbot(label="Chat History", elem_id="chatbot")
1517
+ with gr.Row(elem_id="chat-input-row"):
1518
+ with gr.Column(scale=1):
1519
+ query_input = gr.Textbox(
1520
+ label="Input",
1521
+ placeholder="Enter your query here...",
1522
+ elem_id="query-input"
1523
+ )
1524
+ query_btn = gr.Button("Send Query", variant="primary")
1525
+
1526
+ with gr.Accordion("Query Parameters", open=True):
1527
+ query_type = gr.Radio(
1528
+ ["global", "local", "direct"],
1529
+ label="Query Type",
1530
+ value="global",
1531
+ info="Global: community-based search, Local: entity-based search, Direct: LLM chat"
1532
+ )
1533
+ preset_dropdown = gr.Dropdown(
1534
+ label="Preset Query Options",
1535
+ choices=[
1536
+ "Default Global Search",
1537
+ "Default Local Search",
1538
+ "Detailed Global Analysis",
1539
+ "Detailed Local Analysis",
1540
+ "Quick Global Summary",
1541
+ "Quick Local Summary",
1542
+ "Global Bullet Points",
1543
+ "Local Bullet Points",
1544
+ "Comprehensive Global Report",
1545
+ "Comprehensive Local Report",
1546
+ "High-Level Global Overview",
1547
+ "High-Level Local Overview",
1548
+ "Focused Global Insight",
1549
+ "Focused Local Insight",
1550
+ "Custom Query"
1551
+ ],
1552
+ value="Default Global Search",
1553
+ info="Select a preset or choose 'Custom Query' for manual configuration"
1554
+ )
1555
+ selected_folder = gr.Dropdown(
1556
+ label="Select Index Folder to Chat With",
1557
+ choices=list_output_folders("./indexing"),
1558
+ value=None,
1559
+ interactive=True
1560
+ )
1561
+ refresh_folder_btn = gr.Button("Refresh Folders", variant="secondary")
1562
+ clear_chat_btn = gr.Button("Clear Chat", variant="secondary")
1563
+
1564
+ with gr.Group(visible=False) as custom_options:
1565
+ community_level = gr.Slider(
1566
+ label="Community Level",
1567
+ minimum=1,
1568
+ maximum=10,
1569
+ value=2,
1570
+ step=1,
1571
+ info="Higher values use reports on smaller communities"
1572
+ )
1573
+ response_type = gr.Dropdown(
1574
+ label="Response Type",
1575
+ choices=[
1576
+ "Multiple Paragraphs",
1577
+ "Single Paragraph",
1578
+ "Single Sentence",
1579
+ "List of 3-7 Points",
1580
+ "Single Page",
1581
+ "Multi-Page Report"
1582
+ ],
1583
+ value="Multiple Paragraphs",
1584
+ info="Specify the desired format of the response"
1585
+ )
1586
+ custom_cli_args = gr.Textbox(
1587
+ label="Custom CLI Arguments",
1588
+ placeholder="--arg1 value1 --arg2 value2",
1589
+ info="Additional CLI arguments for advanced users"
1590
+ )
1591
+
1592
+ def update_custom_options(preset):
1593
+ if preset == "Custom Query":
1594
+ return gr.update(visible=True)
1595
+ else:
1596
+ return gr.update(visible=False)
1597
+
1598
+ preset_dropdown.change(fn=update_custom_options, inputs=[preset_dropdown], outputs=[custom_options])
1599
+
1600
+
1601
+
1602
+
1603
+ with gr.Group(elem_id="visualization-container"):
1604
+ vis_output = gr.Plot(label="Graph Visualization", elem_id="visualization-plot")
1605
+ with gr.Row(elem_id="vis-controls-row"):
1606
+ vis_btn = gr.Button("Visualize Graph", variant="secondary")
1607
+
1608
+ # Add new controls for customization
1609
+ with gr.Accordion("Visualization Settings", open=False):
1610
+ layout_type = gr.Dropdown(["3D Spring", "2D Spring", "Circular"], label="Layout Type", value="3D Spring")
1611
+ node_size = gr.Slider(1, 20, 7, label="Node Size", step=1)
1612
+ edge_width = gr.Slider(0.1, 5, 0.5, label="Edge Width", step=0.1)
1613
+ node_color_attribute = gr.Dropdown(["Degree", "Random"], label="Node Color Attribute", value="Degree")
1614
+ color_scheme = gr.Dropdown(["Viridis", "Plasma", "Inferno", "Magma", "Cividis"], label="Color Scheme", value="Viridis")
1615
+ show_labels = gr.Checkbox(label="Show Node Labels", value=True)
1616
+ label_size = gr.Slider(5, 20, 10, label="Label Size", step=1)
1617
+
1618
+
1619
+ # Event handlers
1620
+ upload_btn.click(fn=upload_file, inputs=[file_upload], outputs=[upload_output, file_list, log_output])
1621
+ refresh_btn.click(fn=update_file_list, outputs=[file_list]).then(
1622
+ fn=update_logs,
1623
+ outputs=[log_output]
1624
+ )
1625
+ file_list.change(fn=update_file_content, inputs=[file_list], outputs=[file_content]).then(
1626
+ fn=update_logs,
1627
+ outputs=[log_output]
1628
+ )
1629
+ delete_btn.click(fn=delete_file, inputs=[file_list], outputs=[operation_status, file_list, log_output])
1630
+ save_btn.click(fn=save_file_content, inputs=[file_list, file_content], outputs=[operation_status, log_output])
1631
+
1632
+ refresh_folder_btn.click(
1633
+ fn=lambda: gr.update(choices=list_output_folders("./indexing")),
1634
+ outputs=[selected_folder]
1635
+ )
1636
+
1637
+ clear_chat_btn.click(
1638
+ fn=lambda: ([], ""),
1639
+ outputs=[chatbot, query_input]
1640
+ )
1641
+
1642
+ refresh_folder_btn.click(
1643
+ fn=update_output_folder_list,
1644
+ outputs=[output_folder_list]
1645
+ ).then(
1646
+ fn=update_logs,
1647
+ outputs=[log_output]
1648
+ )
1649
+
1650
+ output_folder_list.change(
1651
+ fn=update_folder_content_list,
1652
+ inputs=[output_folder_list],
1653
+ outputs=[folder_content_list]
1654
+ ).then(
1655
+ fn=update_logs,
1656
+ outputs=[log_output]
1657
+ )
1658
+
1659
+ folder_content_list.change(
1660
+ fn=handle_content_selection,
1661
+ inputs=[output_folder_list, folder_content_list],
1662
+ outputs=[folder_content_list, file_info, output_content]
1663
+ ).then(
1664
+ fn=update_logs,
1665
+ outputs=[log_output]
1666
+ )
1667
+
1668
+ initialize_folder_btn.click(
1669
+ fn=initialize_selected_folder,
1670
+ inputs=[output_folder_list],
1671
+ outputs=[initialization_status, folder_content_list]
1672
+ ).then(
1673
+ fn=update_logs,
1674
+ outputs=[log_output]
1675
+ )
1676
+
1677
+ vis_btn.click(
1678
+ fn=update_visualization,
1679
+ inputs=[
1680
+ output_folder_list,
1681
+ folder_content_list,
1682
+ layout_type,
1683
+ node_size,
1684
+ edge_width,
1685
+ node_color_attribute,
1686
+ color_scheme,
1687
+ show_labels,
1688
+ label_size
1689
+ ],
1690
+ outputs=[vis_output, gr.Textbox(label="Visualization Status")]
1691
+ )
1692
+
1693
+ query_btn.click(
1694
+ fn=send_message,
1695
+ inputs=[
1696
+ query_type,
1697
+ query_input,
1698
+ chatbot,
1699
+ system_message,
1700
+ temperature,
1701
+ max_tokens,
1702
+ preset_dropdown,
1703
+ community_level,
1704
+ response_type,
1705
+ custom_cli_args,
1706
+ selected_folder
1707
+ ],
1708
+ outputs=[chatbot, query_input, log_output]
1709
+ )
1710
+
1711
+ query_input.submit(
1712
+ fn=send_message,
1713
+ inputs=[
1714
+ query_type,
1715
+ query_input,
1716
+ chatbot,
1717
+ system_message,
1718
+ temperature,
1719
+ max_tokens,
1720
+ preset_dropdown,
1721
+ community_level,
1722
+ response_type,
1723
+ custom_cli_args,
1724
+ selected_folder
1725
+ ],
1726
+ outputs=[chatbot, query_input, log_output]
1727
+ )
1728
+ refresh_llm_models_btn.click(
1729
+ fn=update_model_choices,
1730
+ inputs=[llm_base_url, llm_api_key, llm_service_type, gr.Textbox(value='llm', visible=False)],
1731
+ outputs=[llm_model_dropdown]
1732
+ )
1733
+
1734
+ # Update Embeddings model choices
1735
+ refresh_embeddings_models_btn.click(
1736
+ fn=update_model_choices,
1737
+ inputs=[embeddings_base_url, embeddings_api_key, embeddings_service_type, gr.Textbox(value='embeddings', visible=False)],
1738
+ outputs=[embeddings_model_dropdown]
1739
+ )
1740
+
1741
+ # Add this JavaScript to enable Shift+Enter functionality
1742
+ demo.load(js="""
1743
+ function addShiftEnterListener() {
1744
+ const queryInput = document.getElementById('query-input');
1745
+ if (queryInput) {
1746
+ queryInput.addEventListener('keydown', function(event) {
1747
+ if (event.key === 'Enter' && event.shiftKey) {
1748
+ event.preventDefault();
1749
+ const submitButton = queryInput.closest('.gradio-container').querySelector('button.primary');
1750
+ if (submitButton) {
1751
+ submitButton.click();
1752
+ }
1753
+ }
1754
+ });
1755
+ }
1756
+ }
1757
+ document.addEventListener('DOMContentLoaded', addShiftEnterListener);
1758
+ """)
1759
+
1760
+ return demo.queue()
1761
+
1762
+ async def main():
1763
+ api_port = 8088
1764
+ gradio_port = 7860
1765
+
1766
+
1767
+ print(f"Starting API server on port {api_port}")
1768
+ start_api_server(api_port)
1769
+
1770
+ # Wait for the API server to start in a separate thread
1771
+ threading.Thread(target=wait_for_api_server, args=(api_port,)).start()
1772
+
1773
+ # Create the Gradio app
1774
+ demo = create_gradio_interface()
1775
+
1776
+ print(f"Starting Gradio app on port {gradio_port}")
1777
+ # Launch the Gradio app
1778
+ demo.launch(server_port=gradio_port, share=True)
1779
+
1780
+
1781
+ demo = create_gradio_interface()
1782
+ app = demo.app
1783
+
1784
+ if __name__ == "__main__":
1785
+ initialize_data()
1786
+ demo.launch(server_port=7860, share=True)
css ADDED
@@ -0,0 +1,242 @@
1
+ html, body {
2
+ margin: 0;
3
+ padding: 0;
4
+ height: 100vh;
5
+ overflow: hidden;
6
+ }
7
+
8
+ .gradio-container {
9
+ margin: 0 !important;
10
+ padding: 0 !important;
11
+ width: 100vw !important;
12
+ max-width: 100vw !important;
13
+ height: 100vh !important;
14
+ max-height: 100vh !important;
15
+ overflow: auto;
16
+ display: flex;
17
+ flex-direction: column;
18
+ }
19
+
20
+ #main-container {
21
+ flex: 1;
22
+ display: flex;
23
+ overflow: hidden;
24
+ }
25
+
26
+ #left-column, #right-column {
27
+ height: 100%;
28
+ overflow-y: auto;
29
+ padding: 10px;
30
+ }
31
+
32
+ #left-column {
33
+ flex: 1;
34
+ }
35
+
36
+ #right-column {
37
+ flex: 2;
38
+ display: flex;
39
+ flex-direction: column;
40
+ }
41
+
42
+ #chat-container {
43
+ flex: 0 0 auto; /* Don't allow this to grow */
44
+ height: 100%;
45
+ display: flex;
46
+ flex-direction: column;
47
+ overflow: hidden;
48
+ border: 1px solid var(--color-accent);
49
+ border-radius: 8px;
50
+ padding: 10px;
51
+ overflow-y: auto;
52
+ }
53
+
54
+ #chatbot {
55
+ overflow-y: hidden;
56
+ height: 100%;
57
+ }
58
+
59
+ #chat-input-row {
60
+ margin-top: 10px;
61
+ }
62
+
63
+ #visualization-plot {
64
+ width: 100%;
65
+ aspect-ratio: 1 / 1;
66
+ max-height: 600px; /* Adjust this value as needed */
67
+ }
68
+
69
+ #vis-controls-row {
70
+ display: flex;
71
+ justify-content: space-between;
72
+ align-items: center;
73
+ margin-top: 10px;
74
+ }
75
+
76
+ #vis-controls-row > * {
77
+ flex: 1;
78
+ margin: 0 5px;
79
+ }
80
+
81
+ #vis-status {
82
+ margin-top: 10px;
83
+ }
84
+
85
+ /* Chat input styling */
86
+ #chat-input-row {
87
+ display: flex;
88
+ flex-direction: column;
89
+ }
90
+
91
+ #chat-input-row > div {
92
+ width: 100% !important;
93
+ }
94
+
95
+ #chat-input-row input[type="text"] {
96
+ width: 100% !important;
97
+ }
98
+
99
+ /* Adjust padding for all containers */
100
+ .gr-box, .gr-form, .gr-panel {
101
+ padding: 10px !important;
102
+ }
103
+
104
+ /* Ensure all textboxes and textareas have full height */
105
+ .gr-textbox, .gr-textarea {
106
+ height: auto !important;
107
+ min-height: 100px !important;
108
+ }
109
+
110
+ /* Ensure all dropdowns have full width */
111
+ .gr-dropdown {
112
+ width: 100% !important;
113
+ }
114
+
115
+ :root {
116
+ --color-background: #2C3639;
117
+ --color-foreground: #3F4E4F;
118
+ --color-accent: #A27B5C;
119
+ --color-text: #DCD7C9;
120
+ }
121
+
122
+ body, .gradio-container {
123
+ background-color: var(--color-background);
124
+ color: var(--color-text);
125
+ }
126
+
127
+ .gr-button {
128
+ background-color: var(--color-accent);
129
+ color: var(--color-text);
130
+ }
131
+
132
+ .gr-input, .gr-textarea, .gr-dropdown {
133
+ background-color: var(--color-foreground);
134
+ color: var(--color-text);
135
+ border: 1px solid var(--color-accent);
136
+ }
137
+
138
+ .gr-panel {
139
+ background-color: var(--color-foreground);
140
+ border: 1px solid var(--color-accent);
141
+ }
142
+
143
+ .gr-box {
144
+ border-radius: 8px;
145
+ margin-bottom: 10px;
146
+ background-color: var(--color-foreground);
147
+ }
148
+
149
+ .gr-padded {
150
+ padding: 10px;
151
+ }
152
+
153
+ .gr-form {
154
+ background-color: var(--color-foreground);
155
+ }
156
+
157
+ .gr-input-label, .gr-radio-label {
158
+ color: var(--color-text);
159
+ }
160
+
161
+ .gr-checkbox-label {
162
+ color: var(--color-text);
163
+ }
164
+
165
+ .gr-markdown {
166
+ color: var(--color-text);
167
+ }
168
+
169
+ .gr-accordion {
170
+ background-color: var(--color-foreground);
171
+ border: 1px solid var(--color-accent);
172
+ }
173
+
174
+ .gr-accordion-header {
175
+ background-color: var(--color-accent);
176
+ color: var(--color-text);
177
+ }
178
+
179
+ #visualization-container {
180
+ display: flex;
181
+ flex-direction: column;
182
+ border: 2px solid var(--color-accent);
183
+ border-radius: 8px;
184
+ margin-top: 20px;
185
+ padding: 10px;
186
+ background-color: var(--color-foreground);
187
+ height: calc(100vh - 300px); /* Adjust this value as needed */
188
+ }
189
+
190
+ #visualization-plot {
191
+ width: 100%;
192
+ height: 100%;
193
+ }
194
+
195
+ #vis-controls-row {
196
+ display: flex;
197
+ justify-content: space-between;
198
+ align-items: center;
199
+ margin-top: 10px;
200
+ }
201
+
202
+ #vis-controls-row > * {
203
+ flex: 1;
204
+ margin: 0 5px;
205
+ }
206
+
207
+ #vis-status {
208
+ margin-top: 10px;
209
+ }
210
+
211
+ #log-container {
212
+ background-color: var(--color-foreground);
213
+ border: 1px solid var(--color-accent);
214
+ border-radius: 8px;
215
+ padding: 10px;
216
+ margin-top: 20px;
217
+ max-height: none;
218
+ overflow-y: auto;
219
+ }
220
+
221
+ .setting-accordion .label-wrap {
222
+ cursor: pointer;
223
+ }
224
+
225
+ .setting-accordion .icon {
226
+ transition: transform 0.3s ease;
227
+ }
228
+
229
+ .setting-accordion[open] .icon {
230
+ transform: rotate(90deg);
231
+ }
232
+
233
+ .gr-form.gr-box {
234
+ border: none !important;
235
+ background: none !important;
236
+ }
237
+
238
+ .model-params {
239
+ border-top: 1px solid var(--color-accent);
240
+ margin-top: 10px;
241
+ padding-top: 10px;
242
+ }
embedding_proxy.py ADDED
@@ -0,0 +1,62 @@
1
+ import json
2
+ from fastapi import FastAPI, HTTPException
3
+ import uvicorn
4
+ import httpx
5
+ from pydantic import BaseModel
6
+ from typing import List, Union
7
+
8
+ app = FastAPI()
9
+
10
+ OLLAMA_URL = "http://localhost:11434" # Default Ollama URL
11
+
12
+ class EmbeddingRequest(BaseModel):
13
+ input: Union[str, List[str]]
14
+ model: str
15
+
16
+ class EmbeddingResponse(BaseModel):
17
+ object: str
18
+ data: List[dict]
19
+ model: str
20
+ usage: dict
21
+
22
+ @app.post("/v1/embeddings")
23
+ async def create_embedding(request: EmbeddingRequest):
24
+ async with httpx.AsyncClient() as client:
25
+ if isinstance(request.input, str):
26
+ request.input = [request.input]
27
+
28
+ ollama_requests = [{"model": request.model, "prompt": text} for text in request.input]
29
+
30
+ embeddings = []
31
+
32
+
33
+ for i, ollama_request in enumerate(ollama_requests):
34
+ response = await client.post(f"{OLLAMA_URL}/api/embeddings", json=ollama_request)
35
+ if response.status_code != 200:
36
+ raise HTTPException(status_code=response.status_code, detail="Ollama API error")
37
+
38
+ result = response.json()
39
+ embeddings.append({
40
+ "object": "embedding",
41
+ "embedding": result["embedding"],
42
+ "index": i
43
+ })
44
+
45
+
46
+ return EmbeddingResponse(
47
+ object="list",
48
+ data=embeddings,
49
+ model=request.model,
50
+
51
+ )
52
+
53
+ if __name__ == "__main__":
54
+ import argparse
55
+ parser = argparse.ArgumentParser(description="Run the embedding proxy server")
56
+ parser.add_argument("--port", type=int, default=11435, help="Port to run the server on")
57
+ parser.add_argument("--host", type=str, default="http://localhost:11434", help="URL of the Ollama server")
58
+ parser.add_argument("--reload", action="store_true", help="Enable auto-reload for development")
59
+ args = parser.parse_args()
60
+
61
+ OLLAMA_URL = args.host
62
+ uvicorn.run("embedding_proxy:app", host="0.0.0.0", port=args.port, reload=args.reload)
env-example.txt ADDED
@@ -0,0 +1,19 @@
1
+ LLM_PROVIDER=openai
2
+ LLM_API_BASE=http://localhost:11434/v1
3
+ LLM_MODEL='mistral-large:123b-instruct-2407-q4_0'
4
+ LLM_API_KEY=12345
5
+
6
+ EMBEDDINGS_PROVIDER=openai
7
+ EMBEDDINGS_API_BASE=http://localhost:11434
8
+ EMBEDDINGS_MODEL='snowflake-arctic-embed:335m'
9
+ EMBEDDINGS_API_KEY=12345
10
+
11
+
12
+ GRAPHRAG_API_KEY=12345
13
+ ROOT_DIR=indexing
14
+ INPUT_DIR=${ROOT_DIR}/output/${timestamp}/artifacts
15
+ LLM_SERVICE_TYPE=openai_chat
16
+ EMBEDDINGS_SERVICE_TYPE=openai_embedding
17
+
18
+ API_URL=http://localhost:8012
19
+ API_PORT=8012
graphrag/.github/ISSUE_TEMPLATE.md ADDED
@@ -0,0 +1,69 @@
1
+ ### Description
2
+
3
+ <!-- A clear and concise description of the issue or feature request. -->
4
+
5
+ ### Environment
6
+
7
+ - GraphRAG version: <!-- Specify the GraphRAG version (e.g., v0.1.1) -->
8
+ - Python version: <!-- Specify the Python version (e.g., 3.8) -->
9
+ - Operating System: <!-- Specify the OS (e.g., Windows 10, Ubuntu 20.04) -->
10
+
11
+ ### Steps to Reproduce (for bugs)
12
+
13
+ <!-- Provide detailed steps to reproduce the issue. Include code snippets, configuration files, or any other relevant information. -->
14
+
15
+ 1. Step 1
16
+ 2. Step 2
17
+ 3. ...
18
+
19
+ ### Expected Behavior
20
+
21
+ <!-- Describe what you expected to happen. -->
22
+
23
+ ### Actual Behavior
24
+
25
+ <!-- Describe what actually happened. Include any error messages, stack traces, or unexpected behavior. -->
26
+
27
+ ### Screenshots / Logs (if applicable)
28
+
29
+ <!-- If relevant, include screenshots or logs that help illustrate the issue. -->
30
+
31
+ ### GraphRAG Configuration
32
+
33
+ <!-- Include the GraphRAG configuration used for this run. -->
34
+
35
+ ### Additional Information
36
+
37
+ <!-- Include any additional information that might be helpful, such as specific configurations, data samples, or context about the environment. -->
38
+
39
+ ### Possible Solution (if you have one)
40
+
41
+ <!-- If you have suggestions on how to address the issue, provide them here. -->
42
+
43
+ ### Is this a Bug or Feature Request?
44
+
45
+ <!-- Choose one: Bug | Feature Request -->
46
+
47
+ ### Any related issues?
48
+
49
+ <!-- If this is related to another issue, reference it here. -->
50
+
51
+ ### Any relevant discussions?
52
+
53
+ <!-- If there are any discussions or forum threads related to this issue, provide links. -->
54
+
55
+ ### Checklist
56
+
57
+ <!-- Please check the items that you have completed -->
58
+
59
+ - [ ] I have searched for similar issues and didn't find any duplicates.
60
+ - [ ] I have provided a clear and concise description of the issue.
61
+ - [ ] I have included the necessary environment details.
62
+ - [ ] I have outlined the steps to reproduce the issue.
63
+ - [ ] I have included any relevant logs or screenshots.
64
+ - [ ] I have included the GraphRAG configuration for this run.
65
+ - [ ] I have indicated whether this is a bug or a feature request.
66
+
67
+ ### Additional Comments
68
+
69
+ <!-- Any additional comments or context that you think would be helpful. -->
graphrag/.github/ISSUE_TEMPLATE/bug_report.yml ADDED
@@ -0,0 +1,57 @@
1
+ name: Bug Report
2
+ description: File a bug report
3
+ title: "[Bug]: <title>"
4
+ labels: ["bug", "triage"]
5
+
6
+ body:
7
+ - type: textarea
8
+ id: description
9
+ attributes:
10
+ label: Describe the bug
11
+ description: A clear and concise description of what the bug is.
12
+ placeholder: What went wrong?
13
+ - type: textarea
14
+ id: reproduce
15
+ attributes:
16
+ label: Steps to reproduce
17
+ description: |
18
+ Steps to reproduce the behavior:
19
+
20
+ 1. Step 1
21
+ 2. Step 2
22
+ 3. ...
23
+ 4. See error
24
+ placeholder: How can we replicate the issue?
25
+ - type: textarea
26
+ id: expected_behavior
27
+ attributes:
28
+ label: Expected Behavior
29
+ description: A clear and concise description of what you expected to happen.
30
+ placeholder: What should have happened?
31
+ - type: textarea
32
+ id: configused
33
+ attributes:
34
+ label: GraphRAG Config Used
35
+ description: The GraphRAG configuration used for the run.
36
+ placeholder: The settings.yaml content or GraphRAG configuration
37
+ - type: textarea
38
+ id: screenshotslogs
39
+ attributes:
40
+ label: Logs and screenshots
41
+ description: If applicable, add screenshots and logs to help explain your problem.
42
+ placeholder: Add logs and screenshots here
43
+ - type: textarea
44
+ id: additional_information
45
+ attributes:
46
+ label: Additional Information
47
+ description: |
48
+ - GraphRAG Version: e.g., v0.1.1
49
+ - Operating System: e.g., Windows 10, Ubuntu 20.04
50
+ - Python Version: e.g., 3.8
51
+ - Related Issues: e.g., #1
52
+ - Any other relevant information.
53
+ value: |
54
+ - GraphRAG Version:
55
+ - Operating System:
56
+ - Python Version:
57
+ - Related Issues:
graphrag/.github/ISSUE_TEMPLATE/config.yml ADDED
@@ -0,0 +1 @@
1
+ blank_issues_enabled: true
graphrag/.github/ISSUE_TEMPLATE/feature_request.yml ADDED
@@ -0,0 +1,26 @@
1
+ name: Feature Request
2
+ description: File a feature request
3
+ labels: ["enhancement"]
4
+ title: "[Feature Request]: <title>"
5
+
6
+ body:
7
+ - type: textarea
8
+ id: problem_description
9
+ attributes:
10
+ label: Is your feature request related to a problem? Please describe.
11
+ description: A clear and concise description of what the problem is.
12
+ placeholder: What problem are you trying to solve?
13
+
14
+ - type: textarea
15
+ id: solution_description
16
+ attributes:
17
+ label: Describe the solution you'd like
18
+ description: A clear and concise description of what you want to happen.
19
+ placeholder: How do you envision the solution?
20
+
21
+ - type: textarea
22
+ id: additional_context
23
+ attributes:
24
+ label: Additional context
25
+ description: Add any other context or screenshots about the feature request here.
26
+ placeholder: Any additional information
graphrag/.github/ISSUE_TEMPLATE/general_issue.yml ADDED
@@ -0,0 +1,51 @@
1
+ name: General Issue
2
+ description: File a general issue
3
+ title: "[Issue]: <title> "
4
+ labels: ["triage"]
5
+
6
+ body:
7
+ - type: textarea
8
+ id: description
9
+ attributes:
10
+ label: Describe the issue
11
+ description: A clear and concise description of what the issue is.
12
+ placeholder: What went wrong?
13
+ - type: textarea
14
+ id: reproduce
15
+ attributes:
16
+ label: Steps to reproduce
17
+ description: |
18
+ Steps to reproduce the behavior:
19
+
20
+ 1. Step 1
21
+ 2. Step 2
22
+ 3. ...
23
+ 4. See error
24
+ placeholder: How can we replicate the issue?
25
+ - type: textarea
26
+ id: configused
27
+ attributes:
28
+ label: GraphRAG Config Used
29
+ description: The GraphRAG configuration used for the run.
30
+ placeholder: The settings.yaml content or GraphRAG configuration
31
+ - type: textarea
32
+ id: screenshotslogs
33
+ attributes:
34
+ label: Logs and screenshots
35
+ description: If applicable, add screenshots and logs to help explain your problem.
36
+ placeholder: Add logs and screenshots here
37
+ - type: textarea
38
+ id: additional_information
39
+ attributes:
40
+ label: Additional Information
41
+ description: |
42
+ - GraphRAG Version: e.g., v0.1.1
43
+ - Operating System: e.g., Windows 10, Ubuntu 20.04
44
+ - Python Version: e.g., 3.8
45
+ - Related Issues: e.g., #1
46
+ - Any other relevant information.
47
+ value: |
48
+ - GraphRAG Version:
49
+ - Operating System:
50
+ - Python Version:
51
+ - Related Issues:
graphrag/.github/dependabot.yml ADDED
@@ -0,0 +1,19 @@
1
+ # To get started with Dependabot version updates, you'll need to specify which
2
+ # package ecosystems to update and where the package manifests are located.
3
+ # Please see the documentation for all configuration options:
4
+ # https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
5
+ version: 2
6
+ updates:
7
+ - package-ecosystem: "npm" # See documentation for possible values
8
+ directory: "docsite/" # Location of package manifests
9
+ schedule:
10
+ interval: "weekly"
11
+ - package-ecosystem: "pip" # See documentation for possible values
12
+ directory: "/" # Location of package manifests
13
+ schedule:
14
+ interval: "weekly"
15
+ - package-ecosystem: "github-actions"
16
+ # Workflow files stored in the default location of `.github/workflows`. (You don't need to specify `/.github/workflows` for `directory`. You can use `directory: "/"`.)
17
+ directory: "/"
18
+ schedule:
19
+ interval: "weekly"
graphrag/.github/pull_request_template.md ADDED
@@ -0,0 +1,36 @@
1
+ <!--
2
+ Thanks for contributing to GraphRAG!
3
+
4
+ Please do not make *Draft* pull requests, as they still notify anyone watching the repo.
5
+
6
+ Create a pull request when it is ready for review and feedback.
7
+
8
+ About this template
9
+
10
+ The following template aims to help contributors write a good description for their pull requests.
11
+ We'd like you to provide a description of the changes in your pull request (i.e. bugs fixed or features added), the motivation behind the changes, and complete the checklist below before opening a pull request.
12
+
13
+ Feel free to discard it if you need to (e.g. when you just fix a typo). -->
14
+
15
+ ## Description
16
+
17
+ [Provide a brief description of the changes made in this pull request.]
18
+
19
+ ## Related Issues
20
+
21
+ [Reference any related issues or tasks that this pull request addresses.]
22
+
23
+ ## Proposed Changes
24
+
25
+ [List the specific changes made in this pull request.]
26
+
27
+ ## Checklist
28
+
29
+ - [ ] I have tested these changes locally.
30
+ - [ ] I have reviewed the code changes.
31
+ - [ ] I have updated the documentation (if necessary).
32
+ - [ ] I have added appropriate unit tests (if applicable).
33
+
34
+ ## Additional Notes
35
+
36
+ [Add any additional notes or context that may be helpful for the reviewer(s).]
graphrag/.github/workflows/gh-pages.yml ADDED
@@ -0,0 +1,97 @@
1
+ name: gh-pages
2
+ on:
3
+ push:
4
+ branches: [main]
5
+
6
+ permissions:
7
+ contents: write
8
+
9
+ env:
10
+ POETRY_VERSION: 1.8.3
11
+ PYTHON_VERSION: "3.11"
12
+ NODE_VERSION: 18.x
13
+
14
+ jobs:
15
+ build:
16
+ runs-on: ubuntu-latest
17
+ env:
18
+ GH_PAGES: 1
19
+ DEBUG: 1
20
+ GRAPHRAG_LLM_TYPE: "azure_openai_chat"
21
+ GRAPHRAG_EMBEDDING_TYPE: "azure_openai_embedding"
22
+ GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
23
+ GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}
24
+ GRAPHRAG_API_VERSION: ${{ secrets.GRAPHRAG_API_VERSION }}
25
+ GRAPHRAG_LLM_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_LLM_DEPLOYMENT_NAME }}
26
+ GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME }}
27
+ GRAPHRAG_CACHE_TYPE: "blob"
28
+ GRAPHRAG_CACHE_CONNECTION_STRING: ${{ secrets.BLOB_STORAGE_CONNECTION_STRING }}
29
+ GRAPHRAG_CACHE_CONTAINER_NAME: "cicache"
30
+ GRAPHRAG_CACHE_BASE_DIR": "cache"
31
+ GRAPHRAG_LLM_MODEL: gpt-3.5-turbo-16k
32
+ GRAPHRAG_EMBEDDING_MODEL: text-embedding-ada-002
33
+ # We have Windows + Linux runners in 3.10 and 3.11, so we need to divide the rate limits by 4
34
+ GRAPHRAG_LLM_TPM: 45_000 # 180,000 / 4
35
+ GRAPHRAG_LLM_RPM: 270 # 1,080 / 4
36
+ GRAPHRAG_EMBEDDING_TPM: 87_500 # 350,000 / 4
37
+ GRAPHRAG_EMBEDDING_RPM: 525 # 2,100 / 4
38
+ GRAPHRAG_CHUNK_SIZE: 1200
39
+ GRAPHRAG_CHUNK_OVERLAP: 0
40
+ # Azure AI Search config
41
+ AZURE_AI_SEARCH_URL_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_URL_ENDPOINT }}
42
+ AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
43
+
44
+ steps:
45
+ - uses: actions/checkout@v4
46
+ with:
47
+ persist-credentials: false
48
+
49
+ - name: Set up Python ${{ env.PYTHON_VERSION }}
50
+ uses: actions/setup-python@v5
51
+ with:
52
+ python-version: ${{ env.PYTHON_VERSION }}
53
+
54
+ - name: Install Poetry ${{ env.POETRY_VERSION }}
55
+ uses: abatilo/[email protected]
56
+ with:
57
+ poetry-version: ${{ env.POETRY_VERSION }}
58
+
59
+ - name: Use Node ${{ env.NODE_VERSION }}
60
+ uses: actions/setup-node@v4
61
+ with:
62
+ node-version: ${{ env.NODE_VERSION }}
63
+
64
+ - name: Install Yarn dependencies
65
+ run: yarn install
66
+ working-directory: docsite
67
+
68
+ - name: Install Poetry dependencies
69
+ run: poetry install
70
+
71
+ - name: Install Azurite
72
+ id: azuright
73
+ uses: potatoqualitee/[email protected]
74
+
75
+ - name: Generate Indexer Outputs
76
+ run: |
77
+ poetry run poe test_smoke
78
+ zip -jrm docsite/data/operation_dulce/dataset.zip tests/fixtures/min-csv/output/*/artifacts/*.parquet
79
+
80
+ - name: Build Jupyter Notebooks
81
+ run: poetry run poe convert_docsite_notebooks
82
+
83
+ - name: Build docsite
84
+ run: yarn build
85
+ working-directory: docsite
86
+ env:
87
+ DOCSITE_BASE_URL: "graphrag"
88
+
89
+ - name: List docsite files
90
+ run: find docsite/_site
91
+
92
+ - name: Deploy to GitHub Pages
93
+ uses: JamesIves/[email protected]
94
+ with:
95
+ branch: gh-pages
96
+ folder: docsite/_site
97
+ clean: true
graphrag/.github/workflows/javascript-ci.yml ADDED
@@ -0,0 +1,30 @@
1
+ name: JavaScript CI
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ branches: [main]
7
+
8
+ env:
9
+ NODE_VERSION: 18.x
10
+
11
+ jobs:
12
+ javascript-ci:
13
+ runs-on: ubuntu-latest
14
+ strategy:
15
+ fail-fast: false
16
+ steps:
17
+ - name: Use Node ${{ env.NODE_VERSION }}
18
+ uses: actions/setup-node@v4
19
+ with:
20
+ node-version: ${{ env.NODE_VERSION }}
21
+
22
+ - uses: actions/checkout@v4
23
+
24
+ - run: yarn install
25
+ working-directory: docsite
26
+ name: Install Dependencies
27
+
28
+ - run: yarn build
29
+ working-directory: docsite
30
+ name: Build Docsite
graphrag/.github/workflows/python-ci.yml ADDED
@@ -0,0 +1,122 @@
1
+ name: Python CI
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ branches: [main]
7
+
8
+ permissions:
9
+ contents: read
10
+ pull-requests: read
11
+
12
+ concurrency:
13
+ group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
14
+ # Only run the for the latest commit
15
+ cancel-in-progress: true
16
+
17
+ env:
18
+ POETRY_VERSION: 1.8.3
19
+
20
+ jobs:
21
+ python-ci:
22
+ strategy:
23
+ matrix:
24
+ python-version: ["3.10", "3.11", "3.12"]
25
+ os: [ubuntu-latest, windows-latest]
26
+ env:
27
+ DEBUG: 1
28
+ GRAPHRAG_LLM_TYPE: "azure_openai_chat"
29
+ GRAPHRAG_EMBEDDING_TYPE: "azure_openai_embedding"
30
+ GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
31
+ GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}
32
+ GRAPHRAG_API_VERSION: ${{ secrets.GRAPHRAG_API_VERSION }}
33
+ GRAPHRAG_LLM_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_LLM_DEPLOYMENT_NAME }}
34
+ GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME: ${{ secrets.GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME }}
35
+ GRAPHRAG_CACHE_TYPE: "blob"
36
+ GRAPHRAG_CACHE_CONNECTION_STRING: ${{ secrets.BLOB_STORAGE_CONNECTION_STRING }}
37
+ GRAPHRAG_CACHE_CONTAINER_NAME: "cicache"
38
+ GRAPHRAG_CACHE_BASE_DIR": "cache"
39
+ GRAPHRAG_LLM_MODEL: gpt-3.5-turbo-16k
40
+ GRAPHRAG_EMBEDDING_MODEL: text-embedding-ada-002
41
+ # We have Windows + Linux runners in 3.10 and 3.11, so we need to divide the rate limits by 4
42
+ GRAPHRAG_LLM_TPM: 45_000 # 180,000 / 4
43
+ GRAPHRAG_LLM_RPM: 270 # 1,080 / 4
44
+ GRAPHRAG_EMBEDDING_TPM: 87_500 # 350,000 / 4
45
+ GRAPHRAG_EMBEDDING_RPM: 525 # 2,100 / 4
46
+ GRAPHRAG_CHUNK_SIZE: 1200
47
+ GRAPHRAG_CHUNK_OVERLAP: 0
48
+ # Azure AI Search config
49
+ AZURE_AI_SEARCH_URL_ENDPOINT: ${{ secrets.AZURE_AI_SEARCH_URL_ENDPOINT }}
50
+ AZURE_AI_SEARCH_API_KEY: ${{ secrets.AZURE_AI_SEARCH_API_KEY }}
51
+
52
+ runs-on: ${{ matrix.os }}
53
+ steps:
54
+ - uses: actions/checkout@v4
55
+
56
+ - uses: dorny/paths-filter@v3
57
+ id: changes
58
+ with:
59
+ filters: |
60
+ python:
61
+ - 'graphrag/**/*'
62
+ - 'poetry.lock'
63
+ - 'pyproject.toml'
64
+ - '**/*.py'
65
+ - '**/*.toml'
66
+ - '**/*.ipynb'
67
+ - '.github/workflows/python*.yml'
68
+ - 'tests/smoke/*'
69
+
70
+ - name: Set up Python ${{ matrix.python-version }}
71
+ uses: actions/setup-python@v5
72
+ with:
73
+ python-version: ${{ matrix.python-version }}
74
+
75
+ - name: Install Poetry
76
+ uses: abatilo/[email protected]
77
+ with:
78
+ poetry-version: ${{ env.POETRY_VERSION }}
79
+
80
+ - name: Install dependencies
81
+ shell: bash
82
+ run: poetry self add setuptools && poetry run python -m pip install gensim && poetry install
83
+
84
+ - name: Check Semversioner
85
+ run: |
86
+ poetry run semversioner check
87
+
88
+ - name: Check
89
+ run: |
90
+ poetry run poe check
91
+
92
+ - name: Build
93
+ run: |
94
+ poetry build
95
+
96
+ - name: Install Azurite
97
+ id: azuright
98
+ uses: potatoqualitee/[email protected]
99
+
100
+ - name: Unit Test
101
+ run: |
102
+ poetry run poe test_unit
103
+
104
+ - name: Integration Test
105
+ run: |
106
+ poetry run poe test_integration
107
+
108
+ - name: Smoke Test
109
+ if: steps.changes.outputs.python == 'true'
110
+ run: |
111
+ poetry run poe test_smoke
112
+
113
+ - uses: actions/upload-artifact@v4
114
+ if: always()
115
+ with:
116
+ name: smoke-test-artifacts-${{ matrix.python-version }}-${{ matrix.poetry-version }}-${{ runner.os }}
117
+ path: tests/fixtures/*/output
118
+
119
+ - name: E2E Test
120
+ if: steps.changes.outputs.python == 'true'
121
+ run: |
122
+ ./scripts/e2e-test.sh
graphrag/.github/workflows/python-publish.yml ADDED
@@ -0,0 +1,52 @@
1
+ name: Python Publish
2
+ on:
3
+ release:
4
+ types: [created]
5
+ push:
6
+ branches: [main]
7
+
8
+ env:
9
+ POETRY_VERSION: "1.8.3"
10
+ PYTHON_VERSION: "3.10"
11
+
12
+ jobs:
13
+ publish:
14
+ name: Upload release to PyPI
15
+ if: github.ref == 'refs/heads/main'
16
+ runs-on: ubuntu-latest
17
+ environment:
18
+ name: pypi
19
+ url: https://pypi.org/p/graphrag
20
+ permissions:
21
+ id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
22
+
23
+ steps:
24
+ - uses: actions/checkout@v4
25
+ with:
26
+ fetch-depth: 0
27
+ fetch-tags: true
28
+
29
+ - name: Set up Python
30
+ uses: actions/setup-python@v5
31
+ with:
32
+ python-version: ${{ env.PYTHON_VERSION }}
33
+
34
+ - name: Install Poetry
35
+ uses: abatilo/[email protected]
36
+ with:
37
+ poetry-version: ${{ env.POETRY_VERSION }}
38
+
39
+ - name: Install dependencies
40
+ shell: bash
41
+ run: poetry install
42
+
43
+ - name: Build Distributable
44
+ shell: bash
45
+ run: poetry build
46
+
47
+ - name: Publish package distributions to PyPI
48
+ uses: pypa/gh-action-pypi-publish@release/v1
49
+ with:
50
+ packages-dir: dist
51
+ skip-existing: true
52
+ verbose: true
graphrag/.github/workflows/semver.yml ADDED
@@ -0,0 +1,15 @@
1
+ name: Semver Check
2
+ on:
3
+ pull_request:
4
+ branches: [main]
5
+
6
+ jobs:
7
+ semver:
8
+ runs-on: ubuntu-latest
9
+ steps:
10
+ - uses: actions/checkout@v4
11
+ with:
12
+ fetch-depth: 0
13
+
14
+ - name: Check Semver
15
+ run: ./scripts/semver-check.sh
graphrag/.github/workflows/spellcheck.yml ADDED
@@ -0,0 +1,15 @@
1
+ name: Spellcheck
2
+ on:
3
+ push:
4
+ branches: [main]
5
+ pull_request:
6
+ paths:
7
+ - '**/*'
8
+ jobs:
9
+ spellcheck:
10
+ runs-on: ubuntu-latest
11
+ steps:
12
+ - uses: actions/checkout@v4
13
+
14
+ - name: Spellcheck
15
+ run: ./scripts/spellcheck.sh
graphrag/.gitignore ADDED
@@ -0,0 +1,68 @@
1
+ # Node Artifacts
2
+ */node_modules/
3
+ docsite/*/src/**/*.js
4
+ docsite/*/lib/
5
+ docsite/*/storybook-static/
6
+ docsite/*/docsTemp/
7
+ docsite/*/build/
8
+ .swc/
9
+ dist/
10
+ .idea
11
+ # https://yarnpkg.com/advanced/qa#which-files-should-be-gitignored
12
+ docsite/.yarn/*
13
+ !docsite/.yarn/patches
14
+ !docsite/.yarn/releases
15
+ !docsite/.yarn/plugins
16
+ !docsite/.yarn/sdks
17
+ !docsite/.yarn/versions
18
+ docsite/.pnp.*
19
+
20
+ .yarn/*
21
+ !.yarn/patches
22
+ !.yarn/releases
23
+ !.yarn/plugins
24
+ !.yarn/sdks
25
+ !.yarn/versions
26
+ .pnp.*
27
+
28
+ # Python Artifacts
29
+ python/*/lib/
30
+ # Test Output
31
+ .coverage
32
+ coverage/
33
+ licenses.txt
34
+ examples_notebooks/*/lancedb
35
+ examples_notebooks/*/data
36
+ tests/fixtures/cache
37
+ tests/fixtures/*/cache
38
+ tests/fixtures/*/output
39
+ lancedb/
40
+
41
+ # Random
42
+ .DS_Store
43
+ *.log*
44
+ .venv
45
+ .conda
46
+ .tmp
47
+
48
+
49
+ .env
50
+ build.zip
51
+
52
+ .turbo
53
+
54
+ __pycache__
55
+
56
+ .pipeline
57
+
58
+ # Azurite
59
+ temp_azurite/
60
+ __azurite*.json
61
+ __blobstorage*.json
62
+ __blobstorage__/
63
+
64
+ # Getting started example
65
+ ragtest/
66
+ .ragtest/
67
+ .pipelines
68
+ .pipeline
graphrag/.semversioner/0.1.0.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "changes": [
3
+ {
4
+ "description": "Initial Release",
5
+ "type": "minor"
6
+ }
7
+ ],
8
+ "created_at": "2024-07-01T21:48:50+00:00",
9
+ "version": "0.1.0"
10
+ }
graphrag/.semversioner/next-release/minor-20240710183748086411.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "minor",
3
+ "description": "Add dynamic community report rating to the prompt tuning engine"
4
+ }
graphrag/.semversioner/next-release/patch-20240701233152787373.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix docsite base url"
4
+ }
graphrag/.semversioner/next-release/patch-20240703152422358587.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add cli flag to overlay default values onto a provided config."
4
+ }
graphrag/.semversioner/next-release/patch-20240703182750529114.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix broken prompt tuning link on docs"
4
+ }
graphrag/.semversioner/next-release/patch-20240704181236015699.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix for --limit exceeding the dataframe lenght"
4
+ }
graphrag/.semversioner/next-release/patch-20240705184142723331.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add Minute-based Rate Limiting and fix rpm, tpm settings"
4
+ }
graphrag/.semversioner/next-release/patch-20240705235656897489.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add N parameter support"
4
+ }
graphrag/.semversioner/next-release/patch-20240707063053679262.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "fix community_report doesn't work in settings.yaml"
4
+ }
graphrag/.semversioner/next-release/patch-20240709225514193665.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add language support to prompt tuning"
4
+ }
graphrag/.semversioner/next-release/patch-20240710114442871595.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Modify defaults for CHUNK_SIZE, CHUNK_OVERLAP and GLEANINGS to reduce time and LLM calls"
4
+ }
graphrag/.semversioner/next-release/patch-20240710165603516866.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fixed an issue where base OpenAI embeddings can't work with Azure OpenAI LLM"
4
+ }
graphrag/.semversioner/next-release/patch-20240711004716103302.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix encoding model parameter on prompt tune"
4
+ }
graphrag/.semversioner/next-release/patch-20240711092703710242.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "support non-open ai model config to prompt tune"
4
+ }
graphrag/.semversioner/next-release/patch-20240711223132221685.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix delta none on query calls"
4
+ }
graphrag/.semversioner/next-release/patch-20240712035356859335.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "fix llm response content is None in query"
4
+ }
graphrag/.semversioner/next-release/patch-20240712210400518089.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add exception handling on file load"
4
+ }
graphrag/.semversioner/next-release/patch-20240712235357550877.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Add llm params to local and global search"
4
+ }
graphrag/.semversioner/next-release/patch-20240716225953784804.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "type": "patch",
3
+ "description": "Fix for Ruff 0.5.2"
4
+ }
graphrag/.vsts-ci.yml ADDED
@@ -0,0 +1,41 @@
1
+ name: GraphRAG CI
2
+ pool:
3
+ vmImage: ubuntu-latest
4
+
5
+ trigger:
6
+ batch: true
7
+ branches:
8
+ include:
9
+ - main
10
+
11
+ variables:
12
+ isMain: $[eq(variables['Build.SourceBranch'], 'refs/heads/main')]
13
+ pythonVersion: "3.10"
14
+ poetryVersion: "1.6.1"
15
+ nodeVersion: "18.x"
16
+ artifactsFullFeedName: "Resilience/resilience_python"
17
+
18
+ stages:
19
+ - stage: Compliance
20
+ dependsOn: []
21
+ jobs:
22
+ - job: compliance
23
+ displayName: Compliance
24
+ pool:
25
+ vmImage: windows-latest
26
+ steps:
27
+ - task: CredScan@3
28
+ inputs:
29
+ outputFormat: sarif
30
+ debugMode: false
31
+
32
+ - task: ComponentGovernanceComponentDetection@0
33
+ inputs:
34
+ scanType: "Register"
35
+ verbosity: "Verbose"
36
+ alertWarningLevel: "High"
37
+
38
+ - task: PublishSecurityAnalysisLogs@3
39
+ inputs:
40
+ ArtifactName: "CodeAnalysisLogs"
41
+ ArtifactType: "Container"
graphrag/CODEOWNERS ADDED
@@ -0,0 +1,6 @@
1
+ # These owners will be the default owners for everything in
2
+ # the repo. Unless a later match takes precedence,
3
+ # @global-owner1 and @global-owner2 will be requested for
4
+ # review when someone opens a pull request.
5
+ * @microsoft/societal-resilience
6
+ * @microsoft/graphrag-core-team
graphrag/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,9 @@
1
+ # Microsoft Open Source Code of Conduct
2
+
3
+ This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4
+
5
+ Resources:
6
+
7
+ - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8
+ - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9
+ - Contact [[email protected]](mailto:[email protected]) with questions or concerns