Spaces: Agents-MCP-Hackathon/doc-mcp

mdabidhussain committed 11 days ago

Commit

409063f

1 Parent(s): 3fcc667

added about tab and updated readme.md

Files changed (2)

README.md +76 -29
app.py +481 -213

README.md CHANGED

@@ -22,6 +22,59 @@ short_description: 'RAG on documentations for your agent '
 Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP servers that provide easy access to documentation context for AI agents. Just point it at any GitHub repo with markdown docs, and get an intelligent Q&A interface powered by vector search.
 ## ✨ Key Features
 - **GitHub Integration**: Fetch markdown files directly from any GitHub repository
@@ -30,6 +83,20 @@ Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP
 - **Smart Q&A**: Ask questions about documentation with source citations
 - **Repository Management**: Track multiple repositories and their statistics
 ## 🚀 Quick Start
 1. **Setup Environment**:
@@ -41,7 +108,7 @@ uv sync
 # Configure environment
 cp .env.example .env
-# Add your NEBIUS_API_KEY and MONGODB_URI
 ```
 2. **Run the App**:
@@ -62,34 +129,14 @@ python main.py
 ## Workflow
-```mermaid
-flowchart TD
- subgraph Ingestion["Ingestion"]
-        B["Discover Markdown Files"]
-        A["GitHub Repo URL"]
-        C["User File Selection"]
-        D["Chunk & Embed Documents"]
-        E["Store in MongoDB"]
-  end
- subgraph Query["Query"]
-        G["Select Repository"]
-        F["User Question"]
-        H["Vector Search"]
-        I["Retrieve Context"]
-        J["Generate Response"]
-        K["Display with Sources"]
-  end
-    A --> B
-    B --> C
-    C --> D
-    D --> E
-    F --> G
-    G --> H
-    H --> I
-    I --> J
-    J --> K
-    E --> H
-```
 ## 🛠️ Technology Stack

 Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP servers that provide easy access to documentation context for AI agents. Just point it at any GitHub repo with markdown docs, and get an intelligent Q&A interface powered by vector search.
+## 🛠️ Available MCP Tools
+### 📋 Documentation Query Tools
+#### `get_available_docs_repo`
+List all available ingested repositories
+- **Returns**: Array of repository names that have been processed and are available for querying
+- **Usage**: Get a list of documentation repositories before making queries
+#### `make_query`
+Search documentation with AI-powered semantic search
+- **Parameters**:
+  - `repo` (string): Repository name to search in
+  - `mode` (string): Search strategy - "default", "text_search", or "hybrid"
+  - `query` (string): Natural language question about the documentation
+- **Returns**: AI-generated response with source citations and metadata
+- **Usage**: Ask questions about specific documentation repositories
+### 📁 GitHub File Operations Tools
+#### `list_repository_files`
+Scan and list files in a GitHub repository
+- **Parameters**:
+  - `repo_url` (string): GitHub repository URL or owner/repo format
+  - `branch` (string, optional): Branch name (default: "main")
+  - `extensions` (string, optional): Comma-separated file extensions (default: ".md,.mdx")
+- **Returns**: JSON with file list and repository metadata
+- **Usage**: Discover available documentation files before ingestion
+#### `get_single_file`
+Retrieve content of a specific file from GitHub
+- **Parameters**:
+  - `repo_url` (string): GitHub repository URL or owner/repo format
+  - `file_path` (string): Path to the specific file in the repository
+  - `branch` (string, optional): Branch name (default: "main")
+- **Returns**: JSON with file content, metadata, and GitHub URLs
+- **Usage**: Fetch individual documentation files for processing or review
+#### `get_multiple_files`
+Retrieve multiple files from GitHub in one request
+- **Parameters**:
+  - `repo_url` (string): GitHub repository URL or owner/repo format
+  - `file_paths_str` (string): Comma-separated list of file paths
+  - `branch` (string, optional): Branch name (default: "main")
+- **Returns**: JSON with all file contents, success/failure counts, and metadata
+- **Usage**: Batch fetch multiple documentation files efficiently
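
The parameter lists above can be packaged into small argument builders before calling the tools from an agent. This is a minimal sketch, not part of Doc-MCP: the helper names `build_make_query_args` and `build_get_multiple_files_args` are hypothetical, and only the parameter names, the three `mode` values, and the `"main"` branch default are taken from the README text.

```python
from typing import Dict, Iterable

# Search strategies the README lists for `make_query`.
ALLOWED_MODES = ("default", "text_search", "hybrid")


def build_make_query_args(repo: str, query: str, mode: str = "default") -> Dict[str, str]:
    """Hypothetical helper: validate and package arguments for `make_query`."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    if not repo.strip() or not query.strip():
        raise ValueError("repo and query must be non-empty")
    return {"repo": repo.strip(), "mode": mode, "query": query.strip()}


def build_get_multiple_files_args(
    repo_url: str, file_paths: Iterable[str], branch: str = "main"
) -> Dict[str, str]:
    """Hypothetical helper: `get_multiple_files` expects one comma-separated string."""
    cleaned = [path.strip() for path in file_paths if path.strip()]
    if not cleaned:
        raise ValueError("at least one file path is required")
    return {
        "repo_url": repo_url,
        "file_paths_str": ",".join(cleaned),
        "branch": branch,
    }
```

Validating `mode` on the client side is a design choice made here for illustration; how the server reacts to an unknown mode is not stated in the README.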
 ## ✨ Key Features
 - **GitHub Integration**: Fetch markdown files directly from any GitHub repository
 - **Smart Q&A**: Ask questions about documentation with source citations
 - **Repository Management**: Track multiple repositories and their statistics
+### 🎯 MCP Server Configuration
+Add this configuration to your MCP client (Cursor, Windsurf, Cline):
+```json
+{
+  "mcpServers": {
+    "doc-mcp": {
+      "url": "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse"
+    }
+  }
+}
+```
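
If the client configuration is generated by a script rather than edited by hand, the JSON above could be produced as follows. This is a sketch under assumptions: `build_mcp_config` and `merge_mcp_config` are hypothetical helpers, and where each client (Cursor, Windsurf, Cline) stores its configuration file is not covered here. Only the server name and URL come from the README.

```python
import json

# SSE endpoint quoted in the README's configuration snippet.
DOC_MCP_SSE_URL = "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse"


def build_mcp_config(server_name: str = "doc-mcp", url: str = DOC_MCP_SSE_URL) -> dict:
    """Hypothetical helper: return a config dict shaped like the README example."""
    return {"mcpServers": {server_name: {"url": url}}}


def merge_mcp_config(existing: dict, server_name: str = "doc-mcp", url: str = DOC_MCP_SSE_URL) -> dict:
    """Hypothetical helper: add the server entry without dropping other servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[server_name] = {"url": url}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Print the configuration so it can be pasted into the client's settings.
    print(json.dumps(build_mcp_config(), indent=2))
```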
 ## 🚀 Quick Start
 1. **Setup Environment**:
 # Configure environment
 cp .env.example .env
+# Add your GITHUB_API_KEY, NEBIUS_API_KEY and MONGODB_URI
 ```
 2. **Run the App**:
 ## Workflow
+ - Input GitHub URL
+ - Scan for markdown files
+ - Select files to process
+ - Generate embeddings and store in vector database
+ - Ask questions
+ - Search similar content
+ - Generate contextual answers
+ - Show sources and citations
 ## 🛠️ Technology Stack

app.py CHANGED

@@ -9,11 +9,15 @@ from dotenv import load_dotenv
 from llama_index.core import Settings
 from llama_index.core.text_splitter import SentenceSplitter
-from rag.config import (delete_repository_data, embed_model,
-                        get_available_repos, get_repo_details,
-                        get_repository_stats, llm)
-from rag.github_file_loader import \
-    fetch_markdown_files as fetch_files_with_loader
 from rag.github_file_loader import fetch_repository_files, load_github_files
 from rag.ingest import ingest_documents_async
 from rag.query import QueryRetriever
@@ -27,7 +31,7 @@ Settings.node_parser = SentenceSplitter(chunk_size=3072)
 def get_available_repositories():
     return get_available_repos()
 def start_file_loading(
     repo_url: str, selected_files: List[str], current_progress: Dict
@@ -170,8 +174,8 @@ def start_file_loading(
         )
         return current_progress
 def start_vector_ingestion(current_progress: Dict):
     """Step 2: Ingest loaded documents into vector store"""
     print("\n🔄 STARTING VECTOR INGESTION STEP")
@@ -235,7 +239,9 @@ def start_vector_ingestion(current_progress: Dict):
         if isinstance(failed_files_data, list):
             failed_files_count = len(failed_files_data)
         else:
-            failed_files_count = failed_files_data if isinstance(failed_files_data, int) else 0
         # Update final success state with repository update flag
         current_progress.update(
@@ -269,7 +275,9 @@ def start_vector_ingestion(current_progress: Dict):
         if isinstance(failed_files_data, list):
             failed_files_count = len(failed_files_data)
         else:
-            failed_files_count = failed_files_data if isinstance(failed_files_data, int) else 0
         current_progress.update(
             {
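
The two hunks above remove the same one-line normalization of `failed_files_data`; the replacement lines are not shown in this view. A minimal sketch of that logic as a shared helper follows. The name `count_failed_files` is hypothetical, and excluding `bool` is a deliberate deviation from the removed line, which would have counted `True` as 1.

```python
from typing import Any


def count_failed_files(failed_files_data: Any) -> int:
    """Hypothetical helper: normalize a failed-files value to a count.

    Mirrors the removed logic: a list yields its length, an int is used
    as-is, anything else counts as zero.
    """
    if isinstance(failed_files_data, list):
        return len(failed_files_data)
    # bool is a subclass of int; treat it as "not a count" (deviation from the diff).
    if isinstance(failed_files_data, int) and not isinstance(failed_files_data, bool):
        return failed_files_data
    return 0
```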
@@ -287,11 +295,12 @@ def start_vector_ingestion(current_progress: Dict):
         return current_progress
 def start_file_loading_generator(
     repo_url: str, selected_files: List[str], current_progress: Dict
 ):
     """Step 1: Load files from GitHub with yield-based real-time updates"""
     print("\n🔄 STARTING FILE LOADING STEP")
     print(f"📍 Repository: {repo_url}")
     print(f"📋 Selected files: {len(selected_files)} files")
@@ -352,7 +361,7 @@ def start_file_loading_generator(
             "repo_name": repo_name,
         }
         yield initial_progress
         time.sleep(0.5)
         for i in range(0, len(selected_files), batch_size):
@@ -421,13 +430,13 @@ def start_file_loading_generator(
                     "repo_name": repo_name,
                 }
                 yield batch_complete_progress
                 time.sleep(0.3)
             except Exception as batch_error:
                 print(f"❌ Batch processing error: {batch_error}")
                 all_failed.extend(batch)
                 error_progress = {
                     "status": "loading",
                     "message": f"⚠️ Error in batch {current_batch_num}",
@@ -451,7 +460,7 @@ def start_file_loading_generator(
             "message": f"✅ File Loading Complete! Loaded {len(all_documents)} documents",
             "progress": 100,
             "phase": "Files Loaded Successfully",
-            "details": f"🎯 Final Results:\n✅ Successfully loaded: {len(all_documents)} documents\n❌ Failed files: {len(all_failed)}\n⏱️ Total time: {loading_time:.1f}s\n📊 Success rate: {(len(all_documents)/(len(all_documents)+len(all_failed))*100):.1f}%",
             "step": "file_loading_complete",
             "loaded_documents": all_documents,
             "failed_files": all_failed,
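
The removed `details` line computes the success rate inline, which divides by zero when no documents were loaded and none failed. Whether the replacement guards against this is not visible here, so the following is only a sketch of one way to do it; `success_rate` and `docs_per_second` are hypothetical names, and returning `0.0` for the empty case is an assumption.

```python
def success_rate(loaded: int, failed: int) -> float:
    """Hypothetical helper: percentage of files loaded, 0.0 when nothing was processed."""
    total = loaded + failed
    if total == 0:
        return 0.0
    return loaded / total * 100


def docs_per_second(docs_processed: int, total_time: float) -> float:
    """Hypothetical helper: processing rate, 0.0 when no time has elapsed."""
    if total_time <= 0:
        return 0.0
    return docs_processed / total_time
```

The second helper applies the same guard to a docs-per-second figure of the form `docs_processed / total_time`.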
@@ -481,6 +490,7 @@ def start_file_loading_generator(
         yield error_progress
         return error_progress
 # Progress display component
 def format_progress_display(progress_state: Dict) -> str:
     """Format progress state into readable display with enhanced details"""
@@ -496,20 +506,20 @@ def format_progress_display(progress_state: Dict) -> str:
     # Enhanced progress bar
     filled = int(progress / 2.5)  # 40 chars total
     progress_bar = "█" * filled + "░" * (40 - filled)
     # Status emoji mapping
     status_emoji = {
         "loading": "⏳",
-        "loaded": "���",
         "vectorizing": "🧠",
         "complete": "🎉",
-        "error": "❌"
     }
     emoji = status_emoji.get(status, "🔄")
     output = f"{emoji} **{message}**\n\n"
     # Phase and progress section
     output += f"📊 **Current Phase:** {phase}\n"
     output += f"📈 **Progress:** {progress:.1f}%\n"
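
The progress bar in this hunk maps a 0 to 100 percentage onto 40 characters (`progress / 2.5`). If `progress` ever exceeded 100, `40 - filled` would go negative and the bar would grow past 40 characters. A minimal sketch with clamping is shown below; `render_progress_bar` is a hypothetical name, and clamping is an addition not present in the diff.

```python
def render_progress_bar(progress: float, width: int = 40) -> str:
    """Hypothetical helper: fixed-width text progress bar for a 0-100 percentage."""
    # Clamp so the bar is always exactly `width` characters long.
    clamped = max(0.0, min(100.0, progress))
    filled = int(clamped * width / 100)
    return "█" * filled + "░" * (width - filled)
```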
@@ -521,14 +531,14 @@ def format_progress_display(progress_state: Dict) -> str:
         total = progress_state.get("total_files", 0)
         successful = progress_state.get("successful_files", 0)
         failed = progress_state.get("failed_files", 0)
         if total > 0:
             output += "📁 **File Processing Status:**\n"
             output += f"   • Total files: {total}\n"
             output += f"   • Processed: {processed}/{total}\n"
             output += f"   • ✅ Successful: {successful}\n"
             output += f"   • ❌ Failed: {failed}\n"
             if "current_batch" in progress_state and "total_batches" in progress_state:
                 output += f"   • 📦 Current batch: {progress_state['current_batch']}/{progress_state['total_batches']}\n"
             output += "\n"
@@ -537,7 +547,7 @@ def format_progress_display(progress_state: Dict) -> str:
     elif progress_state.get("step") == "vector_ingestion":
         docs_count = progress_state.get("documents_count", 0)
         repo_name = progress_state.get("repo_name", "Unknown")
         if docs_count > 0:
             output += "🧠 **Vector Processing Status:**\n"
             output += f"   • Repository: {repo_name}\n"
@@ -564,14 +574,18 @@ def format_progress_display(progress_state: Dict) -> str:
         output += f"⏱️ **Total time:** {total_time:.1f} seconds\n"
         output += f"   ├─ File loading: {loading_time:.1f}s\n"
         output += f"   └─ Vector processing: {vector_time:.1f}s\n"
-        output += f"📊 **Processing rate:** {docs_processed/total_time:.1f} docs/second\n\n"
         output += "🚀 **Next Step:** Go to the 'Query Interface' tab to start asking questions!"
     elif status == "error":
         error = progress_state.get("error", "Unknown error")
         output += "\n💥 **ERROR OCCURRED**\n"
         output += "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n"
-        output += f"❌ **Error Details:** {error[:300]}{'...' if len(error) > 300 else ''}\n"
         output += "\n🔧 **Troubleshooting Tips:**\n"
         output += "   • Check your GitHub token permissions\n"
         output += "   • Verify repository URL format\n"
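
The removed error line slices `error[:300]` directly, which assumes the stored value is a string. A small sketch of the same truncation that also accepts exception objects follows; `truncate_error` is a hypothetical name and the `str()` conversion is an assumption about what callers might pass.

```python
def truncate_error(error: object, limit: int = 300) -> str:
    """Hypothetical helper: shorten an error message, appending '...' when cut."""
    text = str(error)
    return text[:limit] + ("..." if len(text) > limit else "")
```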
@@ -596,7 +610,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
         with gr.TabItem("📥 Documentation Ingestion"):
             gr.Markdown("### 🚀 Two-Step Documentation Processing Pipeline")
             gr.Markdown(
-                 "**Step 1:** Fetch markdown files from GitHub repository → **Step 2:** Generate vector embeddings and store in MongoDB Atlas"
             )
             with gr.Row():
@@ -605,28 +619,38 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         label="📂 GitHub Repository URL",
                         placeholder="Enter: owner/repo or https://github.com/owner/repo (e.g., gradio-app/gradio)",
                         value="",
-                        info="Enter any GitHub repository containing markdown documentation"
                     )
-                    load_btn = gr.Button("🔍 Discover Documentation Files", variant="secondary")
                 with gr.Column(scale=1):
                     status_output = gr.Textbox(
-                        label="Repository Discovery Status", interactive=False, lines=4,
-                        placeholder="Repository scanning results will appear here..."
                     )
             with gr.Row():
-                select_all_btn = gr.Button("📋 Select All Documents", variant="secondary")
                 clear_all_btn = gr.Button("🗑️ Clear Selection", variant="secondary")
             # File selection
             with gr.Accordion(label="Available Documentation Files"):
                 file_selector = gr.CheckboxGroup(
-                    choices=[], label="Select Markdown Files for RAG Processing", visible=False
                 )
             # Two-step ingestion controls
             gr.Markdown("### 🔄 RAG Pipeline Execution")
-            gr.Markdown("Process your documentation through our advanced RAG pipeline using Nebius AI embeddings and MongoDB Atlas vector storage.")
             with gr.Row():
                 with gr.Column():
@@ -656,7 +680,6 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 lines=25,
                 value="🚀 Ready to start two-step ingestion process...\n\n📋 Steps:\n1️⃣ Load files from GitHub repository\n2️⃣ Generate embeddings and store in vector database",
                 max_lines=30,
-                show_copy_button=True,
             )
             # Event handlers
@@ -694,12 +717,18 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         gr.Button(interactive=False),
                     )
-            def start_step1_generator(repo_url: str, selected_files: List[str], current_progress: Dict):
                 """Start Step 1 with generator-based real-time progress updates"""
-                for progress_update in start_file_loading_generator(repo_url, selected_files, current_progress.copy()):
                     progress_text = format_progress_display(progress_update)
-                    step2_enabled = progress_update.get("step") == "file_loading_complete"
                     yield (
                         progress_update,
                         progress_text,
@@ -799,23 +828,24 @@ with gr.Blocks(title="Doc-MCP") as demo:
                     # Repository selection - Dropdown that becomes textbox when selected
                     with gr.Row():
                         repo_dropdown = gr.Dropdown(
-                            choices=get_available_repositories() or ["No repositories available"],
                             label="📚 Select Documentation Repository",
                             value=None,
                             interactive=True,
                             allow_custom_value=True,
-                            info="Choose from available repositories"
                         )
                         # Hidden textbox that will become visible when repo is selected
                         selected_repo_textbox = gr.Textbox(
                             label="🎯 Selected Repository",
                             value="",
                             interactive=False,
                             visible=False,
-                            info="Currently selected repository for querying"
                         )
                     refresh_repos_btn = gr.Button(
                         "🔄 Refresh Repository List", variant="secondary", size="sm"
                     )
@@ -833,10 +863,12 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         label="💭 Ask About Your Documentation",
                         placeholder="How do I implement a custom component? What are the available API endpoints? How to configure the system?",
                         lines=3,
-                        info="Ask natural language questions about your documentation"
                     )
-                    query_btn = gr.Button("🚀 Search Documentation", variant="primary", size="lg")
                     # Response display as text area
                     response_output = gr.Textbox(
@@ -844,53 +876,68 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         value="Your AI-powered documentation response will appear here with contextual information and source citations...",
                         lines=10,
                         interactive=False,
-                        info="Generated using Nebius LLM with retrieved documentation context"
                     )
                 with gr.Column(scale=2):
                     gr.Markdown("### 📖 Source References")
-                    gr.Markdown("View the exact documentation sources used to generate the response, with relevance scores and GitHub links.")
                     # Source nodes display as JSON
                     sources_output = gr.JSON(
                         label="📎 Source Citations & Metadata",
                         value={
                             "message": "Source documentation excerpts with relevance scores will appear here after your query...",
-                            "info": "Each source includes file path, relevance score, and content snippet"
                         },
                     )
             # Event handlers
             def handle_repo_selection(selected_repo):
                 """Handle repository selection from dropdown"""
-                if not selected_repo or selected_repo in ["No repositories available", ""]:
                     return (
                         gr.Dropdown(visible=True),  # Keep dropdown visible
                         gr.Textbox(visible=False, value=""),  # Hide textbox
-                        gr.Button(interactive=False)  # Disable query button
                     )
                 else:
                     return (
                         gr.Dropdown(visible=False),  # Hide dropdown
-                        gr.Textbox(visible=True, value=selected_repo),  # Show textbox with selected repo
-                        gr.Button(interactive=True)  # Enable query button
                     )
             def reset_repo_selection():
                 """Reset to show dropdown again"""
                 try:
-                    repos = get_available_repositories() or ["No repositories available"]
                     return (
-                        gr.Dropdown(choices=repos, value=None, visible=True),  # Show dropdown with refreshed choices
                         gr.Textbox(visible=False, value=""),  # Hide textbox
-                        gr.Button(interactive=False)  # Disable query button
                     )
                 except Exception as e:
                     print(f"Error refreshing repository list: {e}")
                     return (
-                        gr.Dropdown(choices=["Error loading repositories"], value=None, visible=True),
                         gr.Textbox(visible=False, value=""),
-                        gr.Button(interactive=False)
                     )
             def get_available_docs_repo():
@@ -903,11 +950,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 try:
                     repos = get_available_repositories()
                     if not repos:
-                        repos = ["No repositories available - Please ingest documentation first"]
                     return gr.Dropdown(choices=repos, value=None)
                 except Exception as e:
                     print(f"Error refreshing repository list: {e}")
-                    return gr.Dropdown(choices=["Error loading repositories"], value=None)
             # Simple query handler
             def handle_query(repo: str, mode: str, query: str):
@@ -923,11 +974,14 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 if not query.strip():
                     return {"error": "Please enter a query."}
-                if not repo or repo in ["No repositories available", "Error loading repositories", ""]:
                     return {"error": "Please select a valid repository."}
                 try:
                     # Create query retriever for the selected repo
                     retriever = QueryRetriever(repo)
@@ -967,22 +1021,22 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 return response_text, source_nodes
             # Wire up events
             # Handle repository selection from dropdown
             repo_dropdown.change(
                 fn=handle_repo_selection,
                 inputs=[repo_dropdown],
                 outputs=[repo_dropdown, selected_repo_textbox, query_btn],
-                show_api=False
             )
             # Handle refresh button - resets to dropdown view
             refresh_repos_btn.click(
                 fn=reset_repo_selection,
                 outputs=[repo_dropdown, selected_repo_textbox, query_btn],
-                show_api=False
             )
             # Also provide API endpoint for listing repositories
             refresh_repos_btn.click(
                 fn=get_available_docs_repo,
@@ -993,7 +1047,11 @@ with gr.Blocks(title="Doc-MCP") as demo:
             # Query button uses the textbox value (not dropdown)
             query_btn.click(
                 fn=make_query,
-                inputs=[selected_repo_textbox, query_mode, query_input],  # Use textbox, not dropdown
                 outputs=[response_output, sources_output],
                 api_name="query_documentation",
             )
@@ -1001,7 +1059,11 @@ with gr.Blocks(title="Doc-MCP") as demo:
             # Also allow Enter key to trigger query
             query_input.submit(
                 fn=make_query,
-                inputs=[selected_repo_textbox, query_mode, query_input],  # Use textbox, not dropdown
                 outputs=[response_output, sources_output],
                 show_api=False,
             )
@@ -1010,17 +1072,21 @@ with gr.Blocks(title="Doc-MCP") as demo:
         # Tab 3: Repository Management
         # ================================
         with gr.TabItem("🗂️ Repository Management"):
-            gr.Markdown("Manage your ingested repositories - view details and delete repositories when needed.")
             with gr.Row():
                 with gr.Column(scale=1):
                     gr.Markdown("### 📊 Repository Statistics")
                     stats_display = gr.JSON(
                         label="Database Statistics",
-                        value={"message": "Click refresh to load statistics..."}
                     )
-                    refresh_stats_btn = gr.Button("🔄 Refresh Statistics", variant="secondary")
                 with gr.Column(scale=2):
                     gr.Markdown("### 📋 Repository Details")
                     repos_table = gr.Dataframe(
@@ -1028,13 +1094,17 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         datatype=["str", "number", "str"],
                         label="Ingested Repositories",
                         interactive=False,
-                        wrap=True
                     )
-                    refresh_repos_btn = gr.Button("🔄 Refresh Repository List", variant="secondary")
             gr.Markdown("### 🗑️ Delete Repository")
-            gr.Markdown("**⚠️ Warning:** This will permanently delete all documents and metadata for the selected repository.")
             with gr.Row():
                 with gr.Column(scale=2):
                     delete_repo_dropdown = gr.Dropdown(
@@ -1044,33 +1114,31 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         interactive=True,
                         allow_custom_value=False,
                     )
                     # Confirmation checkbox
                     confirm_delete = gr.Checkbox(
-                        label="I understand this action cannot be undone",
-                        value=False
                     )
                     delete_btn = gr.Button(
-                        "🗑️ Delete Repository",
-                        variant="stop",
                         size="lg",
-                        interactive=False
                     )
                 with gr.Column(scale=1):
                     deletion_status = gr.Textbox(
                         label="Deletion Status",
                         value="Select a repository and confirm to enable deletion.",
                         interactive=False,
-                        lines=6
                     )
             # Management functions
             def load_repository_stats():
                 """Load overall repository statistics"""
                 try:
                     stats = get_repository_stats()
                     return stats
                 except Exception as e:
@@ -1079,29 +1147,30 @@ with gr.Blocks(title="Doc-MCP") as demo:
             def load_repository_details():
                 """Load detailed repository information as a table"""
                 try:
                     details = get_repo_details()
                     if not details:
                         return [["No repositories found", 0, "N/A"]]
                     # Format for dataframe
                     table_data = []
                     for repo in details:
                         last_updated = repo.get("last_updated", "Unknown")
-                        if hasattr(last_updated, 'strftime'):
                             last_updated = last_updated.strftime("%Y-%m-%d %H:%M")
                         elif last_updated != "Unknown":
                             last_updated = str(last_updated)
-                        table_data.append([
-                            repo.get("repo_name", "Unknown"),
-                            repo.get("file_count", 0),
-                            last_updated
-                        ])
                     return table_data
                 except Exception as e:
                     return [["Error loading repositories", 0, str(e)]]
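
The table-building logic in this hunk can be read as two steps: format `last_updated`, then emit one row per repository. A minimal standalone sketch of that behavior is given below. `format_last_updated` and `build_repo_rows` are hypothetical names; the dictionary keys, the fallback values, and the `"%Y-%m-%d %H:%M"` format are taken from the lines shown.

```python
from datetime import datetime
from typing import Any, Dict, List


def format_last_updated(value: Any) -> str:
    """Hypothetical helper: datetimes are formatted, everything else is stringified."""
    if hasattr(value, "strftime"):
        return value.strftime("%Y-%m-%d %H:%M")
    return str(value)


def build_repo_rows(details: List[Dict[str, Any]]) -> List[List[Any]]:
    """Hypothetical helper: rows of [repo_name, file_count, last_updated] for a dataframe."""
    if not details:
        return [["No repositories found", 0, "N/A"]]
    return [
        [
            repo.get("repo_name", "Unknown"),
            repo.get("file_count", 0),
            format_last_updated(repo.get("last_updated", "Unknown")),
        ]
        for repo in details
    ]
```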
@@ -1124,17 +1193,23 @@ with gr.Blocks(title="Doc-MCP") as demo:
             def delete_repository(repo_name: str, confirmed: bool):
                 """Delete the selected repository"""
                 if not repo_name:
-                    return "❌ No repository selected.", gr.Dropdown(choices=[]), gr.Checkbox(value=False)
-                if not confirmed:
-                    return "❌ Please confirm deletion by checking the checkbox.", gr.Dropdown(choices=[]), gr.Checkbox(value=False)
-                try:
                     # Perform deletion
                     result = delete_repository_data(repo_name)
                     # Prepare status message
                     status_msg = result["message"]
                     if result["success"]:
@@ -1142,79 +1217,67 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         status_msg += f"\n- Vector documents removed: {result['vector_docs_deleted']}"
                         status_msg += f"\n- Repository record deleted: {'Yes' if result['repo_record_deleted'] else 'No'}"
                         status_msg += f"\n\n✅ Repository '{repo_name}' has been completely removed."
                     # Update dropdown (remove deleted repo)
                     updated_dropdown = update_delete_dropdown()
                     # Reset confirmation checkbox
                     reset_checkbox = gr.Checkbox(value=False)
                     return status_msg, updated_dropdown, reset_checkbox
                 except Exception as e:
                     error_msg = f"❌ Error deleting repository: {str(e)}"
                     return error_msg, gr.Dropdown(choices=[]), gr.Checkbox(value=False)
             # Wire up management events
             refresh_stats_btn.click(
-                fn=load_repository_stats,
-                outputs=[stats_display],
-                show_api=False
             )
             refresh_repos_btn.click(
-                fn=load_repository_details,
-                outputs=[repos_table],
-                show_api=False
             )
             # Update delete dropdown when refreshing repos
             refresh_repos_btn.click(
                 fn=update_delete_dropdown,
                 outputs=[delete_repo_dropdown],
-                show_api=False
             )
             # Enable/disable delete button based on selection and confirmation
             delete_repo_dropdown.change(
                 fn=check_delete_button_state,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[delete_btn],
-                show_api=False
             )
             confirm_delete.change(
                 fn=check_delete_button_state,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[delete_btn],
-                show_api=False
             )
             # Delete repository
             delete_btn.click(
                 fn=delete_repository,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[deletion_status, delete_repo_dropdown, confirm_delete],
-                show_api=False
             )
             # Load data on tab load
-            demo.load(
-                fn=load_repository_stats,
-                outputs=[stats_display],
-                show_api=False
-            )
-            demo.load(
-                fn=load_repository_details,
-                outputs=[repos_table],
-                show_api=False
-            )
             demo.load(
                 fn=update_delete_dropdown,
                 outputs=[delete_repo_dropdown],
-                show_api=False
             )
         # ================================
@@ -1222,96 +1285,102 @@ with gr.Blocks(title="Doc-MCP") as demo:
         # ================================
         with gr.TabItem("🔍 GitHub File Search", visible=False):
             gr.Markdown("### 🔧 GitHub Repository File Search API")
-            gr.Markdown("Pure API endpoints for GitHub file operations - all responses in JSON format")
             with gr.Row():
                 with gr.Column():
                     gr.Markdown("#### 📋 List Repository Files")
                     # Repository input for file operations
                     api_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
-                        info="GitHub repository to scan"
                     )
                     # Branch selection
                     api_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
-                        info="Branch to search (default: main)"
                     )
                     # File extensions
                     api_extensions_input = gr.Textbox(
                         label="File Extensions (comma-separated)",
                         value=".md,.mdx",
                         placeholder=".md,.mdx,.txt",
-                        info="File extensions to include"
                     )
                     # List files button
                     list_files_btn = gr.Button("📋 List Files", variant="primary")
                 with gr.Column():
                     gr.Markdown("#### 📄 Get Single File")
                     # Single file inputs
                     single_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
-                        info="GitHub repository"
                     )
                     single_file_input = gr.Textbox(
                         label="File Path",
                         placeholder="docs/README.md",
                         value="",
-                        info="Path to specific file in repository"
                     )
                     single_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
-                        info="Branch name (default: main)"
                     )
                     # Get single file button
-                    get_single_btn = gr.Button("📄 Get Single File", variant="secondary")
             with gr.Row():
                 with gr.Column():
                     gr.Markdown("#### 📚 Get Multiple Files")
                     # Multiple files inputs
                     multiple_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
-                        info="GitHub repository"
                     )
                     multiple_files_input = gr.Textbox(
                         label="File Paths (comma-separated)",
                         placeholder="README.md,docs/guide.md,api/overview.md",
                         value="",
                         lines=3,
-                        info="Comma-separated list of file paths"
                     )
                     multiple_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
-                        info="Branch name (default: main)"
                     )
                     # Get multiple files button
-                    get_multiple_btn = gr.Button("📚 Get Multiple Files", variant="secondary")
             # Single JSON output for all operations
             gr.Markdown("### 📊 API Response")
@@ -1319,41 +1388,44 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 label="JSON Response",
                 value={
                     "message": "API responses will appear here",
-                    "info": "Use the buttons above to interact with GitHub repositories"
-                }
             )
             # Pure API Functions (JSON only responses)
-            def list_repository_files(repo_url: str, branch: str = "main", extensions: str = ".md,.mdx"):
                 """
                 List all files in a GitHub repository with specified extensions
                 Args:
                     repo_url: GitHub repository URL or owner/repo format
                     branch: Branch name to search (default: main)
                     extensions: Comma-separated file extensions (default: .md,.mdx)
                 Returns:
                     JSON response with file list and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     # Parse extensions list
-                    ext_list = [ext.strip() for ext in extensions.split(",") if ext.strip()]
                     if not ext_list:
                         ext_list = [".md", ".mdx"]
                     # Get files list
                     files, status_message = fetch_repository_files(
                         repo_url=repo_url,
                         file_extensions=ext_list,
                         github_token=os.getenv("GITHUB_API_KEY"),
-                        branch=branch
                     )
                     if files:
                         return {
                             "success": True,
@@ -1362,7 +1434,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
                             "extensions": ext_list,
                             "total_files": len(files),
                             "files": files,
-                            "status": status_message
                         }
                     else:
                         return {
@@ -1372,36 +1444,36 @@ with gr.Blocks(title="Doc-MCP") as demo:
                             "extensions": ext_list,
                             "total_files": 0,
                             "files": [],
-                            "error": status_message or "No files found"
                         }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to list files: {str(e)}",
                         "repository": repo_url,
-                        "branch": branch
                     }
             def get_single_file(repo_url: str, file_path: str, branch: str = "main"):
                 """
                 Retrieve a single file from GitHub repository
                 Args:
-                    repo_url: GitHub repository URL or owner/repo format
                     file_path: Path to the file in the repository
                     branch: Branch name (default: main)
                 Returns:
                     JSON response with file content and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     if not file_path.strip():
                         return {"success": False, "error": "File path is required"}
                     # Parse repo name
                     if "github.com" in repo_url:
                         repo_name = (
@@ -1411,15 +1483,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         )
                     else:
                         repo_name = repo_url.strip()
                     # Load single file
                     documents, failed = load_github_files(
                         repo_name=repo_name,
                         file_paths=[file_path.strip()],
                         branch=branch,
-                        github_token=os.getenv("GITHUB_API_KEY")
                     )
                     if documents and len(documents) > 0:
                         doc = documents[0]
                         return {
@@ -1432,7 +1504,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
                             "content": doc.text,
                             "metadata": doc.metadata,
                             "url": doc.metadata.get("url", ""),
-                            "raw_url": doc.metadata.get("raw_url", "")
                         }
                     else:
                         error_msg = f"Failed to retrieve file: {failed[0] if failed else 'File not found or access denied'}"
@@ -1441,43 +1513,52 @@ with gr.Blocks(title="Doc-MCP") as demo:
                             "repository": repo_name,
                             "branch": branch,
                             "file_path": file_path,
-                            "error": error_msg
                         }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to get single file: {str(e)}",
                         "repository": repo_url,
                         "file_path": file_path,
-                        "branch": branch
                     }
-            def get_multiple_files(repo_url: str, file_paths_str: str, branch: str = "main"):
                 """
                 Retrieve multiple files from GitHub repository
                 Args:
                     repo_url: GitHub repository URL or owner/repo format
                     file_paths_str: Comma-separated string of file paths
                     branch: Branch name (default: main)
                 Returns:
                     JSON response with multiple file contents and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     if not file_paths_str.strip():
                         return {"success": False, "error": "File paths are required"}
                     # Parse file paths from comma-separated string
-                    file_paths = [path.strip() for path in file_paths_str.split(",") if path.strip()]
                     if not file_paths:
-                        return {"success": False, "error": "No valid file paths provided"}
                     # Parse repo name
                     if "github.com" in repo_url:
                         repo_name = (
@@ -1487,15 +1568,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         )
                     else:
                         repo_name = repo_url.strip()
                     # Load multiple files
                     documents, failed = load_github_files(
                         repo_name=repo_name,
                         file_paths=file_paths,
                         branch=branch,
-                        github_token=os.getenv("GITHUB_API_KEY")
                     )
                     # Process successful documents
                     successful_files = []
                     for doc in documents:
@@ -1506,10 +1587,10 @@ with gr.Blocks(title="Doc-MCP") as demo:
                             "content": doc.text,
                             "metadata": doc.metadata,
                             "url": doc.metadata.get("url", ""),
-                            "raw_url": doc.metadata.get("raw_url", "")
                         }
                         successful_files.append(file_data)
                     return {
                         "success": True,
                         "repository": repo_name,
@@ -1520,16 +1601,16 @@ with gr.Blocks(title="Doc-MCP") as demo:
                         "files": successful_files,
                         "failed_file_paths": failed,
                         "total_content_size": sum(len(doc.text) for doc in documents),
-                        "requested_file_paths": file_paths
                     }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to get multiple files: {str(e)}",
                         "repository": repo_url,
                         "file_paths": file_paths_str,
-                        "branch": branch
                     }
             # Wire up the GitHub file search events - all output to single JSON component
@@ -1537,21 +1618,208 @@ with gr.Blocks(title="Doc-MCP") as demo:
                 fn=list_repository_files,
                 inputs=[api_repo_input, api_branch_input, api_extensions_input],
                 outputs=[api_response_output],
-                api_name="list_repository_files"
             )
             get_single_btn.click(
                 fn=get_single_file,
                 inputs=[single_repo_input, single_file_input, single_branch_input],
                 outputs=[api_response_output],
-                api_name="get_single_file"
             )
             get_multiple_btn.click(
                 fn=get_multiple_files,
-                inputs=[multiple_repo_input, multiple_files_input, multiple_branch_input],
                 outputs=[api_response_output],
-                api_name="get_multiple_files"
             )
 if __name__ == "__main__":
     demo.launch(mcp_server=True)

 from llama_index.core import Settings
 from llama_index.core.text_splitter import SentenceSplitter
+from rag.config import (
+    delete_repository_data,
+    embed_model,
+    get_available_repos,
+    get_repo_details,
+    get_repository_stats,
+    llm,
+)
+from rag.github_file_loader import fetch_markdown_files as fetch_files_with_loader
 from rag.github_file_loader import fetch_repository_files, load_github_files
 from rag.ingest import ingest_documents_async
 from rag.query import QueryRetriever
 def get_available_repositories():
     return get_available_repos()
 def start_file_loading(
     repo_url: str, selected_files: List[str], current_progress: Dict
         )
         return current_progress
 def start_vector_ingestion(current_progress: Dict):
     """Step 2: Ingest loaded documents into vector store"""
     print("\n🔄 STARTING VECTOR INGESTION STEP")
         if isinstance(failed_files_data, list):
             failed_files_count = len(failed_files_data)
         else:
+            failed_files_count = (
+                failed_files_data if isinstance(failed_files_data, int) else 0
+            )
         # Update final success state with repository update flag
         current_progress.update(
         if isinstance(failed_files_data, list):
             failed_files_count = len(failed_files_data)
         else:
+            failed_files_count = (
+                failed_files_data if isinstance(failed_files_data, int) else 0
+            )
         current_progress.update(
             {
         return current_progress
 def start_file_loading_generator(
     repo_url: str, selected_files: List[str], current_progress: Dict
 ):
     """Step 1: Load files from GitHub with yield-based real-time updates"""
     print("\n🔄 STARTING FILE LOADING STEP")
     print(f"📍 Repository: {repo_url}")
     print(f"📋 Selected files: {len(selected_files)} files")
             "repo_name": repo_name,
         }
         yield initial_progress
         time.sleep(0.5)
         for i in range(0, len(selected_files), batch_size):
                     "repo_name": repo_name,
                 }
                 yield batch_complete_progress
                 time.sleep(0.3)
             except Exception as batch_error:
                 print(f"❌ Batch processing error: {batch_error}")
                 all_failed.extend(batch)
                 error_progress = {
                     "status": "loading",
                     "message": f"⚠️ Error in batch {current_batch_num}",
             "message": f"✅ File Loading Complete! Loaded {len(all_documents)} documents",
             "progress": 100,
             "phase": "Files Loaded Successfully",
+            "details": f"🎯 Final Results:\n✅ Successfully loaded: {len(all_documents)} documents\n❌ Failed files: {len(all_failed)}\n⏱️ Total time: {loading_time:.1f}s\n📊 Success rate: {(len(all_documents) / max(len(all_documents) + len(all_failed), 1) * 100):.1f}%",
             "step": "file_loading_complete",
             "loaded_documents": all_documents,
             "failed_files": all_failed,
         yield error_progress
         return error_progress
 # Progress display component
 def format_progress_display(progress_state: Dict) -> str:
     """Format progress state into readable display with enhanced details"""
     # Enhanced progress bar
     filled = int(progress / 2.5)  # 40 chars total
     progress_bar = "█" * filled + "░" * (40 - filled)
     # Status emoji mapping
     status_emoji = {
         "loading": "⏳",
+        "loaded": "✅",
         "vectorizing": "🧠",
         "complete": "🎉",
+        "error": "❌",
     }
     emoji = status_emoji.get(status, "🔄")
     output = f"{emoji} **{message}**\n\n"
     # Phase and progress section
     output += f"📊 **Current Phase:** {phase}\n"
     output += f"📈 **Progress:** {progress:.1f}%\n"
         total = progress_state.get("total_files", 0)
         successful = progress_state.get("successful_files", 0)
         failed = progress_state.get("failed_files", 0)
         if total > 0:
             output += "📁 **File Processing Status:**\n"
             output += f"   • Total files: {total}\n"
             output += f"   • Processed: {processed}/{total}\n"
             output += f"   • ✅ Successful: {successful}\n"
             output += f"   • ❌ Failed: {failed}\n"
             if "current_batch" in progress_state and "total_batches" in progress_state:
                 output += f"   • 📦 Current batch: {progress_state['current_batch']}/{progress_state['total_batches']}\n"
             output += "\n"
     elif progress_state.get("step") == "vector_ingestion":
         docs_count = progress_state.get("documents_count", 0)
         repo_name = progress_state.get("repo_name", "Unknown")
         if docs_count > 0:
             output += "🧠 **Vector Processing Status:**\n"
             output += f"   • Repository: {repo_name}\n"
         output += f"⏱️ **Total time:** {total_time:.1f} seconds\n"
         output += f"   ├─ File loading: {loading_time:.1f}s\n"
         output += f"   └─ Vector processing: {vector_time:.1f}s\n"
+        output += (
+            f"📊 **Processing rate:** {(docs_processed / total_time if total_time > 0 else 0.0):.1f} docs/second\n\n"
+        )
         output += "🚀 **Next Step:** Go to the 'Query Interface' tab to start asking questions!"
     elif status == "error":
         error = progress_state.get("error", "Unknown error")
         output += "\n💥 **ERROR OCCURRED**\n"
         output += "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n"
+        output += (
+            f"❌ **Error Details:** {error[:300]}{'...' if len(error) > 300 else ''}\n"
+        )
         output += "\n🔧 **Troubleshooting Tips:**\n"
         output += "   • Check your GitHub token permissions\n"
         output += "   • Verify repository URL format\n"
         with gr.TabItem("📥 Documentation Ingestion"):
             gr.Markdown("### 🚀 Two-Step Documentation Processing Pipeline")
             gr.Markdown(
+                "**Step 1:** Fetch markdown files from GitHub repository → **Step 2:** Generate vector embeddings and store in MongoDB Atlas"
             )
             with gr.Row():
                         label="📂 GitHub Repository URL",
                         placeholder="Enter: owner/repo or https://github.com/owner/repo (e.g., gradio-app/gradio)",
                         value="",
+                        info="Enter any GitHub repository containing markdown documentation",
+                    )
+                    load_btn = gr.Button(
+                        "🔍 Discover Documentation Files", variant="secondary"
                     )
                 with gr.Column(scale=1):
                     status_output = gr.Textbox(
+                        label="Repository Discovery Status",
+                        interactive=False,
+                        lines=4,
+                        placeholder="Repository scanning results will appear here...",
                     )
             with gr.Row():
+                select_all_btn = gr.Button(
+                    "📋 Select All Documents", variant="secondary"
+                )
                 clear_all_btn = gr.Button("🗑️ Clear Selection", variant="secondary")
             # File selection
             with gr.Accordion(label="Available Documentation Files"):
                 file_selector = gr.CheckboxGroup(
+                    choices=[],
+                    label="Select Markdown Files for RAG Processing",
+                    visible=False,
                 )
             # Two-step ingestion controls
             gr.Markdown("### 🔄 RAG Pipeline Execution")
+            gr.Markdown(
+                "Process your documentation through our advanced RAG pipeline using Nebius AI embeddings and MongoDB Atlas vector storage."
+            )
             with gr.Row():
                 with gr.Column():
                 lines=25,
                 value="🚀 Ready to start two-step ingestion process...\n\n📋 Steps:\n1️⃣ Load files from GitHub repository\n2️⃣ Generate embeddings and store in vector database",
                 max_lines=30,
             )
             # Event handlers
                         gr.Button(interactive=False),
                     )
+            def start_step1_generator(
+                repo_url: str, selected_files: List[str], current_progress: Dict
+            ):
                 """Start Step 1 with generator-based real-time progress updates"""
+                for progress_update in start_file_loading_generator(
+                    repo_url, selected_files, current_progress.copy()
+                ):
                     progress_text = format_progress_display(progress_update)
+                    step2_enabled = (
+                        progress_update.get("step") == "file_loading_complete"
+                    )
                     yield (
                         progress_update,
                         progress_text,
                     # Repository selection - Dropdown that becomes textbox when selected
                     with gr.Row():
                         repo_dropdown = gr.Dropdown(
+                            choices=get_available_repositories()
+                            or ["No repositories available"],
                             label="📚 Select Documentation Repository",
                             value=None,
                             interactive=True,
                             allow_custom_value=True,
+                            info="Choose from available repositories",
                         )
                         # Hidden textbox that will become visible when repo is selected
                         selected_repo_textbox = gr.Textbox(
                             label="🎯 Selected Repository",
                             value="",
                             interactive=False,
                             visible=False,
+                            info="Currently selected repository for querying",
                         )
                     refresh_repos_btn = gr.Button(
                         "🔄 Refresh Repository List", variant="secondary", size="sm"
                     )
                         label="💭 Ask About Your Documentation",
                         placeholder="How do I implement a custom component? What are the available API endpoints? How to configure the system?",
                         lines=3,
+                        info="Ask natural language questions about your documentation",
                     )
+                    query_btn = gr.Button(
+                        "🚀 Search Documentation", variant="primary", size="lg"
+                    )
                     # Response display as text area
                     response_output = gr.Textbox(
                         value="Your AI-powered documentation response will appear here with contextual information and source citations...",
                         lines=10,
                         interactive=False,
+                        info="Generated using Nebius LLM with retrieved documentation context",
                     )
                 with gr.Column(scale=2):
                     gr.Markdown("### 📖 Source References")
+                    gr.Markdown(
+                        "View the exact documentation sources used to generate the response, with relevance scores and GitHub links."
+                    )
                     # Source nodes display as JSON
                     sources_output = gr.JSON(
                         label="📎 Source Citations & Metadata",
                         value={
                             "message": "Source documentation excerpts with relevance scores will appear here after your query...",
+                            "info": "Each source includes file path, relevance score, and content snippet",
                         },
                     )
             # Event handlers
             def handle_repo_selection(selected_repo):
                 """Handle repository selection from dropdown"""
+                if not selected_repo or selected_repo.startswith(
+                    ("No repositories available", "Error loading repositories")
+                ):
                     return (
                         gr.Dropdown(visible=True),  # Keep dropdown visible
                         gr.Textbox(visible=False, value=""),  # Hide textbox
+                        gr.Button(interactive=False),  # Disable query button
                     )
                 else:
                     return (
                         gr.Dropdown(visible=False),  # Hide dropdown
+                        gr.Textbox(
+                            visible=True, value=selected_repo
+                        ),  # Show textbox with selected repo
+                        gr.Button(interactive=True),  # Enable query button
                     )
             def reset_repo_selection():
                 """Reset to show dropdown again"""
                 try:
+                    repos = get_available_repositories() or [
+                        "No repositories available"
+                    ]
                     return (
+                        gr.Dropdown(
+                            choices=repos, value=None, visible=True
+                        ),  # Show dropdown with refreshed choices
                         gr.Textbox(visible=False, value=""),  # Hide textbox
+                        gr.Button(interactive=False),  # Disable query button
                     )
                 except Exception as e:
                     print(f"Error refreshing repository list: {e}")
                     return (
+                        gr.Dropdown(
+                            choices=["Error loading repositories"],
+                            value=None,
+                            visible=True,
+                        ),
                         gr.Textbox(visible=False, value=""),
+                        gr.Button(interactive=False),
                     )
             def get_available_docs_repo():
                 try:
                     repos = get_available_repositories()
                     if not repos:
+                        repos = [
+                            "No repositories available - Please ingest documentation first"
+                        ]
                     return gr.Dropdown(choices=repos, value=None)
                 except Exception as e:
                     print(f"Error refreshing repository list: {e}")
+                    return gr.Dropdown(
+                        choices=["Error loading repositories"], value=None
+                    )
             # Simple query handler
             def handle_query(repo: str, mode: str, query: str):
                 if not query.strip():
                     return {"error": "Please enter a query."}
+                if not repo or repo.startswith(
+                    ("No repositories available", "Error loading repositories")
+                ):
                     return {"error": "Please select a valid repository."}
                 try:
                     # Create query retriever for the selected repo
                     retriever = QueryRetriever(repo)
                 return response_text, source_nodes
             # Wire up events
             # Handle repository selection from dropdown
             repo_dropdown.change(
                 fn=handle_repo_selection,
                 inputs=[repo_dropdown],
                 outputs=[repo_dropdown, selected_repo_textbox, query_btn],
+                show_api=False,
             )
             # Handle refresh button - resets to dropdown view
             refresh_repos_btn.click(
                 fn=reset_repo_selection,
                 outputs=[repo_dropdown, selected_repo_textbox, query_btn],
+                show_api=False,
             )
             # Also provide API endpoint for listing repositories
             refresh_repos_btn.click(
                 fn=get_available_docs_repo,
             # Query button uses the textbox value (not dropdown)
             query_btn.click(
                 fn=make_query,
+                inputs=[
+                    selected_repo_textbox,
+                    query_mode,
+                    query_input,
+                ],  # Use textbox, not dropdown
                 outputs=[response_output, sources_output],
                 api_name="query_documentation",
             )
             # Also allow Enter key to trigger query
             query_input.submit(
                 fn=make_query,
+                inputs=[
+                    selected_repo_textbox,
+                    query_mode,
+                    query_input,
+                ],  # Use textbox, not dropdown
                 outputs=[response_output, sources_output],
                 show_api=False,
             )
         # Tab 3: Repository Management
         # ================================
         with gr.TabItem("🗂️ Repository Management"):
+            gr.Markdown(
+                "Manage your ingested repositories - view details and delete repositories when needed."
+            )
             with gr.Row():
                 with gr.Column(scale=1):
                     gr.Markdown("### 📊 Repository Statistics")
                     stats_display = gr.JSON(
                         label="Database Statistics",
+                        value={"message": "Click refresh to load statistics..."},
+                    )
+                    refresh_stats_btn = gr.Button(
+                        "🔄 Refresh Statistics", variant="secondary"
                     )
                 with gr.Column(scale=2):
                     gr.Markdown("### 📋 Repository Details")
                     repos_table = gr.Dataframe(
                         datatype=["str", "number", "str"],
                         label="Ingested Repositories",
                         interactive=False,
+                        wrap=True,
+                    )
+                    refresh_repos_btn = gr.Button(
+                        "🔄 Refresh Repository List", variant="secondary"
                     )
             gr.Markdown("### 🗑️ Delete Repository")
+            gr.Markdown(
+                "**⚠️ Warning:** This will permanently delete all documents and metadata for the selected repository."
+            )
             with gr.Row():
                 with gr.Column(scale=2):
                     delete_repo_dropdown = gr.Dropdown(
                         interactive=True,
                         allow_custom_value=False,
                     )
                     # Confirmation checkbox
                     confirm_delete = gr.Checkbox(
+                        label="I understand this action cannot be undone", value=False
                     )
                     delete_btn = gr.Button(
+                        "🗑️ Delete Repository",
+                        variant="stop",
                         size="lg",
+                        interactive=False,
                     )
                 with gr.Column(scale=1):
                     deletion_status = gr.Textbox(
                         label="Deletion Status",
                         value="Select a repository and confirm to enable deletion.",
                         interactive=False,
+                        lines=6,
                     )
             # Management functions
             def load_repository_stats():
                 """Load overall repository statistics"""
                 try:
                     stats = get_repository_stats()
                     return stats
                 except Exception as e:
             def load_repository_details():
                 """Load detailed repository information as a table"""
                 try:
                     details = get_repo_details()
                     if not details:
                         return [["No repositories found", 0, "N/A"]]
                     # Format for dataframe
                     table_data = []
                     for repo in details:
                         last_updated = repo.get("last_updated", "Unknown")
+                        if hasattr(last_updated, "strftime"):
                             last_updated = last_updated.strftime("%Y-%m-%d %H:%M")
                         elif last_updated != "Unknown":
                             last_updated = str(last_updated)
+                        table_data.append(
+                            [
+                                repo.get("repo_name", "Unknown"),
+                                repo.get("file_count", 0),
+                                last_updated,
+                            ]
+                        )
                     return table_data
                 except Exception as e:
                     return [["Error loading repositories", 0, str(e)]]
             def delete_repository(repo_name: str, confirmed: bool):
                 """Delete the selected repository"""
                 if not repo_name:
+                    return (
+                        "❌ No repository selected.",
+                        gr.Dropdown(choices=[]),
+                        gr.Checkbox(value=False),
+                    )
+                if not confirmed:
+                    return (
+                        "❌ Please confirm deletion by checking the checkbox.",
+                        gr.Dropdown(choices=[]),
+                        gr.Checkbox(value=False),
+                    )
+                try:
                     # Perform deletion
                     result = delete_repository_data(repo_name)
                     # Prepare status message
                     status_msg = result["message"]
                     if result["success"]:
                         status_msg += f"\n- Vector documents removed: {result['vector_docs_deleted']}"
                         status_msg += f"\n- Repository record deleted: {'Yes' if result['repo_record_deleted'] else 'No'}"
                         status_msg += f"\n\n✅ Repository '{repo_name}' has been completely removed."
                     # Update dropdown (remove deleted repo)
                     updated_dropdown = update_delete_dropdown()
                     # Reset confirmation checkbox
                     reset_checkbox = gr.Checkbox(value=False)
                     return status_msg, updated_dropdown, reset_checkbox
                 except Exception as e:
                     error_msg = f"❌ Error deleting repository: {str(e)}"
                     return error_msg, gr.Dropdown(choices=[]), gr.Checkbox(value=False)
             # Wire up management events
             refresh_stats_btn.click(
+                fn=load_repository_stats, outputs=[stats_display], show_api=False
             )
             refresh_repos_btn.click(
+                fn=load_repository_details, outputs=[repos_table], show_api=False
             )
             # Update delete dropdown when refreshing repos
             refresh_repos_btn.click(
                 fn=update_delete_dropdown,
                 outputs=[delete_repo_dropdown],
+                show_api=False,
             )
             # Enable/disable delete button based on selection and confirmation
             delete_repo_dropdown.change(
                 fn=check_delete_button_state,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[delete_btn],
+                show_api=False,
             )
             confirm_delete.change(
                 fn=check_delete_button_state,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[delete_btn],
+                show_api=False,
             )
             # Delete repository
             delete_btn.click(
                 fn=delete_repository,
                 inputs=[delete_repo_dropdown, confirm_delete],
                 outputs=[deletion_status, delete_repo_dropdown, confirm_delete],
+                show_api=False,
             )
             # Load data on tab load
+            demo.load(fn=load_repository_stats, outputs=[stats_display], show_api=False)
+            demo.load(fn=load_repository_details, outputs=[repos_table], show_api=False)
             demo.load(
                 fn=update_delete_dropdown,
                 outputs=[delete_repo_dropdown],
+                show_api=False,
             )
         # ================================
         # ================================
         with gr.TabItem("🔍 GitHub File Search", visible=False):
             gr.Markdown("### 🔧 GitHub Repository File Search API")
+            gr.Markdown(
+                "Pure API endpoints for GitHub file operations - all responses in JSON format"
+            )
             with gr.Row():
                 with gr.Column():
                     gr.Markdown("#### 📋 List Repository Files")
                     # Repository input for file operations
                     api_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
+                        info="GitHub repository to scan",
                     )
                     # Branch selection
                     api_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
+                        info="Branch to search (default: main)",
                     )
                     # File extensions
                     api_extensions_input = gr.Textbox(
                         label="File Extensions (comma-separated)",
                         value=".md,.mdx",
                         placeholder=".md,.mdx,.txt",
+                        info="File extensions to include",
                     )
                     # List files button
                     list_files_btn = gr.Button("📋 List Files", variant="primary")
                 with gr.Column():
                     gr.Markdown("#### 📄 Get Single File")
                     # Single file inputs
                     single_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
+                        info="GitHub repository",
                     )
                     single_file_input = gr.Textbox(
                         label="File Path",
                         placeholder="docs/README.md",
                         value="",
+                        info="Path to specific file in repository",
                     )
                     single_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
+                        info="Branch name (default: main)",
                     )
                     # Get single file button
+                    get_single_btn = gr.Button(
+                        "📄 Get Single File", variant="secondary"
+                    )
             with gr.Row():
                 with gr.Column():
                     gr.Markdown("#### 📚 Get Multiple Files")
                     # Multiple files inputs
                     multiple_repo_input = gr.Textbox(
                         label="Repository URL",
                         placeholder="owner/repo or https://github.com/owner/repo",
                         value="",
+                        info="GitHub repository",
                     )
                     multiple_files_input = gr.Textbox(
                         label="File Paths (comma-separated)",
                         placeholder="README.md,docs/guide.md,api/overview.md",
                         value="",
                         lines=3,
+                        info="Comma-separated list of file paths",
                     )
                     multiple_branch_input = gr.Textbox(
                         label="Branch",
                         value="main",
                         placeholder="main",
+                        info="Branch name (default: main)",
                     )
                     # Get multiple files button
+                    get_multiple_btn = gr.Button(
+                        "📚 Get Multiple Files", variant="secondary"
+                    )
             # Single JSON output for all operations
             gr.Markdown("### 📊 API Response")
             api_response_output = gr.JSON(
                 label="JSON Response",
                 value={
                     "message": "API responses will appear here",
+                    "info": "Use the buttons above to interact with GitHub repositories",
+                },
             )
             # Pure API Functions (JSON only responses)
+            def list_repository_files(
+                repo_url: str, branch: str = "main", extensions: str = ".md,.mdx"
+            ):
                 """
                 List all files in a GitHub repository with specified extensions
                 Args:
                     repo_url: GitHub repository URL or owner/repo format
                     branch: Branch name to search (default: main)
                     extensions: Comma-separated file extensions (default: .md,.mdx)
                 Returns:
                     JSON response with file list and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     # Parse extensions list
+                    ext_list = [
+                        ext.strip() for ext in extensions.split(",") if ext.strip()
+                    ]
                     if not ext_list:
                         ext_list = [".md", ".mdx"]
                     # Get files list
                     files, status_message = fetch_repository_files(
                         repo_url=repo_url,
                         file_extensions=ext_list,
                         github_token=os.getenv("GITHUB_API_KEY"),
+                        branch=branch,
                     )
                     if files:
                         return {
                             "success": True,
                             "extensions": ext_list,
                             "total_files": len(files),
                             "files": files,
+                            "status": status_message,
                         }
                     else:
                         return {
                             "extensions": ext_list,
                             "total_files": 0,
                             "files": [],
+                            "error": status_message or "No files found",
                         }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to list files: {str(e)}",
                         "repository": repo_url,
+                        "branch": branch,
                     }
             def get_single_file(repo_url: str, file_path: str, branch: str = "main"):
                 """
                 Retrieve a single file from GitHub repository
                 Args:
+                    repo_url: GitHub repository URL or owner/repo format
                     file_path: Path to the file in the repository
                     branch: Branch name (default: main)
                 Returns:
                     JSON response with file content and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     if not file_path.strip():
                         return {"success": False, "error": "File path is required"}
                     # Parse repo name
                     if "github.com" in repo_url:
                         repo_name = (
                         )
                     else:
                         repo_name = repo_url.strip()
                     # Load single file
                     documents, failed = load_github_files(
                         repo_name=repo_name,
                         file_paths=[file_path.strip()],
                         branch=branch,
+                        github_token=os.getenv("GITHUB_API_KEY"),
                     )
                     if documents and len(documents) > 0:
                         doc = documents[0]
                         return {
                             "content": doc.text,
                             "metadata": doc.metadata,
                             "url": doc.metadata.get("url", ""),
+                            "raw_url": doc.metadata.get("raw_url", ""),
                         }
                     else:
                         error_msg = f"Failed to retrieve file: {failed[0] if failed else 'File not found or access denied'}"
                         return {
                             "repository": repo_name,
                             "branch": branch,
                             "file_path": file_path,
+                            "error": error_msg,
                         }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to get single file: {str(e)}",
                         "repository": repo_url,
                         "file_path": file_path,
+                        "branch": branch,
                     }
+            def get_multiple_files(
+                repo_url: str, file_paths_str: str, branch: str = "main"
+            ):
                 """
                 Retrieve multiple files from GitHub repository
                 Args:
                     repo_url: GitHub repository URL or owner/repo format
                     file_paths_str: Comma-separated string of file paths
                     branch: Branch name (default: main)
                 Returns:
                     JSON response with multiple file contents and metadata
                 """
                 try:
                     if not repo_url.strip():
                         return {"success": False, "error": "Repository URL is required"}
                     if not file_paths_str.strip():
                         return {"success": False, "error": "File paths are required"}
                     # Parse file paths from comma-separated string
+                    file_paths = [
+                        path.strip()
+                        for path in file_paths_str.split(",")
+                        if path.strip()
+                    ]
                     if not file_paths:
+                        return {
+                            "success": False,
+                            "error": "No valid file paths provided",
+                        }
                     # Parse repo name
                     if "github.com" in repo_url:
                         repo_name = (
                         )
                     else:
                         repo_name = repo_url.strip()
                     # Load multiple files
                     documents, failed = load_github_files(
                         repo_name=repo_name,
                         file_paths=file_paths,
                         branch=branch,
+                        github_token=os.getenv("GITHUB_API_KEY"),
                     )
                     # Process successful documents
                     successful_files = []
                     for doc in documents:
                         file_data = {
                             "content": doc.text,
                             "metadata": doc.metadata,
                             "url": doc.metadata.get("url", ""),
+                            "raw_url": doc.metadata.get("raw_url", ""),
                         }
                         successful_files.append(file_data)
                     return {
                         "success": True,
                         "repository": repo_name,
                         "files": successful_files,
                         "failed_file_paths": failed,
                         "total_content_size": sum(len(doc.text) for doc in documents),
+                        "requested_file_paths": file_paths,
                     }
                 except Exception as e:
                     return {
                         "success": False,
                         "error": f"Failed to get multiple files: {str(e)}",
                         "repository": repo_url,
                         "file_paths": file_paths_str,
+                        "branch": branch,
                     }
             # Wire up the GitHub file search events - all output to single JSON component
             list_files_btn.click(
                 fn=list_repository_files,
                 inputs=[api_repo_input, api_branch_input, api_extensions_input],
                 outputs=[api_response_output],
+                api_name="list_repository_files",
             )
             get_single_btn.click(
                 fn=get_single_file,
                 inputs=[single_repo_input, single_file_input, single_branch_input],
                 outputs=[api_response_output],
+                api_name="get_single_file",
             )
             get_multiple_btn.click(
                 fn=get_multiple_files,
+                inputs=[
+                    multiple_repo_input,
+                    multiple_files_input,
+                    multiple_branch_input,
+                ],
                 outputs=[api_response_output],
+                api_name="get_multiple_files",
+            )
+        # ================================
+        # Tab 5: About & MCP Configuration
+        # ================================
+        with gr.TabItem("ℹ️ About & MCP Setup"):
+            gr.Markdown("# 📚 Doc-MCP: Documentation RAG System")
+            gr.Markdown(
+                "**Transform GitHub documentation repositories into accessible MCP servers for AI agents.**"
             )
+            with gr.Row():
+                with gr.Column(scale=2):
+                    # Project Overview
+                    with gr.Accordion("🎯 What is Doc-MCP?", open=True):
+                        gr.Markdown("""
+                        **Doc-MCP** converts GitHub documentation into AI-queryable knowledge bases via the Model Context Protocol.
+                        **🔑 Key Features:**
+                        - 📥 **GitHub Integration** - Automatic markdown file extraction
+                        - 🧠 **AI Embeddings** - Nebius AI-powered vector search
+                        - 🔍 **Smart Search** - Semantic, keyword & hybrid modes
+                        - 🤖 **MCP Server** - Direct AI agent integration
+                        - ⚡ **Real-time** - Live processing progress
+                        """)
+                    # Quick Start Guide
+                    with gr.Accordion("🚀 Quick Start", open=False):
+                        gr.Markdown("""
+                        **1. Ingest Documentation** → Enter GitHub repo URL → Select files → Run 2-step pipeline
+                        **2. Query with AI** → Select repository → Ask questions → Get answers with sources
+                        **3. Manage Repos** → View stats → Delete old repositories
+                        **4. Use MCP Tools** → Configure your AI agent → Query docs directly from IDE
+                        """)
+                with gr.Column(scale=2):
+                    # MCP Server Configuration
+                    with gr.Accordion("🔧 MCP Server Setup", open=True):
+                        gr.Markdown("### 🌐 Server URL")
+                        # Server URL
+                        gr.Textbox(
+                            value="https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse",
+                            label="MCP Endpoint",
+                            interactive=False,
+                            info="Copy this URL for your MCP client configuration",
+                        )
+                        gr.Markdown("### ⚙️ Configuration")
+                        # SSE Configuration
+                        with gr.Accordion("For Cursor, Windsurf, Cline", open=False):
+                            sse_config = """{
+  "mcpServers": {
+    "doc-mcp": {
+      "url": "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse"
+    }
+  }
+}"""
+                            gr.Code(
+                                value=sse_config,
+                                label="SSE Configuration",
+                                language="json",
+                                interactive=False,
+                            )
+                        # STDIO Configuration
+                        with gr.Accordion(
+                            "For STDIO Clients (Experimental)", open=False
+                        ):
+                            stdio_config = """{
+  "mcpServers": {
+    "doc-mcp": {
+      "command": "npx",
+      "args": ["mcp-remote", "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse", "--transport", "sse-only"]
+    }
+  }
+}"""
+                            gr.Code(
+                                value=stdio_config,
+                                label="STDIO Configuration",
+                                language="json",
+                                interactive=False,
+                            )
+            # MCP Tools Overview
+            with gr.Row():
+                with gr.Column():
+                    gr.Markdown("### 🛠️ Available MCP Tools")
+                    with gr.Row():
+                        with gr.Column():
+                            gr.Markdown("**🔍 Documentation Query Tools**")
+                            gr.Markdown(
+                                "• `get_available_docs_repo` - List repositories"
+                            )
+                            gr.Markdown("• `make_query` - Search documentation with AI")
+                        with gr.Column():
+                            gr.Markdown("**📁 GitHub File Tools**")
+                            gr.Markdown("• `list_repository_files` - Scan repo files")
+                            gr.Markdown("• `get_single_file` - Fetch one file")
+                            gr.Markdown("• `get_multiple_files` - Fetch multiple files")
+            # Technology Stack & Project Info
+            with gr.Row():
+                with gr.Column():
+                    with gr.Accordion("⚙️ Technology Stack", open=False):
+                        gr.Markdown("**🖥️ Frontend & API**")
+                        gr.Markdown("• **Gradio** - Web interface & API framework")
+                        gr.Markdown("• **Hugging Face Spaces** - Cloud hosting")
+                        gr.Markdown("**🤖 AI & ML**")
+                        gr.Markdown("• **Nebius AI** - LLM & embedding models")
+                        gr.Markdown("• **LlamaIndex** - RAG framework")
+                        gr.Markdown("**💾 Database & Storage**")
+                        gr.Markdown("• **MongoDB Atlas** - Vector database")
+                        gr.Markdown("• **GitHub API** - Source file access")
+                        gr.Markdown("**🔌 Integration**")
+                        gr.Markdown("• **Model Context Protocol** - AI agent standard")
+                        gr.Markdown(
+                            "• **Server-Sent Events** - Real-time communication"
+                        )
+                with gr.Column():
+                    with gr.Accordion("👥 Project Information", open=False):
+                        gr.Markdown("**🏆 MCP Hackathon Project**")
+                        gr.Markdown(
+                            "Created to showcase AI agent integration with documentation systems."
+                        )
+                        gr.Markdown("**💡 Inspiration**")
+                        gr.Markdown("• Making Gradio docs easily searchable")
+                        gr.Markdown("• Leveraging Hugging Face AI ecosystem")
+                        gr.Markdown(
+                            "• Improving developer experience with AI assistants"
+                        )
+                        gr.Markdown("**🔮 Future Plans**")
+                        gr.Markdown("• Support for PDF, HTML files")
+                        gr.Markdown("• Multi-language documentation")
+                        gr.Markdown("• Custom embedding fine-tuning")
+                        gr.Markdown("**📄 License:** MIT - Free to use and modify")
+            # Usage Examples
+            with gr.Row():
+                with gr.Column():
+                    with gr.Accordion("💡 Usage Examples", open=False):
+                        gr.Markdown("### Example Workflow")
+                        with gr.Row():
+                            with gr.Column():
+                                gr.Markdown("**📥 Step 1: Ingest Docs**")
+                                gr.Code(
+                                    value="1. Enter: gradio-app/gradio\n2. Select markdown files\n3. Run ingestion pipeline",
+                                    label="Ingestion Process",
+                                    interactive=False,
+                                )
+                            with gr.Column():
+                                gr.Markdown("**🤖 Step 2: Query with AI**")
+                                gr.Code(
+                                    value='Query: "How to create custom components?"\nResponse: Detailed answer with source links',
+                                    label="AI Query Example",
+                                    interactive=False,
+                                )
+                        gr.Markdown("### MCP Tool Usage")
+                        gr.Code(
+                            value="""# In your AI agent:
+1. Call: get_available_docs_repo() -> ["gradio-app/gradio", ...]
+2. Call: make_query("gradio-app/gradio", "default", "custom components")
+3. Get: AI response + source citations""",
+                            label="MCP Integration Example",
+                            language="python",
+                            interactive=False,
+                        )
 if __name__ == "__main__":
     demo.launch(mcp_server=True)
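
Note on repository input: the file tools above accept either `owner/repo` or `https://github.com/owner/repo`, and both `get_single_file` and `get_multiple_files` reduce the input to a repository name under `# Parse repo name`. The expression they use is not visible in this diff, so the following is only a minimal sketch of one way such normalization could work. The helper name `normalize_repo_name` and its handling of `.git` suffixes and extra path segments are assumptions for illustration, not the commit's code.

```python
from urllib.parse import urlparse


def normalize_repo_name(repo_url: str) -> str:
    """Return 'owner/repo' for either 'owner/repo' or a github.com URL.

    Hypothetical helper: not part of the commit shown above.
    """
    value = repo_url.strip()
    if "github.com" in value:
        # Add a scheme if it is missing so urlparse puts the host in netloc
        if "://" not in value:
            value = "https://" + value
        path = urlparse(value).path
    else:
        path = value

    # Keep only non-empty path segments; the first two are owner and repo
    parts = [segment for segment in path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"Expected 'owner/repo', got: {repo_url!r}")

    owner, repo = parts[0], parts[1]
    # Drop a trailing '.git' as found in clone URLs
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"
```

Centralizing the parsing in one function would let both file tools share the same validation, and a `ValueError` for malformed input could be turned into the `{"success": False, "error": ...}` response shape the endpoints already return.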