mdabidhussain committed on
Commit
409063f
·
1 Parent(s): 3fcc667

added about tab and updated readme.md

Files changed (2)
  1. README.md +76 -29
  2. app.py +481 -213
README.md CHANGED
@@ -22,6 +22,59 @@ short_description: 'RAG on documentations for your agent '
22
 
23
  Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP servers that provide easy access to documentation context for AI agents. Just point it at any GitHub repo with markdown docs, and get an intelligent Q&A interface powered by vector search.
24
 
25
  ## ✨ Key Features
26
 
27
  - **GitHub Integration**: Fetch markdown files directly from any GitHub repository
@@ -30,6 +83,20 @@ Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP
30
  - **Smart Q&A**: Ask questions about documentation with source citations
31
  - **Repository Management**: Track multiple repositories and their statistics
32
 
33
  ## 🚀 Quick Start
34
 
35
  1. **Setup Environment**:
@@ -41,7 +108,7 @@ uv sync
41
 
42
  # Configure environment
43
  cp .env.example .env
44
- # Add your NEBIUS_API_KEY and MONGODB_URI
45
  ```
46
 
47
  2. **Run the App**:
@@ -62,34 +129,14 @@ python main.py
62
 
63
 
64
  ## Workflow
65
- ```mermaid
66
- flowchart TD
67
- subgraph Ingestion["Ingestion"]
68
- B["Discover Markdown Files"]
69
- A["GitHub Repo URL"]
70
- C["User File Selection"]
71
- D["Chunk & Embed Documents"]
72
- E["Store in MongoDB"]
73
- end
74
- subgraph Query["Query"]
75
- G["Select Repository"]
76
- F["User Question"]
77
- H["Vector Search"]
78
- I["Retrieve Context"]
79
- J["Generate Response"]
80
- K["Display with Sources"]
81
- end
82
- A --> B
83
- B --> C
84
- C --> D
85
- D --> E
86
- F --> G
87
- G --> H
88
- H --> I
89
- I --> J
90
- J --> K
91
- E --> H
92
- ```
93
 
94
  ## 🛠️ Technology Stack
95
 
 
22
 
23
  Doc-MCP ingests markdown documentation from GitHub repositories and creates MCP servers that provide easy access to documentation context for AI agents. Just point it at any GitHub repo with markdown docs, and get an intelligent Q&A interface powered by vector search.
24
 
25
+ ## 🛠️ Available MCP Tools
26
+
27
+ ### 📋 Documentation Query Tools
28
+
29
+ #### `get_available_docs_repo`
30
+ List all available ingested repositories
31
+
32
+ - **Returns**: Array of repository names that have been processed and are available for querying
33
+ - **Usage**: Get a list of documentation repositories before making queries
34
+
35
+ #### `make_query`
36
+ Search documentation with AI-powered semantic search
37
+
38
+ - **Parameters**:
39
+   - `repo` (string): Repository name to search in
40
+   - `mode` (string): Search strategy - "default", "text_search", or "hybrid"
41
+   - `query` (string): Natural language question about the documentation
42
+ - **Returns**: AI-generated response with source citations and metadata
43
+ - **Usage**: Ask questions about specific documentation repositories
44
+
45
+ ### 📁 GitHub File Operations Tools
46
+
47
+ #### `list_repository_files`
48
+ Scan and list files in a GitHub repository
49
+
50
+ - **Parameters**:
51
+   - `repo_url` (string): GitHub repository URL or owner/repo format
52
+   - `branch` (string, optional): Branch name (default: "main")
53
+   - `extensions` (string, optional): Comma-separated file extensions (default: ".md,.mdx")
54
+ - **Returns**: JSON with file list and repository metadata
55
+ - **Usage**: Discover available documentation files before ingestion
56
+
57
+ #### `get_single_file`
58
+ Retrieve content of a specific file from GitHub
59
+
60
+ - **Parameters**:
61
+   - `repo_url` (string): GitHub repository URL or owner/repo format
62
+   - `file_path` (string): Path to the specific file in the repository
63
+   - `branch` (string, optional): Branch name (default: "main")
64
+ - **Returns**: JSON with file content, metadata, and GitHub URLs
65
+ - **Usage**: Fetch individual documentation files for processing or review
66
+
67
+ #### `get_multiple_files`
68
+ Retrieve multiple files from GitHub in one request
69
+
70
+ - **Parameters**:
71
+   - `repo_url` (string): GitHub repository URL or owner/repo format
72
+   - `file_paths_str` (string): Comma-separated list of file paths
73
+   - `branch` (string, optional): Branch name (default: "main")
74
+ - **Returns**: JSON with all file contents, success/failure counts, and metadata
75
+ - **Usage**: Batch fetch multiple documentation files efficiently
76
+
77
+
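The `make_query` parameters listed above can be checked before a call is made. The sketch below is a minimal, hypothetical helper (the name `validate_query_args` and the `VALID_MODES` tuple are assumptions, not part of the repository); it only mirrors the three documented mode strings and the error messages visible in the `handle_query` hunk of `app.py`.

```python
from typing import Optional

# Search strategies named in the README tool description.
VALID_MODES = ("default", "text_search", "hybrid")


def validate_query_args(repo: str, mode: str, query: str) -> Optional[dict]:
    """Return an error payload if the arguments look unusable, else None.

    Hypothetical helper: the real tool may validate differently.
    """
    if not query or not query.strip():
        return {"error": "Please enter a query."}
    if not repo or not repo.strip():
        return {"error": "Please select a valid repository."}
    if mode not in VALID_MODES:
        return {"error": f"Unknown mode {mode!r}; expected one of {VALID_MODES}."}
    return None


# A well-formed call passes validation and yields None.
print(validate_query_args("gradio-app/gradio", "hybrid", "How do I build a custom component?"))
```

Returning an error dictionary rather than raising keeps the shape close to the JSON-style responses the tools are described as returning.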
78
  ## ✨ Key Features
79
 
80
  - **GitHub Integration**: Fetch markdown files directly from any GitHub repository
 
83
  - **Smart Q&A**: Ask questions about documentation with source citations
84
  - **Repository Management**: Track multiple repositories and their statistics
85
 
86
+ ### 🎯 MCP Server Configuration
87
+
88
+ Add this configuration to your MCP client (Cursor, Windsurf, Cline):
89
+
90
+ ```json
91
+ {
92
+   "mcpServers": {
93
+     "doc-mcp": {
94
+       "url": "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse"
95
+     }
96
+   }
97
+ }
98
+ ```
99
+
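If the client configuration is generated by a script rather than pasted by hand, the same structure can be produced with the standard `json` module. This is a sketch only: `build_mcp_config` is a hypothetical helper name, and where the resulting text should be saved depends on the MCP client, which the README does not specify.

```python
import json


def build_mcp_config(server_name: str, sse_url: str) -> str:
    """Serialize a one-server MCP client configuration as indented JSON."""
    config = {"mcpServers": {server_name: {"url": sse_url}}}
    return json.dumps(config, indent=2)


# URL copied from the README snippet above.
print(
    build_mcp_config(
        "doc-mcp",
        "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse",
    )
)
```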
100
  ## 🚀 Quick Start
101
 
102
  1. **Setup Environment**:
 
108
 
109
  # Configure environment
110
  cp .env.example .env
111
+ # Add your GITHUB_API_KEY, NEBIUS_API_KEY and MONGODB_URI
112
  ```
113
 
114
  2. **Run the App**:
 
129
 
130
 
131
  ## Workflow
132
+ - Input GitHub URL
133
+ - Scan for markdown files
134
+ - Select files to process
135
+ - Generate embeddings and store them in the vector database
136
+ - Ask questions
137
+ - Search similar content
138
+ - Generate contextual answers
139
+ - Show sources and citations
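The workflow steps above can be illustrated with a toy, in-memory version. This sketch is not the project's pipeline: the app uses Nebius embeddings and MongoDB Atlas vector search, whereas the code below substitutes plain term counts and cosine similarity so that it runs without any services. All function names are hypothetical, and answer generation by an LLM is left out.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def chunk_text(text: str, chunk_size: int = 200) -> List[str]:
    """Split text into chunks of roughly chunk_size characters on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase term counts, not a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(files: Dict[str, str]) -> List[Tuple[str, str, Counter]]:
    """Ingestion side: chunk each selected file, embed each chunk, keep the source path."""
    return [(path, chunk, embed(chunk)) for path, text in files.items() for chunk in chunk_text(text)]


def search(index: List[Tuple[str, str, Counter]], question: str, top_k: int = 2) -> List[Tuple[float, str, str]]:
    """Query side: embed the question, rank chunks by similarity, return them with sources."""
    q = embed(question)
    scored = [(cosine(q, vec), path, chunk) for path, chunk, vec in index]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


files = {
    "docs/install.md": "Install the package with uv sync and copy the example environment file.",
    "docs/query.md": "Ask a question and the app returns an answer with source citations.",
}
index = build_index(files)
for score, path, chunk in search(index, "How do I install the package?"):
    print(f"{score:.2f} {path}: {chunk}")
```

Keeping the source path next to every chunk is what makes the final "show sources and citations" step possible, whatever embedding model sits behind it.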
140
 
141
  ## 🛠️ Technology Stack
142
 
app.py CHANGED
@@ -9,11 +9,15 @@ from dotenv import load_dotenv
9
  from llama_index.core import Settings
10
  from llama_index.core.text_splitter import SentenceSplitter
11
 
12
- from rag.config import (delete_repository_data, embed_model,
13
- get_available_repos, get_repo_details,
14
- get_repository_stats, llm)
15
- from rag.github_file_loader import \
16
- fetch_markdown_files as fetch_files_with_loader
 
 
 
 
17
  from rag.github_file_loader import fetch_repository_files, load_github_files
18
  from rag.ingest import ingest_documents_async
19
  from rag.query import QueryRetriever
@@ -27,7 +31,7 @@ Settings.node_parser = SentenceSplitter(chunk_size=3072)
27
 
28
  def get_available_repositories():
29
  return get_available_repos()
30
-
31
 
32
  def start_file_loading(
33
  repo_url: str, selected_files: List[str], current_progress: Dict
@@ -170,8 +174,8 @@ def start_file_loading(
170
  )
171
 
172
  return current_progress
173
-
174
-
175
  def start_vector_ingestion(current_progress: Dict):
176
  """Step 2: Ingest loaded documents into vector store"""
177
  print("\n🔄 STARTING VECTOR INGESTION STEP")
@@ -235,7 +239,9 @@ def start_vector_ingestion(current_progress: Dict):
235
  if isinstance(failed_files_data, list):
236
  failed_files_count = len(failed_files_data)
237
  else:
238
- failed_files_count = failed_files_data if isinstance(failed_files_data, int) else 0
 
 
239
 
240
  # Update final success state with repository update flag
241
  current_progress.update(
@@ -269,7 +275,9 @@ def start_vector_ingestion(current_progress: Dict):
269
  if isinstance(failed_files_data, list):
270
  failed_files_count = len(failed_files_data)
271
  else:
272
- failed_files_count = failed_files_data if isinstance(failed_files_data, int) else 0
 
 
273
 
274
  current_progress.update(
275
  {
@@ -287,11 +295,12 @@ def start_vector_ingestion(current_progress: Dict):
287
 
288
  return current_progress
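Both branches of `start_vector_ingestion` shown above normalize `failed_files_data` in the same way: a list is counted, an integer is used directly, and anything else becomes 0. A small helper could hold that rule in one place. The sketch below is a hypothetical refactoring (the name `count_failed_files` is an assumption), and the explicit `bool` exclusion is an addition that the diff does not contain.

```python
from typing import Any


def count_failed_files(failed_files_data: Any) -> int:
    """Accept a list of failed paths or an integer count; anything else counts as 0."""
    if isinstance(failed_files_data, list):
        return len(failed_files_data)
    # bool is a subclass of int, so exclude it explicitly to avoid True -> 1.
    if isinstance(failed_files_data, int) and not isinstance(failed_files_data, bool):
        return failed_files_data
    return 0


print(count_failed_files(["docs/a.md", "docs/b.md"]))
print(count_failed_files(3))
print(count_failed_files(None))
```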
289
 
 
290
  def start_file_loading_generator(
291
  repo_url: str, selected_files: List[str], current_progress: Dict
292
  ):
293
  """Step 1: Load files from GitHub with yield-based real-time updates"""
294
-
295
  print("\n🔄 STARTING FILE LOADING STEP")
296
  print(f"📁 Repository: {repo_url}")
297
  print(f"📋 Selected files: {len(selected_files)} files")
@@ -352,7 +361,7 @@ def start_file_loading_generator(
352
  "repo_name": repo_name,
353
  }
354
  yield initial_progress
355
-
356
  time.sleep(0.5)
357
 
358
  for i in range(0, len(selected_files), batch_size):
@@ -421,13 +430,13 @@ def start_file_loading_generator(
421
  "repo_name": repo_name,
422
  }
423
  yield batch_complete_progress
424
-
425
  time.sleep(0.3)
426
 
427
  except Exception as batch_error:
428
  print(f"❌ Batch processing error: {batch_error}")
429
  all_failed.extend(batch)
430
-
431
  error_progress = {
432
  "status": "loading",
433
  "message": f"⚠️ Error in batch {current_batch_num}",
@@ -451,7 +460,7 @@ def start_file_loading_generator(
451
  "message": f"✅ File Loading Complete! Loaded {len(all_documents)} documents",
452
  "progress": 100,
453
  "phase": "Files Loaded Successfully",
454
- "details": f"🎯 Final Results:\n✅ Successfully loaded: {len(all_documents)} documents\n❌ Failed files: {len(all_failed)}\n⏱️ Total time: {loading_time:.1f}s\n📊 Success rate: {(len(all_documents)/(len(all_documents)+len(all_failed))*100):.1f}%",
455
  "step": "file_loading_complete",
456
  "loaded_documents": all_documents,
457
  "failed_files": all_failed,
@@ -481,6 +490,7 @@ def start_file_loading_generator(
481
  yield error_progress
482
  return error_progress
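The removed `"details"` line in this function computes the success rate as `len(all_documents)/(len(all_documents)+len(all_failed))*100`, which divides by zero when nothing was loaded and nothing failed. The processing-rate line in `format_progress_display` (`docs_processed/total_time`) has the same shape. A guarded version might look like the sketch below; the helper names are hypothetical, and the diff does not show how (or whether) the replacement lines handle this case.

```python
def success_rate(loaded: int, failed: int) -> float:
    """Percentage of files loaded successfully; 0.0 when no files were attempted."""
    total = loaded + failed
    return (loaded / total) * 100 if total else 0.0


def docs_per_second(docs_processed: int, total_time: float) -> float:
    """Throughput in documents per second; 0.0 when no time has elapsed."""
    return docs_processed / total_time if total_time > 0 else 0.0


print(f"Success rate: {success_rate(3, 1):.1f}%")
print(f"Processing rate: {docs_per_second(10, 4.0):.1f} docs/second")
```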
483
 
 
484
  # Progress display component
485
  def format_progress_display(progress_state: Dict) -> str:
486
  """Format progress state into readable display with enhanced details"""
@@ -496,20 +506,20 @@ def format_progress_display(progress_state: Dict) -> str:
496
  # Enhanced progress bar
497
  filled = int(progress / 2.5) # 40 chars total
498
  progress_bar = "█" * filled + "░" * (40 - filled)
499
-
500
  # Status emoji mapping
501
  status_emoji = {
502
  "loading": "⏳",
503
- "loaded": "๏ฟฝ๏ฟฝ๏ฟฝ",
504
  "vectorizing": "🧠",
505
  "complete": "🎉",
506
- "error": "❌"
507
  }
508
-
509
  emoji = status_emoji.get(status, "🔄")
510
 
511
  output = f"{emoji} **{message}**\n\n"
512
-
513
  # Phase and progress section
514
  output += f"📊 **Current Phase:** {phase}\n"
515
  output += f"📈 **Progress:** {progress:.1f}%\n"
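The bar in this hunk maps a 0 to 100 percentage onto 40 characters with `int(progress / 2.5)`. The sketch below generalizes that to any width and clamps out-of-range input, so a value above 100 cannot produce a negative count of empty cells. The clamping and the function name `render_progress_bar` are assumptions added for illustration, not something the diff shows.

```python
def render_progress_bar(progress: float, width: int = 40) -> str:
    """Render a fixed-width text bar for a percentage in [0, 100]."""
    clamped = max(0.0, min(100.0, progress))
    filled = int(clamped / 100 * width)  # equals int(progress / 2.5) when width is 40
    return "█" * filled + "░" * (width - filled)


for value in (0, 37.5, 100, 140):
    print(f"{value:>6}% {render_progress_bar(value)}")
```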
@@ -521,14 +531,14 @@ def format_progress_display(progress_state: Dict) -> str:
521
  total = progress_state.get("total_files", 0)
522
  successful = progress_state.get("successful_files", 0)
523
  failed = progress_state.get("failed_files", 0)
524
-
525
  if total > 0:
526
  output += "📁 **File Processing Status:**\n"
527
  output += f" • Total files: {total}\n"
528
  output += f" • Processed: {processed}/{total}\n"
529
  output += f" • ✅ Successful: {successful}\n"
530
  output += f" • ❌ Failed: {failed}\n"
531
-
532
  if "current_batch" in progress_state and "total_batches" in progress_state:
533
  output += f" • 📦 Current batch: {progress_state['current_batch']}/{progress_state['total_batches']}\n"
534
  output += "\n"
@@ -537,7 +547,7 @@ def format_progress_display(progress_state: Dict) -> str:
537
  elif progress_state.get("step") == "vector_ingestion":
538
  docs_count = progress_state.get("documents_count", 0)
539
  repo_name = progress_state.get("repo_name", "Unknown")
540
-
541
  if docs_count > 0:
542
  output += "🧠 **Vector Processing Status:**\n"
543
  output += f" • Repository: {repo_name}\n"
@@ -564,14 +574,18 @@ def format_progress_display(progress_state: Dict) -> str:
564
  output += f"⏱️ **Total time:** {total_time:.1f} seconds\n"
565
  output += f" ├─ File loading: {loading_time:.1f}s\n"
566
  output += f" └─ Vector processing: {vector_time:.1f}s\n"
567
- output += f"📊 **Processing rate:** {docs_processed/total_time:.1f} docs/second\n\n"
 
 
568
  output += "🚀 **Next Step:** Go to the 'Query Interface' tab to start asking questions!"
569
 
570
  elif status == "error":
571
  error = progress_state.get("error", "Unknown error")
572
  output += "\n💥 **ERROR OCCURRED**\n"
573
  output += "โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”\n"
574
- output += f"❌ **Error Details:** {error[:300]}{'...' if len(error) > 300 else ''}\n"
 
 
575
  output += "\n🔧 **Troubleshooting Tips:**\n"
576
  output += " • Check your GitHub token permissions\n"
577
  output += " • Verify repository URL format\n"
@@ -596,7 +610,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
596
  with gr.TabItem("📥 Documentation Ingestion"):
597
  gr.Markdown("### 🚀 Two-Step Documentation Processing Pipeline")
598
  gr.Markdown(
599
- "**Step 1:** Fetch markdown files from GitHub repository → **Step 2:** Generate vector embeddings and store in MongoDB Atlas"
600
  )
601
 
602
  with gr.Row():
@@ -605,28 +619,38 @@ with gr.Blocks(title="Doc-MCP") as demo:
605
  label="📂 GitHub Repository URL",
606
  placeholder="Enter: owner/repo or https://github.com/owner/repo (e.g., gradio-app/gradio)",
607
  value="",
608
- info="Enter any GitHub repository containing markdown documentation"
 
 
 
609
  )
610
- load_btn = gr.Button("🔍 Discover Documentation Files", variant="secondary")
611
 
612
  with gr.Column(scale=1):
613
  status_output = gr.Textbox(
614
- label="Repository Discovery Status", interactive=False, lines=4,
615
- placeholder="Repository scanning results will appear here..."
 
 
616
  )
617
  with gr.Row():
618
- select_all_btn = gr.Button("📋 Select All Documents", variant="secondary")
 
 
619
  clear_all_btn = gr.Button("🗑️ Clear Selection", variant="secondary")
620
 
621
  # File selection
622
  with gr.Accordion(label="Available Documentation Files"):
623
  file_selector = gr.CheckboxGroup(
624
- choices=[], label="Select Markdown Files for RAG Processing", visible=False
 
 
625
  )
626
 
627
  # Two-step ingestion controls
628
  gr.Markdown("### 🔄 RAG Pipeline Execution")
629
- gr.Markdown("Process your documentation through our advanced RAG pipeline using Nebius AI embeddings and MongoDB Atlas vector storage.")
 
 
630
 
631
  with gr.Row():
632
  with gr.Column():
@@ -656,7 +680,6 @@ with gr.Blocks(title="Doc-MCP") as demo:
656
  lines=25,
657
  value="🚀 Ready to start two-step ingestion process...\n\n📋 Steps:\n1️⃣ Load files from GitHub repository\n2️⃣ Generate embeddings and store in vector database",
658
  max_lines=30,
659
- show_copy_button=True,
660
  )
661
 
662
  # Event handlers
@@ -694,12 +717,18 @@ with gr.Blocks(title="Doc-MCP") as demo:
694
  gr.Button(interactive=False),
695
  )
696
 
697
- def start_step1_generator(repo_url: str, selected_files: List[str], current_progress: Dict):
 
 
698
  """Start Step 1 with generator-based real-time progress updates"""
699
- for progress_update in start_file_loading_generator(repo_url, selected_files, current_progress.copy()):
 
 
700
  progress_text = format_progress_display(progress_update)
701
- step2_enabled = progress_update.get("step") == "file_loading_complete"
702
-
 
 
703
  yield (
704
  progress_update,
705
  progress_text,
@@ -799,23 +828,24 @@ with gr.Blocks(title="Doc-MCP") as demo:
799
  # Repository selection - Dropdown that becomes textbox when selected
800
  with gr.Row():
801
  repo_dropdown = gr.Dropdown(
802
- choices=get_available_repositories() or ["No repositories available"],
 
803
  label="📚 Select Documentation Repository",
804
  value=None,
805
  interactive=True,
806
  allow_custom_value=True,
807
- info="Choose from available repositories"
808
  )
809
-
810
  # Hidden textbox that will become visible when repo is selected
811
  selected_repo_textbox = gr.Textbox(
812
  label="🎯 Selected Repository",
813
  value="",
814
  interactive=False,
815
  visible=False,
816
- info="Currently selected repository for querying"
817
  )
818
-
819
  refresh_repos_btn = gr.Button(
820
  "🔄 Refresh Repository List", variant="secondary", size="sm"
821
  )
@@ -833,10 +863,12 @@ with gr.Blocks(title="Doc-MCP") as demo:
833
  label="💭 Ask About Your Documentation",
834
  placeholder="How do I implement a custom component? What are the available API endpoints? How to configure the system?",
835
  lines=3,
836
- info="Ask natural language questions about your documentation"
837
  )
838
 
839
- query_btn = gr.Button("🚀 Search Documentation", variant="primary", size="lg")
 
 
840
 
841
  # Response display as text area
842
  response_output = gr.Textbox(
@@ -844,53 +876,68 @@ with gr.Blocks(title="Doc-MCP") as demo:
844
  value="Your AI-powered documentation response will appear here with contextual information and source citations...",
845
  lines=10,
846
  interactive=False,
847
- info="Generated using Nebius LLM with retrieved documentation context"
848
  )
849
 
850
  with gr.Column(scale=2):
851
  gr.Markdown("### 📖 Source References")
852
- gr.Markdown("View the exact documentation sources used to generate the response, with relevance scores and GitHub links.")
 
 
853
 
854
  # Source nodes display as JSON
855
  sources_output = gr.JSON(
856
  label="📎 Source Citations & Metadata",
857
  value={
858
  "message": "Source documentation excerpts with relevance scores will appear here after your query...",
859
- "info": "Each source includes file path, relevance score, and content snippet"
860
  },
861
  )
862
 
863
  # Event handlers
864
  def handle_repo_selection(selected_repo):
865
  """Handle repository selection from dropdown"""
866
- if not selected_repo or selected_repo in ["No repositories available", ""]:
 
 
 
867
  return (
868
  gr.Dropdown(visible=True), # Keep dropdown visible
869
  gr.Textbox(visible=False, value=""), # Hide textbox
870
- gr.Button(interactive=False) # Disable query button
871
  )
872
  else:
873
  return (
874
  gr.Dropdown(visible=False), # Hide dropdown
875
- gr.Textbox(visible=True, value=selected_repo), # Show textbox with selected repo
876
- gr.Button(interactive=True) # Enable query button
 
 
877
  )
878
 
879
  def reset_repo_selection():
880
  """Reset to show dropdown again"""
881
  try:
882
- repos = get_available_repositories() or ["No repositories available"]
 
 
883
  return (
884
- gr.Dropdown(choices=repos, value=None, visible=True), # Show dropdown with refreshed choices
 
 
885
  gr.Textbox(visible=False, value=""), # Hide textbox
886
- gr.Button(interactive=False) # Disable query button
887
  )
888
  except Exception as e:
889
  print(f"Error refreshing repository list: {e}")
890
  return (
891
- gr.Dropdown(choices=["Error loading repositories"], value=None, visible=True),
 
 
 
 
892
  gr.Textbox(visible=False, value=""),
893
- gr.Button(interactive=False)
894
  )
895
 
896
  def get_available_docs_repo():
@@ -903,11 +950,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
903
  try:
904
  repos = get_available_repositories()
905
  if not repos:
906
- repos = ["No repositories available - Please ingest documentation first"]
 
 
907
  return gr.Dropdown(choices=repos, value=None)
908
  except Exception as e:
909
  print(f"Error refreshing repository list: {e}")
910
- return gr.Dropdown(choices=["Error loading repositories"], value=None)
 
 
911
 
912
  # Simple query handler
913
  def handle_query(repo: str, mode: str, query: str):
@@ -923,11 +974,14 @@ with gr.Blocks(title="Doc-MCP") as demo:
923
  if not query.strip():
924
  return {"error": "Please enter a query."}
925
 
926
- if not repo or repo in ["No repositories available", "Error loading repositories", ""]:
 
 
 
 
927
  return {"error": "Please select a valid repository."}
928
 
929
  try:
930
-
931
  # Create query retriever for the selected repo
932
  retriever = QueryRetriever(repo)
933
 
@@ -967,22 +1021,22 @@ with gr.Blocks(title="Doc-MCP") as demo:
967
  return response_text, source_nodes
968
 
969
  # Wire up events
970
-
971
  # Handle repository selection from dropdown
972
  repo_dropdown.change(
973
  fn=handle_repo_selection,
974
  inputs=[repo_dropdown],
975
  outputs=[repo_dropdown, selected_repo_textbox, query_btn],
976
- show_api=False
977
  )
978
-
979
  # Handle refresh button - resets to dropdown view
980
  refresh_repos_btn.click(
981
  fn=reset_repo_selection,
982
  outputs=[repo_dropdown, selected_repo_textbox, query_btn],
983
- show_api=False
984
  )
985
-
986
  # Also provide API endpoint for listing repositories
987
  refresh_repos_btn.click(
988
  fn=get_available_docs_repo,
@@ -993,7 +1047,11 @@ with gr.Blocks(title="Doc-MCP") as demo:
993
  # Query button uses the textbox value (not dropdown)
994
  query_btn.click(
995
  fn=make_query,
996
- inputs=[selected_repo_textbox, query_mode, query_input], # Use textbox, not dropdown
 
 
 
 
997
  outputs=[response_output, sources_output],
998
  api_name="query_documentation",
999
  )
@@ -1001,7 +1059,11 @@ with gr.Blocks(title="Doc-MCP") as demo:
1001
  # Also allow Enter key to trigger query
1002
  query_input.submit(
1003
  fn=make_query,
1004
- inputs=[selected_repo_textbox, query_mode, query_input], # Use textbox, not dropdown
 
 
 
 
1005
  outputs=[response_output, sources_output],
1006
  show_api=False,
1007
  )
@@ -1010,17 +1072,21 @@ with gr.Blocks(title="Doc-MCP") as demo:
1010
  # Tab 3: Repository Management
1011
  # ================================
1012
  with gr.TabItem("🗂️ Repository Management"):
1013
- gr.Markdown("Manage your ingested repositories - view details and delete repositories when needed.")
1014
-
 
 
1015
  with gr.Row():
1016
  with gr.Column(scale=1):
1017
  gr.Markdown("### 📊 Repository Statistics")
1018
  stats_display = gr.JSON(
1019
  label="Database Statistics",
1020
- value={"message": "Click refresh to load statistics..."}
 
 
 
1021
  )
1022
- refresh_stats_btn = gr.Button("🔄 Refresh Statistics", variant="secondary")
1023
-
1024
  with gr.Column(scale=2):
1025
  gr.Markdown("### 📋 Repository Details")
1026
  repos_table = gr.Dataframe(
@@ -1028,13 +1094,17 @@ with gr.Blocks(title="Doc-MCP") as demo:
1028
  datatype=["str", "number", "str"],
1029
  label="Ingested Repositories",
1030
  interactive=False,
1031
- wrap=True
 
 
 
1032
  )
1033
- refresh_repos_btn = gr.Button("🔄 Refresh Repository List", variant="secondary")
1034
 
1035
  gr.Markdown("### 🗑️ Delete Repository")
1036
- gr.Markdown("**⚠️ Warning:** This will permanently delete all documents and metadata for the selected repository.")
1037
-
 
 
1038
  with gr.Row():
1039
  with gr.Column(scale=2):
1040
  delete_repo_dropdown = gr.Dropdown(
@@ -1044,33 +1114,31 @@ with gr.Blocks(title="Doc-MCP") as demo:
1044
  interactive=True,
1045
  allow_custom_value=False,
1046
  )
1047
-
1048
  # Confirmation checkbox
1049
  confirm_delete = gr.Checkbox(
1050
- label="I understand this action cannot be undone",
1051
- value=False
1052
  )
1053
-
1054
  delete_btn = gr.Button(
1055
- "🗑️ Delete Repository",
1056
- variant="stop",
1057
  size="lg",
1058
- interactive=False
1059
  )
1060
-
1061
  with gr.Column(scale=1):
1062
  deletion_status = gr.Textbox(
1063
  label="Deletion Status",
1064
  value="Select a repository and confirm to enable deletion.",
1065
  interactive=False,
1066
- lines=6
1067
  )
1068
 
1069
  # Management functions
1070
  def load_repository_stats():
1071
  """Load overall repository statistics"""
1072
  try:
1073
-
1074
  stats = get_repository_stats()
1075
  return stats
1076
  except Exception as e:
@@ -1079,29 +1147,30 @@ with gr.Blocks(title="Doc-MCP") as demo:
1079
  def load_repository_details():
1080
  """Load detailed repository information as a table"""
1081
  try:
1082
-
1083
  details = get_repo_details()
1084
-
1085
  if not details:
1086
  return [["No repositories found", 0, "N/A"]]
1087
-
1088
  # Format for dataframe
1089
  table_data = []
1090
  for repo in details:
1091
  last_updated = repo.get("last_updated", "Unknown")
1092
- if hasattr(last_updated, 'strftime'):
1093
  last_updated = last_updated.strftime("%Y-%m-%d %H:%M")
1094
  elif last_updated != "Unknown":
1095
  last_updated = str(last_updated)
1096
-
1097
- table_data.append([
1098
- repo.get("repo_name", "Unknown"),
1099
- repo.get("file_count", 0),
1100
- last_updated
1101
- ])
1102
-
 
 
1103
  return table_data
1104
-
1105
  except Exception as e:
1106
  return [["Error loading repositories", 0, str(e)]]
1107
 
@@ -1124,17 +1193,23 @@ with gr.Blocks(title="Doc-MCP") as demo:
1124
  def delete_repository(repo_name: str, confirmed: bool):
1125
  """Delete the selected repository"""
1126
  if not repo_name:
1127
- return "❌ No repository selected.", gr.Dropdown(choices=[]), gr.Checkbox(value=False)
1128
-
1129
- if not confirmed:
1130
- return "❌ Please confirm deletion by checking the checkbox.", gr.Dropdown(choices=[]), gr.Checkbox(value=False)
1131
-
1132
- try:
1133
 
 
 
 
 
 
 
1134
 
 
1135
  # Perform deletion
1136
  result = delete_repository_data(repo_name)
1137
-
1138
  # Prepare status message
1139
  status_msg = result["message"]
1140
  if result["success"]:
@@ -1142,79 +1217,67 @@ with gr.Blocks(title="Doc-MCP") as demo:
1142
  status_msg += f"\n- Vector documents removed: {result['vector_docs_deleted']}"
1143
  status_msg += f"\n- Repository record deleted: {'Yes' if result['repo_record_deleted'] else 'No'}"
1144
  status_msg += f"\n\n✅ Repository '{repo_name}' has been completely removed."
1145
-
1146
  # Update dropdown (remove deleted repo)
1147
  updated_dropdown = update_delete_dropdown()
1148
-
1149
  # Reset confirmation checkbox
1150
  reset_checkbox = gr.Checkbox(value=False)
1151
-
1152
  return status_msg, updated_dropdown, reset_checkbox
1153
-
1154
  except Exception as e:
1155
  error_msg = f"❌ Error deleting repository: {str(e)}"
1156
  return error_msg, gr.Dropdown(choices=[]), gr.Checkbox(value=False)
1157
 
1158
  # Wire up management events
1159
  refresh_stats_btn.click(
1160
- fn=load_repository_stats,
1161
- outputs=[stats_display],
1162
- show_api=False
1163
  )
1164
-
1165
  refresh_repos_btn.click(
1166
- fn=load_repository_details,
1167
- outputs=[repos_table],
1168
- show_api=False
1169
  )
1170
-
1171
  # Update delete dropdown when refreshing repos
1172
  refresh_repos_btn.click(
1173
  fn=update_delete_dropdown,
1174
  outputs=[delete_repo_dropdown],
1175
- show_api=False
1176
  )
1177
-
1178
  # Enable/disable delete button based on selection and confirmation
1179
  delete_repo_dropdown.change(
1180
  fn=check_delete_button_state,
1181
  inputs=[delete_repo_dropdown, confirm_delete],
1182
  outputs=[delete_btn],
1183
- show_api=False
1184
  )
1185
-
1186
  confirm_delete.change(
1187
  fn=check_delete_button_state,
1188
  inputs=[delete_repo_dropdown, confirm_delete],
1189
  outputs=[delete_btn],
1190
- show_api=False
1191
  )
1192
-
1193
  # Delete repository
1194
  delete_btn.click(
1195
  fn=delete_repository,
1196
  inputs=[delete_repo_dropdown, confirm_delete],
1197
  outputs=[deletion_status, delete_repo_dropdown, confirm_delete],
1198
- show_api=False
1199
  )
1200
 
1201
  # Load data on tab load
1202
- demo.load(
1203
- fn=load_repository_stats,
1204
- outputs=[stats_display],
1205
- show_api=False
1206
- )
1207
-
1208
- demo.load(
1209
- fn=load_repository_details,
1210
- outputs=[repos_table],
1211
- show_api=False
1212
- )
1213
-
1214
  demo.load(
1215
  fn=update_delete_dropdown,
1216
  outputs=[delete_repo_dropdown],
1217
- show_api=False
1218
  )
1219
 
1220
  # ================================
@@ -1222,96 +1285,102 @@ with gr.Blocks(title="Doc-MCP") as demo:
1222
  # ================================
1223
  with gr.TabItem("🔍 GitHub File Search", visible=False):
1224
  gr.Markdown("### 🔧 GitHub Repository File Search API")
1225
- gr.Markdown("Pure API endpoints for GitHub file operations - all responses in JSON format")
1226
-
 
 
1227
  with gr.Row():
1228
  with gr.Column():
1229
  gr.Markdown("#### 📋 List Repository Files")
1230
-
1231
  # Repository input for file operations
1232
  api_repo_input = gr.Textbox(
1233
  label="Repository URL",
1234
  placeholder="owner/repo or https://github.com/owner/repo",
1235
  value="",
1236
- info="GitHub repository to scan"
1237
  )
1238
-
1239
  # Branch selection
1240
  api_branch_input = gr.Textbox(
1241
  label="Branch",
1242
  value="main",
1243
  placeholder="main",
1244
- info="Branch to search (default: main)"
1245
  )
1246
-
1247
  # File extensions
1248
  api_extensions_input = gr.Textbox(
1249
  label="File Extensions (comma-separated)",
1250
  value=".md,.mdx",
1251
  placeholder=".md,.mdx,.txt",
1252
- info="File extensions to include"
1253
  )
1254
-
1255
  # List files button
1256
  list_files_btn = gr.Button("📋 List Files", variant="primary")
1257
-
1258
  with gr.Column():
1259
  gr.Markdown("#### 📄 Get Single File")
1260
-
1261
  # Single file inputs
1262
  single_repo_input = gr.Textbox(
1263
  label="Repository URL",
1264
  placeholder="owner/repo or https://github.com/owner/repo",
1265
  value="",
1266
- info="GitHub repository"
1267
  )
1268
-
1269
  single_file_input = gr.Textbox(
1270
  label="File Path",
1271
  placeholder="docs/README.md",
1272
  value="",
1273
- info="Path to specific file in repository"
1274
  )
1275
-
1276
  single_branch_input = gr.Textbox(
1277
  label="Branch",
1278
  value="main",
1279
  placeholder="main",
1280
- info="Branch name (default: main)"
1281
  )
1282
-
1283
  # Get single file button
1284
- get_single_btn = gr.Button("📄 Get Single File", variant="secondary")
 
 
1285
 
1286
  with gr.Row():
1287
  with gr.Column():
1288
  gr.Markdown("#### 📚 Get Multiple Files")
1289
-
1290
  # Multiple files inputs
1291
  multiple_repo_input = gr.Textbox(
1292
  label="Repository URL",
1293
  placeholder="owner/repo or https://github.com/owner/repo",
1294
  value="",
1295
- info="GitHub repository"
1296
  )
1297
-
1298
  multiple_files_input = gr.Textbox(
1299
  label="File Paths (comma-separated)",
1300
  placeholder="README.md,docs/guide.md,api/overview.md",
1301
  value="",
1302
  lines=3,
1303
- info="Comma-separated list of file paths"
1304
  )
1305
-
1306
  multiple_branch_input = gr.Textbox(
1307
  label="Branch",
1308
  value="main",
1309
  placeholder="main",
1310
- info="Branch name (default: main)"
1311
  )
1312
-
1313
  # Get multiple files button
1314
- get_multiple_btn = gr.Button("📚 Get Multiple Files", variant="secondary")
 
 
1315
 
1316
  # Single JSON output for all operations
1317
  gr.Markdown("### 📊 API Response")
@@ -1319,41 +1388,44 @@ with gr.Blocks(title="Doc-MCP") as demo:
1319
  label="JSON Response",
1320
  value={
1321
  "message": "API responses will appear here",
1322
- "info": "Use the buttons above to interact with GitHub repositories"
1323
- }
1324
  )
1325
 
1326
  # Pure API Functions (JSON only responses)
1327
- def list_repository_files(repo_url: str, branch: str = "main", extensions: str = ".md,.mdx"):
 
 
1328
  """
1329
  List all files in a GitHub repository with specified extensions
1330
-
1331
  Args:
1332
  repo_url: GitHub repository URL or owner/repo format
1333
  branch: Branch name to search (default: main)
1334
  extensions: Comma-separated file extensions (default: .md,.mdx)
1335
-
1336
  Returns:
1337
  JSON response with file list and metadata
1338
  """
1339
  try:
1340
  if not repo_url.strip():
1341
  return {"success": False, "error": "Repository URL is required"}
1342
-
1343
  # Parse extensions list
1344
- ext_list = [ext.strip() for ext in extensions.split(",") if ext.strip()]
 
 
1345
  if not ext_list:
1346
  ext_list = [".md", ".mdx"]
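The extension handling above splits the comma-separated string, drops empty entries, and falls back to `.md` and `.mdx`. A standalone version might look like the sketch below. Lowercasing, adding a missing leading dot, and de-duplication are assumptions added here for illustration; the removed line in the diff only strips whitespace, and the replacement lines are not visible.

```python
from typing import List

DEFAULT_EXTENSIONS = [".md", ".mdx"]


def parse_extensions(extensions: str) -> List[str]:
    """Turn '.md, mdx ,,' into ['.md', '.mdx'], with a default when nothing usable is given."""
    parsed: List[str] = []
    for ext in extensions.split(","):
        ext = ext.strip().lower()
        if not ext:
            continue  # skip empty entries such as a trailing comma
        if not ext.startswith("."):
            ext = "." + ext  # assumption: tolerate 'md' as well as '.md'
        if ext not in parsed:
            parsed.append(ext)
    return parsed or list(DEFAULT_EXTENSIONS)


print(parse_extensions(".md, mdx ,,"))
print(parse_extensions(""))
```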
1347
-
1348
 
1349
  # Get files list
1350
  files, status_message = fetch_repository_files(
1351
  repo_url=repo_url,
1352
  file_extensions=ext_list,
1353
  github_token=os.getenv("GITHUB_API_KEY"),
1354
- branch=branch
1355
  )
1356
-
1357
  if files:
1358
  return {
1359
  "success": True,
@@ -1362,7 +1434,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
1362
  "extensions": ext_list,
1363
  "total_files": len(files),
1364
  "files": files,
1365
- "status": status_message
1366
  }
1367
  else:
1368
  return {
@@ -1372,36 +1444,36 @@ with gr.Blocks(title="Doc-MCP") as demo:
1372
  "extensions": ext_list,
1373
  "total_files": 0,
1374
  "files": [],
1375
- "error": status_message or "No files found"
1376
  }
1377
-
1378
  except Exception as e:
1379
  return {
1380
  "success": False,
1381
  "error": f"Failed to list files: {str(e)}",
1382
  "repository": repo_url,
1383
- "branch": branch
1384
  }
1385
 
1386
  def get_single_file(repo_url: str, file_path: str, branch: str = "main"):
1387
  """
1388
  Retrieve a single file from GitHub repository
1389
-
1390
  Args:
1391
- repo_url: GitHub repository URL or owner/repo format
1392
  file_path: Path to the file in the repository
1393
  branch: Branch name (default: main)
1394
-
1395
  Returns:
1396
  JSON response with file content and metadata
1397
  """
1398
  try:
1399
  if not repo_url.strip():
1400
  return {"success": False, "error": "Repository URL is required"}
1401
-
1402
  if not file_path.strip():
1403
  return {"success": False, "error": "File path is required"}
1404
-
1405
  # Parse repo name
1406
  if "github.com" in repo_url:
1407
  repo_name = (
@@ -1411,15 +1483,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
1411
  )
1412
  else:
1413
  repo_name = repo_url.strip()
1414
-
1415
  # Load single file
1416
  documents, failed = load_github_files(
1417
  repo_name=repo_name,
1418
  file_paths=[file_path.strip()],
1419
  branch=branch,
1420
- github_token=os.getenv("GITHUB_API_KEY")
1421
  )
1422
-
1423
  if documents and len(documents) > 0:
1424
  doc = documents[0]
1425
  return {
@@ -1432,7 +1504,7 @@ with gr.Blocks(title="Doc-MCP") as demo:
1432
  "content": doc.text,
1433
  "metadata": doc.metadata,
1434
  "url": doc.metadata.get("url", ""),
1435
- "raw_url": doc.metadata.get("raw_url", "")
1436
  }
1437
  else:
1438
  error_msg = f"Failed to retrieve file: {failed[0] if failed else 'File not found or access denied'}"
@@ -1441,43 +1513,52 @@ with gr.Blocks(title="Doc-MCP") as demo:
1441
  "repository": repo_name,
1442
  "branch": branch,
1443
  "file_path": file_path,
1444
- "error": error_msg
1445
  }
1446
-
1447
  except Exception as e:
1448
  return {
1449
  "success": False,
1450
  "error": f"Failed to get single file: {str(e)}",
1451
  "repository": repo_url,
1452
  "file_path": file_path,
1453
- "branch": branch
1454
  }
1455
 
1456
- def get_multiple_files(repo_url: str, file_paths_str: str, branch: str = "main"):
 
 
1457
  """
1458
  Retrieve multiple files from GitHub repository
1459
-
1460
  Args:
1461
  repo_url: GitHub repository URL or owner/repo format
1462
  file_paths_str: Comma-separated string of file paths
1463
  branch: Branch name (default: main)
1464
-
1465
  Returns:
1466
  JSON response with multiple file contents and metadata
1467
  """
1468
  try:
1469
  if not repo_url.strip():
1470
  return {"success": False, "error": "Repository URL is required"}
1471
-
1472
  if not file_paths_str.strip():
1473
  return {"success": False, "error": "File paths are required"}
1474
-
1475
  # Parse file paths from comma-separated string
1476
- file_paths = [path.strip() for path in file_paths_str.split(",") if path.strip()]
1477
-
 
 
 
 
1478
  if not file_paths:
1479
- return {"success": False, "error": "No valid file paths provided"}
1480
-
 
 
 
1481
  # Parse repo name
1482
  if "github.com" in repo_url:
1483
  repo_name = (
@@ -1487,15 +1568,15 @@ with gr.Blocks(title="Doc-MCP") as demo:
1487
  )
1488
  else:
1489
  repo_name = repo_url.strip()
1490
-
1491
  # Load multiple files
1492
  documents, failed = load_github_files(
1493
  repo_name=repo_name,
1494
  file_paths=file_paths,
1495
  branch=branch,
1496
- github_token=os.getenv("GITHUB_API_KEY")
1497
  )
1498
-
1499
  # Process successful documents
1500
  successful_files = []
1501
  for doc in documents:
@@ -1506,10 +1587,10 @@ with gr.Blocks(title="Doc-MCP") as demo:
1506
  "content": doc.text,
1507
  "metadata": doc.metadata,
1508
  "url": doc.metadata.get("url", ""),
1509
- "raw_url": doc.metadata.get("raw_url", "")
1510
  }
1511
  successful_files.append(file_data)
1512
-
1513
  return {
1514
  "success": True,
1515
  "repository": repo_name,
@@ -1520,16 +1601,16 @@ with gr.Blocks(title="Doc-MCP") as demo:
1520
  "files": successful_files,
1521
  "failed_file_paths": failed,
1522
  "total_content_size": sum(len(doc.text) for doc in documents),
1523
- "requested_file_paths": file_paths
1524
  }
1525
-
1526
  except Exception as e:
1527
  return {
1528
  "success": False,
1529
  "error": f"Failed to get multiple files: {str(e)}",
1530
  "repository": repo_url,
1531
  "file_paths": file_paths_str,
1532
- "branch": branch
1533
  }
1534
 
1535
  # Wire up the GitHub file search events - all output to single JSON component
@@ -1537,21 +1618,208 @@ with gr.Blocks(title="Doc-MCP") as demo:
1537
  fn=list_repository_files,
1538
  inputs=[api_repo_input, api_branch_input, api_extensions_input],
1539
  outputs=[api_response_output],
1540
- api_name="list_repository_files"
1541
  )
1542
-
1543
  get_single_btn.click(
1544
  fn=get_single_file,
1545
  inputs=[single_repo_input, single_file_input, single_branch_input],
1546
  outputs=[api_response_output],
1547
- api_name="get_single_file"
1548
  )
1549
-
1550
  get_multiple_btn.click(
1551
  fn=get_multiple_files,
1552
- inputs=[multiple_repo_input, multiple_files_input, multiple_branch_input],
 
 
 
 
1553
  outputs=[api_response_output],
1554
- api_name="get_multiple_files"
1555
  )
1556
  if __name__ == "__main__":
1557
  demo.launch(mcp_server=True)
 
9
  from llama_index.core import Settings
10
  from llama_index.core.text_splitter import SentenceSplitter
11
 
12
+ from rag.config import (
13
+ delete_repository_data,
14
+ embed_model,
15
+ get_available_repos,
16
+ get_repo_details,
17
+ get_repository_stats,
18
+ llm,
19
+ )
20
+ from rag.github_file_loader import fetch_markdown_files as fetch_files_with_loader
21
  from rag.github_file_loader import fetch_repository_files, load_github_files
22
  from rag.ingest import ingest_documents_async
23
  from rag.query import QueryRetriever
 
31
 
32
  def get_available_repositories():
33
  return get_available_repos()
34
+
35
 
36
  def start_file_loading(
37
  repo_url: str, selected_files: List[str], current_progress: Dict
 
174
  )
175
 
176
  return current_progress
177
+
178
+
179
  def start_vector_ingestion(current_progress: Dict):
180
  """Step 2: Ingest loaded documents into vector store"""
181
  print("\n🔄 STARTING VECTOR INGESTION STEP")
 
239
  if isinstance(failed_files_data, list):
240
  failed_files_count = len(failed_files_data)
241
  else:
242
+ failed_files_count = (
243
+ failed_files_data if isinstance(failed_files_data, int) else 0
244
+ )
245
 
246
  # Update final success state with repository update flag
247
  current_progress.update(
 
275
  if isinstance(failed_files_data, list):
276
  failed_files_count = len(failed_files_data)
277
  else:
278
+ failed_files_count = (
279
+ failed_files_data if isinstance(failed_files_data, int) else 0
280
+ )
281
 
282
  current_progress.update(
283
  {
 
295
 
296
  return current_progress
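
The two hunks above repeat the same normalization of `failed_files_data`, which may arrive as a list of paths or as a plain count. A minimal sketch of how that logic could be factored out is shown here; the helper name `count_failed_files` is hypothetical and does not appear in the commit.

```python
from typing import Any


def count_failed_files(failed_files_data: Any) -> int:
    """Return a failure count for either a list of paths or an integer.

    Hypothetical helper: mirrors the inline expression in the diff,
    falling back to 0 for any other type (for example None).
    """
    if isinstance(failed_files_data, list):
        return len(failed_files_data)
    return failed_files_data if isinstance(failed_files_data, int) else 0
```

Both call sites could then read `failed_files_count = count_failed_files(failed_files_data)`, assuming nothing else depends on the inline form.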
297
 
298
+
299
  def start_file_loading_generator(
300
  repo_url: str, selected_files: List[str], current_progress: Dict
301
  ):
302
  """Step 1: Load files from GitHub with yield-based real-time updates"""
303
+
304
  print("\n🔄 STARTING FILE LOADING STEP")
305
  print(f"📁 Repository: {repo_url}")
306
  print(f"📋 Selected files: {len(selected_files)} files")
 
361
  "repo_name": repo_name,
362
  }
363
  yield initial_progress
364
+
365
  time.sleep(0.5)
366
 
367
  for i in range(0, len(selected_files), batch_size):
 
430
  "repo_name": repo_name,
431
  }
432
  yield batch_complete_progress
433
+
434
  time.sleep(0.3)
435
 
436
  except Exception as batch_error:
437
  print(f"❌ Batch processing error: {batch_error}")
438
  all_failed.extend(batch)
439
+
440
  error_progress = {
441
  "status": "loading",
442
  "message": f"⚠️ Error in batch {current_batch_num}",
 
460
  "message": f"✅ File Loading Complete! Loaded {len(all_documents)} documents",
461
  "progress": 100,
462
  "phase": "Files Loaded Successfully",
463
+ "details": f"🎯 Final Results:\n✅ Successfully loaded: {len(all_documents)} documents\n❌ Failed files: {len(all_failed)}\n⏱️ Total time: {loading_time:.1f}s\n📊 Success rate: {(len(all_documents) / (len(all_documents) + len(all_failed)) * 100):.1f}%",
464
  "step": "file_loading_complete",
465
  "loaded_documents": all_documents,
466
  "failed_files": all_failed,
 
490
  yield error_progress
491
  return error_progress
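
The success-rate expression in the `details` string divides by `len(all_documents) + len(all_failed)`, which is zero when no files were attempted. Whether that state is reachable depends on code not shown in this diff; the sketch below only illustrates one way to guard it, using a hypothetical `success_rate` helper.

```python
def success_rate(loaded: int, failed: int) -> float:
    """Percentage of files loaded successfully, or 0.0 when none were attempted.

    Hypothetical helper, not part of the commit.
    """
    total = loaded + failed
    if total == 0:
        return 0.0
    return loaded / total * 100
```

The f-string could then use `{success_rate(len(all_documents), len(all_failed)):.1f}%`.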
492
 
493
+
494
  # Progress display component
495
  def format_progress_display(progress_state: Dict) -> str:
496
  """Format progress state into readable display with enhanced details"""
 
506
  # Enhanced progress bar
507
  filled = int(progress / 2.5) # 40 chars total
508
  progress_bar = "█" * filled + "░" * (40 - filled)
509
+
510
  # Status emoji mapping
511
  status_emoji = {
512
  "loading": "⏳",
513
+ "loaded": "✅",
514
  "vectorizing": "🧠",
515
  "complete": "🎉",
516
+ "error": "❌",
517
  }
518
+
519
  emoji = status_emoji.get(status, "🔄")
520
 
521
  output = f"{emoji} **{message}**\n\n"
522
+
523
  # Phase and progress section
524
  output += f"📊 **Current Phase:** {phase}\n"
525
  output += f"📈 **Progress:** {progress:.1f}%\n"
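
The progress bar in this hunk maps a 0-100 percentage onto 40 characters by dividing by 2.5. A small standalone sketch of the same idea follows; the function name `render_progress_bar`, the `width` parameter, and the clamping step are assumptions added for illustration.

```python
def render_progress_bar(progress: float, width: int = 40) -> str:
    """Render a fixed-width text progress bar for a 0-100 percentage."""
    # Clamp so out-of-range values cannot yield a bar of the wrong length.
    progress = max(0.0, min(100.0, progress))
    # With width=40 this is the same as int(progress / 2.5) in the diff.
    filled = int(progress / (100 / width))
    return "█" * filled + "░" * (width - filled)
```

Clamping is the only behavioral difference from the inline version: without it, a progress value above 100 would make `40 - filled` negative and shorten the bar.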
 
531
  total = progress_state.get("total_files", 0)
532
  successful = progress_state.get("successful_files", 0)
533
  failed = progress_state.get("failed_files", 0)
534
+
535
  if total > 0:
536
  output += "📁 **File Processing Status:**\n"
537
  output += f" • Total files: {total}\n"
538
  output += f" • Processed: {processed}/{total}\n"
539
  output += f" • ✅ Successful: {successful}\n"
540
  output += f" • ❌ Failed: {failed}\n"
541
+
542
  if "current_batch" in progress_state and "total_batches" in progress_state:
543
  output += f" • 📦 Current batch: {progress_state['current_batch']}/{progress_state['total_batches']}\n"
544
  output += "\n"
 
547
  elif progress_state.get("step") == "vector_ingestion":
548
  docs_count = progress_state.get("documents_count", 0)
549
  repo_name = progress_state.get("repo_name", "Unknown")
550
+
551
  if docs_count > 0:
552
  output += "🧠 **Vector Processing Status:**\n"
553
  output += f" • Repository: {repo_name}\n"
 
574
  output += f"⏱️ **Total time:** {total_time:.1f} seconds\n"
575
  output += f" ├─ File loading: {loading_time:.1f}s\n"
576
  output += f" └─ Vector processing: {vector_time:.1f}s\n"
577
+ output += (
578
+ f"📊 **Processing rate:** {docs_processed / total_time:.1f} docs/second\n\n"
579
+ )
580
  output += "🚀 **Next Step:** Go to the 'Query Interface' tab to start asking questions!"
581
 
582
  elif status == "error":
583
  error = progress_state.get("error", "Unknown error")
584
  output += "\n💥 **ERROR OCCURRED**\n"
585
  output += "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n"
586
+ output += (
587
+ f"❌ **Error Details:** {error[:300]}{'...' if len(error) > 300 else ''}\n"
588
+ )
589
  output += "\n🔧 **Troubleshooting Tips:**\n"
590
  output += " • Check your GitHub token permissions\n"
591
  output += " • Verify repository URL format\n"
 
610
  with gr.TabItem("📥 Documentation Ingestion"):
611
  gr.Markdown("### 🚀 Two-Step Documentation Processing Pipeline")
612
  gr.Markdown(
613
+ "**Step 1:** Fetch markdown files from GitHub repository → **Step 2:** Generate vector embeddings and store in MongoDB Atlas"
614
  )
615
 
616
  with gr.Row():
 
619
  label="📂 GitHub Repository URL",
620
  placeholder="Enter: owner/repo or https://github.com/owner/repo (e.g., gradio-app/gradio)",
621
  value="",
622
+ info="Enter any GitHub repository containing markdown documentation",
623
+ )
624
+ load_btn = gr.Button(
625
+ "🔍 Discover Documentation Files", variant="secondary"
626
  )
 
627
 
628
  with gr.Column(scale=1):
629
  status_output = gr.Textbox(
630
+ label="Repository Discovery Status",
631
+ interactive=False,
632
+ lines=4,
633
+ placeholder="Repository scanning results will appear here...",
634
  )
635
  with gr.Row():
636
+ select_all_btn = gr.Button(
637
+ "📋 Select All Documents", variant="secondary"
638
+ )
639
  clear_all_btn = gr.Button("🗑️ Clear Selection", variant="secondary")
640
 
641
  # File selection
642
  with gr.Accordion(label="Available Documentation Files"):
643
  file_selector = gr.CheckboxGroup(
644
+ choices=[],
645
+ label="Select Markdown Files for RAG Processing",
646
+ visible=False,
647
  )
648
 
649
  # Two-step ingestion controls
650
  gr.Markdown("### 🔄 RAG Pipeline Execution")
651
+ gr.Markdown(
652
+ "Process your documentation through our advanced RAG pipeline using Nebius AI embeddings and MongoDB Atlas vector storage."
653
+ )
654
 
655
  with gr.Row():
656
  with gr.Column():
 
680
  lines=25,
681
  value="🚀 Ready to start two-step ingestion process...\n\n📋 Steps:\n1️⃣ Load files from GitHub repository\n2️⃣ Generate embeddings and store in vector database",
682
  max_lines=30,
 
683
  )
684
 
685
  # Event handlers
 
717
  gr.Button(interactive=False),
718
  )
719
 
720
+ def start_step1_generator(
721
+ repo_url: str, selected_files: List[str], current_progress: Dict
722
+ ):
723
  """Start Step 1 with generator-based real-time progress updates"""
724
+ for progress_update in start_file_loading_generator(
725
+ repo_url, selected_files, current_progress.copy()
726
+ ):
727
  progress_text = format_progress_display(progress_update)
728
+ step2_enabled = (
729
+ progress_update.get("step") == "file_loading_complete"
730
+ )
731
+
732
  yield (
733
  progress_update,
734
  progress_text,
 
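
`start_step1_generator` relies on a generator yielding one progress dict per batch, so the UI can re-render after each `yield` and enable Step 2 only once the `step` field reaches `file_loading_complete`. The sketch below shows that pattern in isolation, without Gradio or GitHub access. `load_in_batches` and `ui_updates` are hypothetical stand-ins, not functions from the commit.

```python
from typing import Dict, Iterator, List, Tuple


def load_in_batches(files: List[str], batch_size: int = 2) -> Iterator[Dict]:
    """Yield one progress dict per processed batch, then a completion dict."""
    total = len(files)
    processed = 0
    for start in range(0, total, batch_size):
        batch = files[start:start + batch_size]
        processed += len(batch)  # a real loader would fetch the batch here
        yield {
            "status": "loading",
            "step": "file_loading",
            "processed_files": processed,
            "total_files": total,
            "progress": processed / total * 100,
        }
    yield {"status": "loaded", "step": "file_loading_complete", "progress": 100}


def ui_updates(files: List[str]) -> Iterator[Tuple[Dict, bool]]:
    """Pair each progress dict with a flag saying whether Step 2 may start."""
    for update in load_in_batches(files):
        step2_enabled = update.get("step") == "file_loading_complete"
        yield update, step2_enabled
```

Consuming `ui_updates(["a.md", "b.md", "c.md"])` produces two `loading` updates followed by one completion update, and only the last carries `step2_enabled=True`.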
828
  # Repository selection - Dropdown that becomes textbox when selected
829
  with gr.Row():
830
  repo_dropdown = gr.Dropdown(
831
+ choices=get_available_repositories()
832
+ or ["No repositories available"],
833
  label="📚 Select Documentation Repository",
834
  value=None,
835
  interactive=True,
836
  allow_custom_value=True,
837
+ info="Choose from available repositories",
838
  )
839
+
840
  # Hidden textbox that will become visible when repo is selected
841
  selected_repo_textbox = gr.Textbox(
842
  label="🎯 Selected Repository",
843
  value="",
844
  interactive=False,
845
  visible=False,
846
+ info="Currently selected repository for querying",
847
  )
848
+
849
  refresh_repos_btn = gr.Button(
850
  "🔄 Refresh Repository List", variant="secondary", size="sm"
851
  )
 
863
  label="💭 Ask About Your Documentation",
864
  placeholder="How do I implement a custom component? What are the available API endpoints? How to configure the system?",
865
  lines=3,
866
+ info="Ask natural language questions about your documentation",
867
  )
868
 
869
+ query_btn = gr.Button(
870
+ "🚀 Search Documentation", variant="primary", size="lg"
871
+ )
872
 
873
  # Response display as text area
874
  response_output = gr.Textbox(
 
876
  value="Your AI-powered documentation response will appear here with contextual information and source citations...",
877
  lines=10,
878
  interactive=False,
879
+ info="Generated using Nebius LLM with retrieved documentation context",
880
  )
881
 
882
  with gr.Column(scale=2):
883
  gr.Markdown("### 📖 Source References")
884
+ gr.Markdown(
885
+ "View the exact documentation sources used to generate the response, with relevance scores and GitHub links."
886
+ )
887
 
888
  # Source nodes display as JSON
889
  sources_output = gr.JSON(
890
  label="📎 Source Citations & Metadata",
891
  value={
892
  "message": "Source documentation excerpts with relevance scores will appear here after your query...",
893
+ "info": "Each source includes file path, relevance score, and content snippet",
894
  },
895
  )
896
 
897
  # Event handlers
898
  def handle_repo_selection(selected_repo):
899
  """Handle repository selection from dropdown"""
900
+ if not selected_repo or selected_repo in [
901
+ "No repositories available",
902
+ "",
903
+ ]:
904
  return (
905
  gr.Dropdown(visible=True), # Keep dropdown visible
906
  gr.Textbox(visible=False, value=""), # Hide textbox
907
+ gr.Button(interactive=False), # Disable query button
908
  )
909
  else:
910
  return (
911
  gr.Dropdown(visible=False), # Hide dropdown
912
+ gr.Textbox(
913
+ visible=True, value=selected_repo
914
+ ), # Show textbox with selected repo
915
+ gr.Button(interactive=True), # Enable query button
916
  )
917
 
918
  def reset_repo_selection():
919
  """Reset to show dropdown again"""
920
  try:
921
+ repos = get_available_repositories() or [
922
+ "No repositories available"
923
+ ]
924
  return (
925
+ gr.Dropdown(
926
+ choices=repos, value=None, visible=True
927
+ ), # Show dropdown with refreshed choices
928
  gr.Textbox(visible=False, value=""), # Hide textbox
929
+ gr.Button(interactive=False), # Disable query button
930
  )
931
  except Exception as e:
932
  print(f"Error refreshing repository list: {e}")
933
  return (
934
+ gr.Dropdown(
935
+ choices=["Error loading repositories"],
936
+ value=None,
937
+ visible=True,
938
+ ),
939
  gr.Textbox(visible=False, value=""),
940
+ gr.Button(interactive=False),
941
  )
942
 
943
  def get_available_docs_repo():
 
950
  try:
951
  repos = get_available_repositories()
952
  if not repos:
953
+ repos = [
954
+ "No repositories available - Please ingest documentation first"
955
+ ]
956
  return gr.Dropdown(choices=repos, value=None)
957
  except Exception as e:
958
  print(f"Error refreshing repository list: {e}")
959
+ return gr.Dropdown(
960
+ choices=["Error loading repositories"], value=None
961
+ )
962
 
963
  # Simple query handler
964
  def handle_query(repo: str, mode: str, query: str):
 
974
  if not query.strip():
975
  return {"error": "Please enter a query."}
976
 
977
+ if not repo or repo in [
978
+ "No repositories available",
979
+ "Error loading repositories",
980
+ "",
981
+ ]:
982
  return {"error": "Please select a valid repository."}
983
 
984
  try:
 
985
  # Create query retriever for the selected repo
986
  retriever = QueryRetriever(repo)
987
 
 
1021
  return response_text, source_nodes
1022
 
1023
  # Wire up events
1024
+
1025
  # Handle repository selection from dropdown
1026
  repo_dropdown.change(
1027
  fn=handle_repo_selection,
1028
  inputs=[repo_dropdown],
1029
  outputs=[repo_dropdown, selected_repo_textbox, query_btn],
1030
+ show_api=False,
1031
  )
1032
+
1033
  # Handle refresh button - resets to dropdown view
1034
  refresh_repos_btn.click(
1035
  fn=reset_repo_selection,
1036
  outputs=[repo_dropdown, selected_repo_textbox, query_btn],
1037
+ show_api=False,
1038
  )
1039
+
1040
  # Also provide API endpoint for listing repositories
1041
  refresh_repos_btn.click(
1042
  fn=get_available_docs_repo,
 
1047
  # Query button uses the textbox value (not dropdown)
1048
  query_btn.click(
1049
  fn=make_query,
1050
+ inputs=[
1051
+ selected_repo_textbox,
1052
+ query_mode,
1053
+ query_input,
1054
+ ], # Use textbox, not dropdown
1055
  outputs=[response_output, sources_output],
1056
  api_name="query_documentation",
1057
  )
 
1059
  # Also allow Enter key to trigger query
1060
  query_input.submit(
1061
  fn=make_query,
1062
+ inputs=[
1063
+ selected_repo_textbox,
1064
+ query_mode,
1065
+ query_input,
1066
+ ], # Use textbox, not dropdown
1067
  outputs=[response_output, sources_output],
1068
  show_api=False,
1069
  )
 
1072
  # Tab 3: Repository Management
1073
  # ================================
1074
  with gr.TabItem("🗂️ Repository Management"):
1075
+ gr.Markdown(
1076
+ "Manage your ingested repositories - view details and delete repositories when needed."
1077
+ )
1078
+
1079
  with gr.Row():
1080
  with gr.Column(scale=1):
1081
  gr.Markdown("### 📊 Repository Statistics")
1082
  stats_display = gr.JSON(
1083
  label="Database Statistics",
1084
+ value={"message": "Click refresh to load statistics..."},
1085
+ )
1086
+ refresh_stats_btn = gr.Button(
1087
+ "🔄 Refresh Statistics", variant="secondary"
1088
  )
1089
+
 
1090
  with gr.Column(scale=2):
1091
  gr.Markdown("### 📋 Repository Details")
1092
  repos_table = gr.Dataframe(
 
1094
  datatype=["str", "number", "str"],
1095
  label="Ingested Repositories",
1096
  interactive=False,
1097
+ wrap=True,
1098
+ )
1099
+ refresh_repos_btn = gr.Button(
1100
+ "🔄 Refresh Repository List", variant="secondary"
1101
  )
 
1102
 
1103
  gr.Markdown("### 🗑️ Delete Repository")
1104
+ gr.Markdown(
1105
+ "**⚠️ Warning:** This will permanently delete all documents and metadata for the selected repository."
1106
+ )
1107
+
1108
  with gr.Row():
1109
  with gr.Column(scale=2):
1110
  delete_repo_dropdown = gr.Dropdown(
 
1114
  interactive=True,
1115
  allow_custom_value=False,
1116
  )
1117
+
1118
  # Confirmation checkbox
1119
  confirm_delete = gr.Checkbox(
1120
+ label="I understand this action cannot be undone", value=False
 
1121
  )
1122
+
1123
  delete_btn = gr.Button(
1124
+ "🗑️ Delete Repository",
1125
+ variant="stop",
1126
  size="lg",
1127
+ interactive=False,
1128
  )
1129
+
1130
  with gr.Column(scale=1):
1131
  deletion_status = gr.Textbox(
1132
  label="Deletion Status",
1133
  value="Select a repository and confirm to enable deletion.",
1134
  interactive=False,
1135
+ lines=6,
1136
  )
1137
 
1138
  # Management functions
1139
  def load_repository_stats():
1140
  """Load overall repository statistics"""
1141
  try:
 
1142
  stats = get_repository_stats()
1143
  return stats
1144
  except Exception as e:
 
1147
  def load_repository_details():
1148
  """Load detailed repository information as a table"""
1149
  try:
 
1150
  details = get_repo_details()
1151
+
1152
  if not details:
1153
  return [["No repositories found", 0, "N/A"]]
1154
+
1155
  # Format for dataframe
1156
  table_data = []
1157
  for repo in details:
1158
  last_updated = repo.get("last_updated", "Unknown")
1159
+ if hasattr(last_updated, "strftime"):
1160
  last_updated = last_updated.strftime("%Y-%m-%d %H:%M")
1161
  elif last_updated != "Unknown":
1162
  last_updated = str(last_updated)
1163
+
1164
+ table_data.append(
1165
+ [
1166
+ repo.get("repo_name", "Unknown"),
1167
+ repo.get("file_count", 0),
1168
+ last_updated,
1169
+ ]
1170
+ )
1171
+
1172
  return table_data
1173
+
1174
  except Exception as e:
1175
  return [["Error loading repositories", 0, str(e)]]
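
`load_repository_details` turns repository records into dataframe rows, formatting `last_updated` with `strftime` when the value supports it and stringifying it otherwise. A standalone sketch of that row-building step follows; the function name `to_table_rows` and the sample record are illustrative assumptions, and the record keys are taken from the diff.

```python
from datetime import datetime
from typing import Any, Dict, List


def to_table_rows(details: List[Dict[str, Any]]) -> List[List[Any]]:
    """Convert repository records into [name, file_count, last_updated] rows."""
    if not details:
        return [["No repositories found", 0, "N/A"]]

    rows = []
    for repo in details:
        last_updated = repo.get("last_updated", "Unknown")
        if hasattr(last_updated, "strftime"):
            # datetime-like values get a compact, minute-resolution timestamp
            last_updated = last_updated.strftime("%Y-%m-%d %H:%M")
        else:
            last_updated = str(last_updated)
        rows.append(
            [repo.get("repo_name", "Unknown"), repo.get("file_count", 0), last_updated]
        )
    return rows
```

Checking for `strftime` rather than `isinstance(..., datetime)` keeps the function tolerant of other date-like objects.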
1176
 
 
1193
  def delete_repository(repo_name: str, confirmed: bool):
1194
  """Delete the selected repository"""
1195
  if not repo_name:
1196
+ return (
1197
+ "❌ No repository selected.",
1198
+ gr.Dropdown(choices=[]),
1199
+ gr.Checkbox(value=False),
1200
+ )
 
1201
 
1202
+ if not confirmed:
1203
+ return (
1204
+ "❌ Please confirm deletion by checking the checkbox.",
1205
+ gr.Dropdown(choices=[]),
1206
+ gr.Checkbox(value=False),
1207
+ )
1208
 
1209
+ try:
1210
  # Perform deletion
1211
  result = delete_repository_data(repo_name)
1212
+
1213
  # Prepare status message
1214
  status_msg = result["message"]
1215
  if result["success"]:
 
1217
  status_msg += f"\n- Vector documents removed: {result['vector_docs_deleted']}"
1218
  status_msg += f"\n- Repository record deleted: {'Yes' if result['repo_record_deleted'] else 'No'}"
1219
  status_msg += f"\n\n✅ Repository '{repo_name}' has been completely removed."
1220
+
1221
  # Update dropdown (remove deleted repo)
1222
  updated_dropdown = update_delete_dropdown()
1223
+
1224
  # Reset confirmation checkbox
1225
  reset_checkbox = gr.Checkbox(value=False)
1226
+
1227
  return status_msg, updated_dropdown, reset_checkbox
1228
+
1229
  except Exception as e:
1230
  error_msg = f"❌ Error deleting repository: {str(e)}"
1231
  return error_msg, gr.Dropdown(choices=[]), gr.Checkbox(value=False)
1232
 
1233
  # Wire up management events
1234
  refresh_stats_btn.click(
1235
+ fn=load_repository_stats, outputs=[stats_display], show_api=False
 
 
1236
  )
1237
+
1238
  refresh_repos_btn.click(
1239
+ fn=load_repository_details, outputs=[repos_table], show_api=False
 
 
1240
  )
1241
+
1242
  # Update delete dropdown when refreshing repos
1243
  refresh_repos_btn.click(
1244
  fn=update_delete_dropdown,
1245
  outputs=[delete_repo_dropdown],
1246
+ show_api=False,
1247
  )
1248
+
1249
  # Enable/disable delete button based on selection and confirmation
1250
  delete_repo_dropdown.change(
1251
  fn=check_delete_button_state,
1252
  inputs=[delete_repo_dropdown, confirm_delete],
1253
  outputs=[delete_btn],
1254
+ show_api=False,
1255
  )
1256
+
1257
  confirm_delete.change(
1258
  fn=check_delete_button_state,
1259
  inputs=[delete_repo_dropdown, confirm_delete],
1260
  outputs=[delete_btn],
1261
+ show_api=False,
1262
  )
1263
+
1264
  # Delete repository
1265
  delete_btn.click(
1266
  fn=delete_repository,
1267
  inputs=[delete_repo_dropdown, confirm_delete],
1268
  outputs=[deletion_status, delete_repo_dropdown, confirm_delete],
1269
+ show_api=False,
1270
  )
1271
 
1272
  # Load data on tab load
1273
+ demo.load(fn=load_repository_stats, outputs=[stats_display], show_api=False)
1274
+
1275
+ demo.load(fn=load_repository_details, outputs=[repos_table], show_api=False)
1276
+
 
 
 
 
 
 
 
 
1277
  demo.load(
1278
  fn=update_delete_dropdown,
1279
  outputs=[delete_repo_dropdown],
1280
+ show_api=False,
1281
  )
1282
 
1283
  # ================================
 
1285
  # ================================
1286
  with gr.TabItem("🔍 GitHub File Search", visible=False):
1287
  gr.Markdown("### 🔧 GitHub Repository File Search API")
1288
+ gr.Markdown(
1289
+ "Pure API endpoints for GitHub file operations - all responses in JSON format"
1290
+ )
1291
+
1292
  with gr.Row():
1293
  with gr.Column():
1294
  gr.Markdown("#### 📋 List Repository Files")
1295
+
1296
  # Repository input for file operations
1297
  api_repo_input = gr.Textbox(
1298
  label="Repository URL",
1299
  placeholder="owner/repo or https://github.com/owner/repo",
1300
  value="",
1301
+ info="GitHub repository to scan",
1302
  )
1303
+
1304
  # Branch selection
1305
  api_branch_input = gr.Textbox(
1306
  label="Branch",
1307
  value="main",
1308
  placeholder="main",
1309
+ info="Branch to search (default: main)",
1310
  )
1311
+
1312
  # File extensions
1313
  api_extensions_input = gr.Textbox(
1314
  label="File Extensions (comma-separated)",
1315
  value=".md,.mdx",
1316
  placeholder=".md,.mdx,.txt",
1317
+ info="File extensions to include",
1318
  )
1319
+
1320
  # List files button
1321
  list_files_btn = gr.Button("📋 List Files", variant="primary")
1322
+
1323
  with gr.Column():
1324
  gr.Markdown("#### 📄 Get Single File")
1325
+
1326
  # Single file inputs
1327
  single_repo_input = gr.Textbox(
1328
  label="Repository URL",
1329
  placeholder="owner/repo or https://github.com/owner/repo",
1330
  value="",
1331
+ info="GitHub repository",
1332
  )
1333
+
1334
  single_file_input = gr.Textbox(
1335
  label="File Path",
1336
  placeholder="docs/README.md",
1337
  value="",
1338
+ info="Path to specific file in repository",
1339
  )
1340
+
1341
  single_branch_input = gr.Textbox(
1342
  label="Branch",
1343
  value="main",
1344
  placeholder="main",
1345
+ info="Branch name (default: main)",
1346
  )
1347
+
1348
  # Get single file button
1349
+ get_single_btn = gr.Button(
1350
+ "📄 Get Single File", variant="secondary"
1351
+ )
1352
 
1353
  with gr.Row():
1354
  with gr.Column():
1355
  gr.Markdown("#### 📚 Get Multiple Files")
1356
+
1357
  # Multiple files inputs
1358
  multiple_repo_input = gr.Textbox(
1359
  label="Repository URL",
1360
  placeholder="owner/repo or https://github.com/owner/repo",
1361
  value="",
1362
+ info="GitHub repository",
1363
  )
1364
+
1365
  multiple_files_input = gr.Textbox(
1366
  label="File Paths (comma-separated)",
1367
  placeholder="README.md,docs/guide.md,api/overview.md",
1368
  value="",
1369
  lines=3,
1370
+ info="Comma-separated list of file paths",
1371
  )
1372
+
1373
  multiple_branch_input = gr.Textbox(
1374
  label="Branch",
1375
  value="main",
1376
  placeholder="main",
1377
+ info="Branch name (default: main)",
1378
  )
1379
+
1380
  # Get multiple files button
1381
+ get_multiple_btn = gr.Button(
1382
+ "📚 Get Multiple Files", variant="secondary"
1383
+ )
1384
 
1385
  # Single JSON output for all operations
1386
  gr.Markdown("### 📊 API Response")
 
1388
  label="JSON Response",
1389
  value={
1390
  "message": "API responses will appear here",
1391
+ "info": "Use the buttons above to interact with GitHub repositories",
1392
+ },
1393
  )
1394
 
1395
  # Pure API Functions (JSON only responses)
1396
+ def list_repository_files(
1397
+ repo_url: str, branch: str = "main", extensions: str = ".md,.mdx"
1398
+ ):
1399
  """
1400
  List all files in a GitHub repository with specified extensions
1401
+
1402
  Args:
1403
  repo_url: GitHub repository URL or owner/repo format
1404
  branch: Branch name to search (default: main)
1405
  extensions: Comma-separated file extensions (default: .md,.mdx)
1406
+
1407
  Returns:
1408
  JSON response with file list and metadata
1409
  """
1410
  try:
1411
  if not repo_url.strip():
1412
  return {"success": False, "error": "Repository URL is required"}
1413
+
1414
  # Parse extensions list
1415
+ ext_list = [
1416
+ ext.strip() for ext in extensions.split(",") if ext.strip()
1417
+ ]
1418
  if not ext_list:
1419
  ext_list = [".md", ".mdx"]
 
1420
 
1421
  # Get files list
1422
  files, status_message = fetch_repository_files(
1423
  repo_url=repo_url,
1424
  file_extensions=ext_list,
1425
  github_token=os.getenv("GITHUB_API_KEY"),
1426
+ branch=branch,
1427
  )
1428
+
1429
  if files:
1430
  return {
1431
  "success": True,
 
1434
  "extensions": ext_list,
1435
  "total_files": len(files),
1436
  "files": files,
1437
+ "status": status_message,
1438
  }
1439
  else:
1440
  return {
 
1444
  "extensions": ext_list,
1445
  "total_files": 0,
1446
  "files": [],
1447
+ "error": status_message or "No files found",
1448
  }
1449
+
1450
  except Exception as e:
1451
  return {
1452
  "success": False,
1453
  "error": f"Failed to list files: {str(e)}",
1454
  "repository": repo_url,
1455
+ "branch": branch,
1456
  }
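
`list_repository_files` splits its comma-separated `extensions` argument, strips whitespace, drops empty entries, and falls back to `.md` and `.mdx` when nothing remains. `get_multiple_files` parses file paths the same way. A minimal sketch of that shared parsing step is below; `parse_csv` and its `default` parameter are hypothetical names, not part of the commit.

```python
from typing import List, Sequence


def parse_csv(value: str, default: Sequence[str] = ()) -> List[str]:
    """Split a comma-separated string, trimming whitespace and empty items.

    Returns a copy of `default` when no non-empty item is present.
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    return items or list(default)
```

For example, `parse_csv(extensions, default=[".md", ".mdx"])` would reproduce the fallback shown in the diff.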
1457
 
1458
  def get_single_file(repo_url: str, file_path: str, branch: str = "main"):
1459
  """
1460
  Retrieve a single file from GitHub repository
1461
+
1462
  Args:
1463
+ repo_url: GitHub repository URL or owner/repo format
1464
  file_path: Path to the file in the repository
1465
  branch: Branch name (default: main)
1466
+
1467
  Returns:
1468
  JSON response with file content and metadata
1469
  """
1470
  try:
1471
  if not repo_url.strip():
1472
  return {"success": False, "error": "Repository URL is required"}
1473
+
1474
  if not file_path.strip():
1475
  return {"success": False, "error": "File path is required"}
1476
+
1477
  # Parse repo name
1478
  if "github.com" in repo_url:
1479
  repo_name = (
 
1483
  )
1484
  else:
1485
  repo_name = repo_url.strip()
1486
+
1487
  # Load single file
1488
  documents, failed = load_github_files(
1489
  repo_name=repo_name,
1490
  file_paths=[file_path.strip()],
1491
  branch=branch,
1492
+ github_token=os.getenv("GITHUB_API_KEY"),
1493
  )
1494
+
1495
  if documents and len(documents) > 0:
1496
  doc = documents[0]
1497
  return {
 
1504
  "content": doc.text,
1505
  "metadata": doc.metadata,
1506
  "url": doc.metadata.get("url", ""),
1507
+ "raw_url": doc.metadata.get("raw_url", ""),
1508
  }
1509
  else:
1510
  error_msg = f"Failed to retrieve file: {failed[0] if failed else 'File not found or access denied'}"
 
1513
  "repository": repo_name,
1514
  "branch": branch,
1515
  "file_path": file_path,
1516
+ "error": error_msg,
1517
  }
1518
+
1519
  except Exception as e:
1520
  return {
1521
  "success": False,
1522
  "error": f"Failed to get single file: {str(e)}",
1523
  "repository": repo_url,
1524
  "file_path": file_path,
1525
+ "branch": branch,
1526
  }
1527
 
1528
+ def get_multiple_files(
1529
+ repo_url: str, file_paths_str: str, branch: str = "main"
1530
+ ):
1531
  """
1532
  Retrieve multiple files from GitHub repository
1533
+
1534
  Args:
1535
  repo_url: GitHub repository URL or owner/repo format
1536
  file_paths_str: Comma-separated string of file paths
1537
  branch: Branch name (default: main)
1538
+
1539
  Returns:
1540
  JSON response with multiple file contents and metadata
1541
  """
1542
  try:
1543
  if not repo_url.strip():
1544
  return {"success": False, "error": "Repository URL is required"}
1545
+
1546
  if not file_paths_str.strip():
1547
  return {"success": False, "error": "File paths are required"}
1548
+
1549
  # Parse file paths from comma-separated string
1550
+ file_paths = [
1551
+ path.strip()
1552
+ for path in file_paths_str.split(",")
1553
+ if path.strip()
1554
+ ]
1555
+
1556
  if not file_paths:
1557
+ return {
1558
+ "success": False,
1559
+ "error": "No valid file paths provided",
1560
+ }
1561
+
1562
  # Parse repo name
1563
  if "github.com" in repo_url:
1564
  repo_name = (
 
1568
  )
1569
  else:
1570
  repo_name = repo_url.strip()
1571
+
1572
  # Load multiple files
1573
  documents, failed = load_github_files(
1574
  repo_name=repo_name,
1575
  file_paths=file_paths,
1576
  branch=branch,
1577
+ github_token=os.getenv("GITHUB_API_KEY"),
1578
  )
1579
+
1580
  # Process successful documents
1581
  successful_files = []
1582
  for doc in documents:
 
1587
  "content": doc.text,
1588
  "metadata": doc.metadata,
1589
  "url": doc.metadata.get("url", ""),
1590
+ "raw_url": doc.metadata.get("raw_url", ""),
1591
  }
1592
  successful_files.append(file_data)
1593
+
1594
  return {
1595
  "success": True,
1596
  "repository": repo_name,
 
1601
  "files": successful_files,
1602
  "failed_file_paths": failed,
1603
  "total_content_size": sum(len(doc.text) for doc in documents),
1604
+ "requested_file_paths": file_paths,
1605
  }
1606
+
1607
  except Exception as e:
1608
  return {
1609
  "success": False,
1610
  "error": f"Failed to get multiple files: {str(e)}",
1611
  "repository": repo_url,
1612
  "file_paths": file_paths_str,
1613
+ "branch": branch,
1614
  }
1615
 
1616
  # Wire up the GitHub file search events - all output to single JSON component
 
1618
  fn=list_repository_files,
1619
  inputs=[api_repo_input, api_branch_input, api_extensions_input],
1620
  outputs=[api_response_output],
1621
+ api_name="list_repository_files",
1622
  )
1623
+
1624
  get_single_btn.click(
1625
  fn=get_single_file,
1626
  inputs=[single_repo_input, single_file_input, single_branch_input],
1627
  outputs=[api_response_output],
1628
+ api_name="get_single_file",
1629
  )
1630
+
1631
  get_multiple_btn.click(
1632
  fn=get_multiple_files,
1633
+ inputs=[
1634
+ multiple_repo_input,
1635
+ multiple_files_input,
1636
+ multiple_branch_input,
1637
+ ],
1638
  outputs=[api_response_output],
1639
+ api_name="get_multiple_files",
1640
+ )
1641
+
1642
+ # ================================
1643
+ # Tab 5: About & MCP Configuration
1644
+ # ================================
1645
+ with gr.TabItem("ℹ️ About & MCP Setup"):
1646
+ gr.Markdown("# 📚 Doc-MCP: Documentation RAG System")
1647
+ gr.Markdown(
1648
+ "**Transform GitHub documentation repositories into accessible MCP servers for AI agents.**"
1649
  )
1650
+
1651
+ with gr.Row():
1652
+ with gr.Column(scale=2):
1653
+ # Project Overview
1654
+ with gr.Accordion("🎯 What is Doc-MCP?", open=True):
1655
+ gr.Markdown("""
1656
+ **Doc-MCP** converts GitHub documentation into AI-queryable knowledge bases via the Model Context Protocol.
1657
+
1658
+ **🔑 Key Features:**
1659
+ - 📥 **GitHub Integration** - Automatic markdown file extraction
1660
+ - 🧠 **AI Embeddings** - Nebius AI-powered vector search
1661
+ - 🔍 **Smart Search** - Semantic, keyword & hybrid modes
1662
+ - 🤖 **MCP Server** - Direct AI agent integration
1663
+ - ⚡ **Real-time** - Live processing progress
1664
+ """)
1665
+
1666
+ # Quick Start Guide
1667
+ with gr.Accordion("🚀 Quick Start", open=False):
1668
+ gr.Markdown("""
1669
+ **1. Ingest Documentation** → Enter GitHub repo URL → Select files → Run 2-step pipeline
1670
+
1671
+ **2. Query with AI** → Select repository → Ask questions → Get answers with sources
1672
+
1673
+ **3. Manage Repos** → View stats → Delete old repositories
1674
+
1675
+ **4. Use MCP Tools** → Configure your AI agent → Query docs directly from IDE
1676
+ """)
1677
+
1678
+ with gr.Column(scale=2):
1679
+ # MCP Server Configuration
1680
+ with gr.Accordion("🔧 MCP Server Setup", open=True):
1681
+ gr.Markdown("### 🌐 Server URL")
1682
+
1683
+ # Server URL
1684
+ gr.Textbox(
1685
+ value="https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse",
1686
+ label="MCP Endpoint",
1687
+ interactive=False,
1688
+ info="Copy this URL for your MCP client configuration",
1689
+ )
1690
+
1691
+ gr.Markdown("### ⚙️ Configuration")
1692
+
1693
+ # SSE Configuration
1694
+ with gr.Accordion("For Cursor, Windsurf, Cline", open=False):
1695
+ sse_config = """{
1696
+ "mcpServers": {
1697
+ "doc-mcp": {
1698
+ "url": "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse"
1699
+ }
1700
+ }
1701
+ }"""
1702
+ gr.Code(
1703
+ value=sse_config,
1704
+ label="SSE Configuration",
1705
+ language="json",
1706
+ interactive=False,
1707
+ )
1708
+
1709
+ # STDIO Configuration
1710
+ with gr.Accordion(
1711
+ "For STDIO Clients (Experimental)", open=False
1712
+ ):
1713
+ stdio_config = """{
1714
+ "mcpServers": {
1715
+ "doc-mcp": {
1716
+ "command": "npx",
1717
+ "args": ["mcp-remote", "https://agents-mcp-hackathon-doc-mcp.hf.space/gradio_api/mcp/sse", "--transport", "sse-only"]
1718
+ }
1719
+ }
1720
+ }"""
1721
+ gr.Code(
1722
+ value=stdio_config,
1723
+ label="STDIO Configuration",
1724
+ language="json",
1725
+ interactive=False,
1726
+ )
1727
+
1728
+ # MCP Tools Overview
1729
+ with gr.Row():
1730
+ with gr.Column():
1731
+ gr.Markdown("### 🛠️ Available MCP Tools")
1732
+
1733
+ with gr.Row():
1734
+ with gr.Column():
1735
+ gr.Markdown("**🔍 Documentation Query Tools**")
1736
+ gr.Markdown(
1737
+ "• `get_available_docs_repo` - List repositories"
1738
+ )
1739
+ gr.Markdown("• `make_query` - Search documentation with AI")
1740
+
1741
+ with gr.Column():
1742
+ gr.Markdown("**📁 GitHub File Tools**")
1743
+ gr.Markdown("• `list_repository_files` - Scan repo files")
1744
+ gr.Markdown("• `get_single_file` - Fetch one file")
1745
+ gr.Markdown("• `get_multiple_files` - Fetch multiple files")
1746
+
1747
+ # Technology Stack & Project Info
1748
+ with gr.Row():
1749
+ with gr.Column():
1750
+ with gr.Accordion("⚙️ Technology Stack", open=False):
1751
+ gr.Markdown("**🖥️ Frontend & API**")
1752
+ gr.Markdown("• **Gradio** - Web interface & API framework")
1753
+ gr.Markdown("• **Hugging Face Spaces** - Cloud hosting")
1754
+
1755
+ gr.Markdown("**🤖 AI & ML**")
1756
+ gr.Markdown("• **Nebius AI** - LLM & embedding models")
1757
+ gr.Markdown("• **LlamaIndex** - RAG framework")
1758
+
1759
+ gr.Markdown("**💾 Database & Storage**")
1760
+ gr.Markdown("• **MongoDB Atlas** - Vector database")
1761
+ gr.Markdown("• **GitHub API** - Source file access")
1762
+
1763
+ gr.Markdown("**🔌 Integration**")
1764
+ gr.Markdown("• **Model Context Protocol** - AI agent standard")
1765
+ gr.Markdown(
1766
+ "• **Server-Sent Events** - Real-time communication"
1767
+ )
1768
+
1769
+ with gr.Column():
1770
+ with gr.Accordion("👥 Project Information", open=False):
1771
+ gr.Markdown("**🏆 MCP Hackathon Project**")
1772
+ gr.Markdown(
1773
+ "Created to showcase AI agent integration with documentation systems."
1774
+ )
1775
+
1776
+ gr.Markdown("**💡 Inspiration**")
1777
+ gr.Markdown("• Making Gradio docs easily searchable")
1778
+ gr.Markdown("• Leveraging Hugging Face AI ecosystem")
1779
+ gr.Markdown(
1780
+ "• Improving developer experience with AI assistants"
1781
+ )
1782
+
1783
+ gr.Markdown("**🔮 Future Plans**")
1784
+ gr.Markdown("• Support for PDF, HTML files")
1785
+ gr.Markdown("• Multi-language documentation")
1786
+ gr.Markdown("• Custom embedding fine-tuning")
1787
+
1788
+ gr.Markdown("**📄 License:** MIT - Free to use and modify")
1789
+
1790
+ # Usage Examples
1791
+ with gr.Row():
1792
+ with gr.Column():
1793
+ with gr.Accordion("💡 Usage Examples", open=False):
1794
+ gr.Markdown("### Example Workflow")
1795
+
1796
+ with gr.Row():
1797
+ with gr.Column():
1798
+ gr.Markdown("**📥 Step 1: Ingest Docs**")
1799
+ gr.Code(
1800
+ value="1. Enter: gradio-app/gradio\n2. Select markdown files\n3. Run ingestion pipeline",
1801
+ label="Ingestion Process",
1802
+ interactive=False,
1803
+ )
1804
+
1805
+ with gr.Column():
1806
+ gr.Markdown("**🤖 Step 2: Query with AI**")
1807
+ gr.Code(
1808
+ value='Query: "How to create custom components?"\nResponse: Detailed answer with source links',
1809
+ label="AI Query Example",
1810
+ interactive=False,
1811
+ )
1812
+
1813
+ gr.Markdown("### MCP Tool Usage")
1814
+ gr.Code(
1815
+ value="""# In your AI agent:
1816
+ 1. Call: get_available_docs_repo() -> ["gradio-app/gradio", ...]
1817
+ 2. Call: make_query("gradio-app/gradio", "default", "custom components")
1818
+ 3. Get: AI response + source citations""",
1819
+ label="MCP Integration Example",
1820
+ language="python",
1821
+ interactive=False,
1822
+ )
1823
+
1824
  if __name__ == "__main__":
1825
  demo.launch(mcp_server=True)
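
Note on the comma-separated path handling in `get_multiple_files`: the parsing step can be tried in isolation. The sketch below is a minimal illustration, assuming a hypothetical helper name `parse_file_paths` that is not part of app.py; it only mirrors the list comprehension added in this diff and shows which inputs end up triggering the "No valid file paths provided" branch.

```python
from typing import List


def parse_file_paths(file_paths_str: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty file paths.

    Hypothetical helper: it mirrors the list comprehension used inside
    get_multiple_files and is not defined in app.py itself.
    """
    return [
        path.strip()
        for path in file_paths_str.split(",")
        if path.strip()
    ]


if __name__ == "__main__":
    # Surrounding whitespace and empty segments are discarded.
    print(parse_file_paths(" README.md, docs/guide.md ,, "))
    # A string holding only separators yields an empty list, which is the
    # case the function reports as "No valid file paths provided".
    print(parse_file_paths(" , "))
```

Keeping this step as a pure function of the input string makes the empty-input case easy to check without touching the GitHub API.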