ABDALLALSWAITI committed · Commit edca24f · 1 Parent(s): ccee463

Add hackathon files

Browse files:
- README.md +44 -6
- app.py +649 -0
- requirements.txt +6 -0
README.md CHANGED

```diff
@@ -1,14 +1,52 @@
 ---
-title:
-emoji:
+title: Hugging Face Information Server
+emoji: 🚀
 colorFrom: green
-colorTo:
+colorTo: blue
 sdk: gradio
-sdk_version: 5.33.1
+sdk_version: '5.33.1'
 app_file: app.py
 pinned: false
 license: apache-2.0
-short_description:
+short_description: A research assistant for the Hugging Face ecosystem.
+tags: ["mcp-server-track"]
 ---
 
-
+# Hugging Face Information Server 🚀
+
+[](https://opensource.org/licenses/Apache-2.0)
+
+The **Hugging Face Information Server** is a research assistant designed to accelerate development and learning within the AI ecosystem. It acts as an intelligent bridge to the vast resources on Hugging Face, transforming simple queries into structured, actionable intelligence.
+
+Instead of just returning links, this tool provides comprehensive, formatted summaries directly, allowing developers and researchers to find practical answers without leaving their workflow.
+
+## 🎬 Demo Video
+
+*(Submission Guideline)*
+
+Watch a demonstration of this MCP server in action with an AI agent client.
+
+**[➡️ Click here to watch the demo video](https://www.YOUR_VIDEO_LINK_HERE.com)**
+
+## Key Features
+
+- **Comprehensive Documentation Search:** Delivers structured summaries of official documentation, complete with overviews, installation steps, and parameter lists.
+- **In-Depth Model & Dataset Analysis:** Goes beyond basic stats to provide rich profiles of models and datasets, including descriptions, download counts, and ready-to-use code snippets.
+- **Task-Oriented Model Discovery:** Helps you find the right tool for the job by searching for models based on specific tasks like `text-classification` or `image-generation`.
+- **Live & Relevant Data:** Combines live API calls with web scraping to keep the information current.
+
+## How to Use the Web Interface
+
+The user interface is organized into clear tabs for different functions:
+
+1. **Select a tab** at the top (e.g., "Model Information", "Documentation Search").
+2. **Enter your query** into the textbox.
+3. **Click the button** to get a detailed, formatted response with code examples and usage instructions.
+
+## How to Use as an MCP Server for AI Agents
+
+This application is also a fully compliant **Model Context Protocol (MCP)** server, allowing AI agents to use its functions as tools.
+
+### Connection Endpoint
+
+An AI agent (MCP client) can connect to this server using the following Server-Sent Events (SSE) endpoint:
```
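The diff is truncated before the endpoint URL itself. As a hedged sketch only: Gradio apps launched with `mcp_server=True` conventionally expose their MCP SSE endpoint under the path `gradio_api/mcp/sse` relative to the Space URL. Both `SPACE_URL` and the path below are placeholders/assumptions, not values confirmed by this commit.

```python
from urllib.parse import urljoin

# Placeholder Space URL -- substitute the actual deployment URL.
SPACE_URL = "https://example-space.hf.space"

# Conventional Gradio MCP SSE path (assumption; confirm against the app's docs).
MCP_SSE_PATH = "gradio_api/mcp/sse"

# Build the full endpoint an MCP client would be configured with.
endpoint = urljoin(SPACE_URL + "/", MCP_SSE_PATH)
print(endpoint)
```

An MCP client would then list `endpoint` as an SSE server in its configuration.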
app.py
ADDED
@@ -0,0 +1,649 @@
```python
import gradio as gr
import requests
from bs4 import BeautifulSoup
import json
from typing import List, Dict, Any, Optional
import re
from urllib.parse import urljoin, urlparse
import time
from functools import lru_cache
import logging
from datetime import datetime, timedelta

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class HuggingFaceInfoServer:
    def __init__(self):
        self.base_url = "https://huggingface.co"
        self.docs_url = "https://huggingface.co/docs"
        self.api_url = "https://huggingface.co/api"
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'HF-Info-Server/1.0 (Educational Purpose)',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1'
        })
        self.cache = {}
        self.cache_ttl = 3600  # 1 hour cache TTL

    def _is_cache_valid(self, cache_key: str) -> bool:
        if cache_key not in self.cache:
            return False
        cache_time = self.cache[cache_key].get('timestamp', 0)
        return time.time() - cache_time < self.cache_ttl

    def _get_from_cache(self, cache_key: str) -> Optional[str]:
        if self._is_cache_valid(cache_key):
            return self.cache[cache_key]['content']
        return None

    def _store_in_cache(self, cache_key: str, content: str):
        self.cache[cache_key] = {
            'content': content,
            'timestamp': time.time()
        }
```
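The three cache helpers above implement a simple dict-based TTL cache: every entry stores its content with a timestamp, and a lookup only counts as a hit while the entry is younger than `cache_ttl`. A minimal standalone sketch of the same behavior (the `TTLCache` name is illustrative, not part of app.py):

```python
import time

class TTLCache:
    """Minimal sketch of the server's dict-based TTL cache."""
    def __init__(self, ttl: float = 3600):
        self.ttl = ttl
        self.cache = {}

    def get(self, key):
        entry = self.cache.get(key)
        if entry and time.time() - entry['timestamp'] < self.ttl:
            return entry['content']
        return None  # missing or expired

    def put(self, key, content):
        self.cache[key] = {'content': content, 'timestamp': time.time()}

cache = TTLCache(ttl=0.05)
cache.put('page', '<html>...</html>')
print(cache.get('page'))   # fresh entry is returned
time.sleep(0.06)
print(cache.get('page'))   # expired entry reads as None
```

Note that expired entries are never evicted here (nor in the class above); they are simply ignored, so memory grows with the number of distinct keys.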
```python
    def _fetch_with_retry(self, url: str, max_retries: int = 3) -> Optional[str]:
        cache_key = f"url_{hash(url)}"
        cached_content = self._get_from_cache(cache_key)
        if cached_content:
            logger.info(f"Cache hit for {url}")
            return cached_content
        for attempt in range(max_retries):
            try:
                logger.info(f"Fetching {url} (attempt {attempt + 1})")
                response = self.session.get(url, timeout=20)
                response.raise_for_status()
                content = response.text
                self._store_in_cache(cache_key, content)
                return content
            except requests.exceptions.RequestException as e:
                logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)
                else:
                    logger.error(f"All attempts failed for {url}")
                    return None
        return None
```
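The retry loop above backs off exponentially: after each failed attempt except the last it sleeps `2 ** attempt` seconds, so with the default `max_retries=3` the waits are 1s then 2s. A small sketch of just the delay schedule:

```python
def backoff_delays(max_retries: int):
    """Sleep durations _fetch_with_retry would use between attempts:
    2 ** attempt seconds after every failed attempt except the last."""
    return [2 ** attempt for attempt in range(max_retries - 1)]

print(backoff_delays(3))  # [1, 2]
print(backoff_delays(1))  # [] -- a single attempt never sleeps
```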
```python
    def _extract_code_examples(self, soup: BeautifulSoup) -> List[Dict[str, str]]:
        code_blocks = []
        code_elements = soup.find_all(['code', 'pre'])
        for code_elem in code_elements:
            lang_class = code_elem.get('class', [])
            language = 'python'
            for cls in lang_class:
                if 'language-' in str(cls):
                    language = str(cls).replace('language-', '')
                    break
                elif any(lang in str(cls).lower() for lang in ['python', 'bash', 'javascript', 'json']):
                    language = str(cls).lower()
                    break
            code_text = code_elem.get_text(strip=True)
            if len(code_text) > 20 and any(keyword in code_text.lower() for keyword in ['import', 'from', 'def', 'class', 'pip install', 'transformers']):
                code_blocks.append({'code': code_text, 'language': language, 'type': 'usage' if any(word in code_text.lower() for word in ['import', 'load', 'pipeline']) else 'example'})
        highlight_blocks = soup.find_all('div', class_=re.compile(r'highlight|code-block|language'))
        for block in highlight_blocks:
            code_text = block.get_text(strip=True)
            if len(code_text) > 20:
                code_blocks.append({'code': code_text, 'language': 'python', 'type': 'example'})
        seen = set()
        unique_blocks = []
        for block in code_blocks:
            code_hash = hash(block['code'][:100])
            if code_hash not in seen:
                seen.add(code_hash)
                unique_blocks.append(block)
                if len(unique_blocks) >= 5:
                    break
        return unique_blocks
```
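The deduplication pass at the end of `_extract_code_examples` treats two snippets as duplicates when their first 100 characters hash equal, and stops after five unique snippets. Isolated as a standalone helper (the `dedupe_snippets` name is illustrative):

```python
def dedupe_snippets(blocks, limit=5):
    """Keep at most `limit` snippets, treating snippets whose first 100
    characters match as duplicates, mirroring _extract_code_examples."""
    seen, unique = set(), []
    for block in blocks:
        key = hash(block['code'][:100])
        if key not in seen:
            seen.add(key)
            unique.append(block)
            if len(unique) >= limit:
                break
    return unique

blocks = [{'code': 'import torch'}, {'code': 'import torch'}, {'code': 'pip install x'}]
print([b['code'] for b in dedupe_snippets(blocks)])  # ['import torch', 'pip install x']
```

Comparing only a 100-character prefix is a deliberate speed/accuracy trade-off: two long snippets that share an identical opening are collapsed even if their tails differ.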
```python
    def _extract_practical_content(self, soup: BeautifulSoup, topic: str) -> Dict[str, Any]:
        content = {'overview': '', 'code_examples': [], 'usage_instructions': [], 'parameters': [], 'methods': [], 'installation': '', 'quickstart': ''}
        main_content = soup.find('main') or soup.find('article') or soup.find('div', class_=re.compile(r'content|docs|prose'))
        if not main_content:
            return content
        overview_sections = main_content.find_all('p', limit=5)
        overview_texts = []
        for p in overview_sections:
            text = p.get_text(strip=True)
            if len(text) > 30 and not text.startswith('Table of contents'):
                overview_texts.append(text)
        if overview_texts:
            overview = ' '.join(overview_texts)
            content['overview'] = overview[:1000] + "..." if len(overview) > 1000 else overview
        content['code_examples'] = self._extract_code_examples(main_content)
        install_headings = main_content.find_all(['h1', 'h2', 'h3', 'h4'], string=re.compile(r'install|setup|getting started', re.IGNORECASE))
        for heading in install_headings:
            next_elem = heading.find_next_sibling()
            install_text = []
            while next_elem and next_elem.name not in ['h1', 'h2', 'h3', 'h4'] and len(install_text) < 3:
                if next_elem.name in ['p', 'pre', 'code']:
                    text = next_elem.get_text(strip=True)
                    if text and len(text) > 10:
                        install_text.append(text)
                next_elem = next_elem.find_next_sibling()
            if install_text:
                content['installation'] = ' '.join(install_text)
                break
        usage_headings = main_content.find_all(['h1', 'h2', 'h3', 'h4'])
        for heading in usage_headings:
            heading_text = heading.get_text(strip=True).lower()
            if any(keyword in heading_text for keyword in ['usage', 'example', 'how to', 'quickstart', 'getting started']):
                next_elem = heading.find_next_sibling()
                instruction_parts = []
                while next_elem and next_elem.name not in ['h1', 'h2', 'h3', 'h4']:
                    if next_elem.name in ['p', 'li', 'div', 'ol', 'ul']:
                        text = next_elem.get_text(strip=True)
                        if text and len(text) > 15:
                            instruction_parts.append(text)
                    next_elem = next_elem.find_next_sibling()
                    if len(instruction_parts) >= 5:
                        break
                if instruction_parts:
                    content['usage_instructions'].extend(instruction_parts)
        tables = main_content.find_all('table')
        for table in tables:
            headers = [th.get_text(strip=True).lower() for th in table.find_all('th')]
            if any(keyword in ' '.join(headers) for keyword in ['parameter', 'argument', 'option', 'attribute', 'name', 'type']):
                rows = table.find_all('tr')[1:]
                for row in rows[:8]:
                    cells = [td.get_text(strip=True) for td in row.find_all('td')]
                    if len(cells) >= 2:
                        param_info = {'name': cells[0], 'description': cells[1] if len(cells) > 1 else '', 'type': cells[2] if len(cells) > 2 else '', 'default': cells[3] if len(cells) > 3 else ''}
                        content['parameters'].append(param_info)
        return content
```
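The table-scraping pass at the end maps each row's cells positionally onto a fixed parameter schema: name, description, type, default, with missing trailing cells defaulting to empty strings and rows under two cells skipped. Isolated as a sketch (the `row_to_param` name is illustrative):

```python
def row_to_param(cells):
    """Map a table row's cell texts onto the parameter schema used by
    _extract_practical_content (name, description, type, default)."""
    if len(cells) < 2:
        return None  # the method skips rows with fewer than two cells
    return {
        'name': cells[0],
        'description': cells[1],
        'type': cells[2] if len(cells) > 2 else '',
        'default': cells[3] if len(cells) > 3 else '',
    }

print(row_to_param(['top_k', 'Number of labels to return', 'int', '1']))
print(row_to_param(['model']))  # too few cells -> None
```

The positional assumption only holds for tables laid out name-first; a docs table ordered differently would be mis-mapped, which is an accepted limitation of heuristic scraping.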
```python
    def search_documentation(self, query: str, max_results: int = 3) -> str:
        """
        Searches the official Hugging Face documentation for a specific topic and returns a summary.
        This tool is useful for finding how-to guides, explanations of concepts like 'pipeline' or 'tokenizer', and usage examples.

        Args:
            query (str): The topic or keyword to search for in the documentation (e.g., 'fine-tuning', 'peft', 'datasets').
            max_results (int): The maximum number of documentation pages to retrieve and summarize. Defaults to 3.
        """
        try:
            max_results = int(max_results) if isinstance(max_results, str) else max_results
            max_results = min(max_results, 5)
            query_lower = query.lower().strip()
            if not query_lower:
                return "Please provide a search query."
            doc_sections = {
                'transformers': {
                    'base_url': 'https://huggingface.co/docs/transformers',
                    'topics': {
                        'pipeline': '/main_classes/pipelines', 'tokenizer': '/main_classes/tokenizer',
                        'trainer': '/main_classes/trainer', 'model': '/main_classes/model',
                        'quicktour': '/quicktour', 'installation': '/installation',
                        'fine-tuning': '/training', 'training': '/training',
                        'inference': '/main_classes/pipelines', 'preprocessing': '/preprocessing',
                        'tutorial': '/tutorials', 'configuration': '/main_classes/configuration',
                        'peft': '/peft', 'lora': '/peft',
                        'quantization': '/main_classes/quantization', 'generation': '/main_classes/text_generation',
                        'optimization': '/perf_train_gpu_one', 'deployment': '/deployment',
                        'custom': '/custom_models'
                    }
                },
                'datasets': {
                    'base_url': 'https://huggingface.co/docs/datasets',
                    'topics': {
                        'loading': '/load_hub', 'load': '/load_hub', 'processing': '/process',
                        'streaming': '/stream', 'audio': '/audio_process', 'image': '/image_process',
                        'text': '/nlp_process', 'arrow': '/about_arrow', 'cache': '/cache',
                        'upload': '/upload_dataset', 'custom': '/dataset_script'
                    }
                },
                'diffusers': {
                    'base_url': 'https://huggingface.co/docs/diffusers',
                    'topics': {
                        'pipeline': '/using-diffusers/loading',
                        'stable diffusion': '/using-diffusers/stable_diffusion',
                        'controlnet': '/using-diffusers/controlnet',
                        'inpainting': '/using-diffusers/inpaint',
                        'training': '/training/overview',
                        'optimization': '/optimization/fp16',
                        'schedulers': '/using-diffusers/schedulers'
                    }
                },
                'hub': {
                    'base_url': 'https://huggingface.co/docs/hub',
                    'topics': {
                        'repositories': '/repositories', 'git': '/repositories-getting-started',
                        'spaces': '/spaces', 'models': '/models', 'datasets': '/datasets'
                    }
                }
            }
            relevant_urls = []
            for section_name, section_data in doc_sections.items():
                base_url = section_data['base_url']
                topics = section_data['topics']
                for topic, path in topics.items():
                    relevance = 0
                    if query_lower == topic.lower():
                        relevance = 1.0
                    elif query_lower in topic.lower():
                        relevance = 0.9
                    elif any(word in topic.lower() for word in query_lower.split()):
                        relevance = 0.7
                    elif any(word in query_lower for word in topic.lower().split()):
                        relevance = 0.6
                    if relevance > 0:
                        full_url = base_url + path
                        relevant_urls.append({'url': full_url, 'topic': topic, 'section': section_name, 'relevance': relevance})
            relevant_urls.sort(key=lambda x: x['relevance'], reverse=True)
            relevant_urls = relevant_urls[:max_results]
            if not relevant_urls:
                return f"❌ No documentation found for '{query}'. Try: pipeline, tokenizer, trainer, model, fine-tuning, datasets, diffusers, or peft."
            result = f"# 📚 Hugging Face Documentation: {query}\n\n"
            for i, url_info in enumerate(relevant_urls, 1):
                section_emoji = {'transformers': '🤗', 'datasets': '📊', 'diffusers': '🎨', 'hub': '🌐'}.get(url_info['section'], '📄')
                result += f"## {i}. {section_emoji} {url_info['topic'].title()} ({url_info['section'].title()})\n\n"
                content = self._fetch_with_retry(url_info['url'])
                if content:
                    soup = BeautifulSoup(content, 'html.parser')
                    practical_content = self._extract_practical_content(soup, url_info['topic'])
                    if practical_content['overview']:
                        result += f"**📝 Overview:**\n{practical_content['overview']}\n\n"
                    if practical_content['installation']:
                        result += f"**⚙️ Installation:**\n{practical_content['installation']}\n\n"
                    if practical_content['code_examples']:
                        result += "**💻 Code Examples:**\n\n"
                        for j, code_block in enumerate(practical_content['code_examples'][:3], 1):
                            lang = code_block.get('language', 'python')
                            code_type = code_block.get('type', 'example')
                            result += f"*{code_type.title()} {j}:*\n```{lang}\n{code_block['code']}\n```\n\n"
                    if practical_content['usage_instructions']:
                        result += "**🛠️ Usage Instructions:**\n"
                        for idx, instruction in enumerate(practical_content['usage_instructions'][:4], 1):
                            result += f"{idx}. {instruction}\n"
                        result += "\n"
                    if practical_content['parameters']:
                        result += "**⚙️ Parameters:**\n"
                        for param in practical_content['parameters'][:6]:
                            param_type = f" (`{param['type']}`)" if param.get('type') else ""
                            default_val = f" *Default: {param['default']}*" if param.get('default') else ""
                            result += f"• **{param['name']}**{param_type}: {param['description']}{default_val}\n"
                        result += "\n"
                    result += f"**🔗 Full Documentation:** {url_info['url']}\n\n"
                else:
                    result += f"⚠️ Could not fetch content. Visit directly: {url_info['url']}\n\n"
                result += "---\n\n"
            return result
        except Exception as e:
            logger.error(f"Error in search_documentation: {e}")
            return f"❌ Error searching documentation: {str(e)}\n\nTry a simpler search term or check your internet connection."
```
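The ranking rule inside `search_documentation` assigns each known topic a tiered relevance score: 1.0 for an exact match, 0.9 for a substring match, 0.7 when any query word appears in the topic, and 0.6 when any topic word appears in the query. Isolated as a sketch (the `relevance` function name is illustrative):

```python
def relevance(query: str, topic: str) -> float:
    """Sketch of the per-topic ranking rule used by search_documentation."""
    query, topic = query.lower().strip(), topic.lower()
    if query == topic:
        return 1.0
    if query in topic:
        return 0.9
    if any(word in topic for word in query.split()):
        return 0.7
    if any(word in query for word in topic.split()):
        return 0.6
    return 0.0

print(relevance('pipeline', 'pipeline'))        # 1.0 exact match
print(relevance('pipe', 'pipeline'))            # 0.9 substring match
print(relevance('loading audio', 'audio'))      # 0.7 query word found in topic
print(relevance('quantize', 'quantization'))    # 0.0 no tier matches
```

Because the tiers are checked in order, a query can only receive the score of the first tier it satisfies; topics scoring 0 are dropped before the results are sorted.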
```python
    def get_model_info(self, model_name: str) -> str:
        """
        Fetches comprehensive information about a specific model from the Hugging Face Hub.
        Provides statistics like downloads and likes, a description, usage examples, and a quick-start code snippet.

        Args:
            model_name (str): The full identifier of the model on the Hub, such as 'bert-base-uncased' or 'meta-llama/Llama-2-7b-hf'.
        """
        try:
            model_name = model_name.strip()
            if not model_name:
                return "Please provide a model name."
            api_url = f"{self.api_url}/models/{model_name}"
            response = self.session.get(api_url, timeout=15)
            if response.status_code == 404:
                return f"❌ Model '{model_name}' not found. Please check the model name."
            elif response.status_code != 200:
                return f"❌ Error fetching model info (Status: {response.status_code})"
            model_data = response.json()
            result = f"# 🤖 Model: {model_name}\n\n"
            downloads = model_data.get('downloads', 0)
            likes = model_data.get('likes', 0)
            task = model_data.get('pipeline_tag', 'N/A')
            library = model_data.get('library_name', 'N/A')
            result += f"**📊 Statistics:**\n• **Downloads:** {downloads:,}\n• **Likes:** {likes:,}\n• **Task:** {task}\n• **Library:** {library}\n• **Created:** {model_data.get('createdAt', 'N/A')[:10]}\n• **Updated:** {model_data.get('lastModified', 'N/A')[:10]}\n\n"
            if 'tags' in model_data and model_data['tags']:
                result += f"**🏷️ Tags:** {', '.join(model_data['tags'][:10])}\n\n"
            model_url = f"{self.base_url}/{model_name}"
            page_content = self._fetch_with_retry(model_url)
            if page_content:
                soup = BeautifulSoup(page_content, 'html.parser')
                readme_content = soup.find('div', class_=re.compile(r'prose|readme|model-card'))
                if readme_content:
                    paragraphs = readme_content.find_all('p')[:3]
                    description_parts = []
                    for p in paragraphs:
                        text = p.get_text(strip=True)
                        if len(text) > 30 and not any(skip in text.lower() for skip in ['table of contents', 'toc']):
                            description_parts.append(text)
                    if description_parts:
                        description = ' '.join(description_parts)
                        result += f"**📝 Description:**\n{description[:800]}{'...' if len(description) > 800 else ''}\n\n"
                code_examples = self._extract_code_examples(soup)
                if code_examples:
                    result += "**💻 Usage Examples:**\n\n"
                    for i, code_block in enumerate(code_examples[:3], 1):
                        lang = code_block.get('language', 'python')
                        result += f"*Example {i}:*\n```{lang}\n{code_block['code']}\n```\n\n"
            if task and task != 'N/A':
                result += f"**🚀 Quick Start Template:**\n"
                if library == 'transformers':
                    result += f"```python\nfrom transformers import pipeline\n\n# Load the model\nmodel = pipeline('{task}', model='{model_name}')\n\n# Use the model\nresult = model(\"Your input here\")\nprint(result)\n```\n\n"
                else:
                    result += f"```python\n# Load and use {model_name}\n# Refer to the documentation for specific usage\n```\n\n"
            if 'siblings' in model_data:
                files = [f['rfilename'] for f in model_data['siblings'][:10]]
                if files:
                    result += f"**📁 Model Files:** {', '.join(files)}\n\n"
            result += f"**🔗 Model Page:** {model_url}\n"
            return result
        except requests.exceptions.RequestException as e:
            return f"❌ Network error: {str(e)}"
        except Exception as e:
            logger.error(f"Error in get_model_info: {e}")
            return f"❌ Error fetching model info: {str(e)}"
```
```python
    def get_dataset_info(self, dataset_name: str) -> str:
        """
        Retrieves detailed information about a specific dataset from the Hugging Face Hub.
        Includes statistics, a description, and a quick-start code snippet showing how to load the dataset.

        Args:
            dataset_name (str): The full identifier of the dataset on the Hub, for example 'squad' or 'imdb'.
        """
        try:
            dataset_name = dataset_name.strip()
            if not dataset_name:
                return "Please provide a dataset name."
            api_url = f"{self.api_url}/datasets/{dataset_name}"
            response = self.session.get(api_url, timeout=15)
            if response.status_code == 404:
                return f"❌ Dataset '{dataset_name}' not found. Please check the dataset name."
            elif response.status_code != 200:
                return f"❌ Error fetching dataset info (Status: {response.status_code})"
            dataset_data = response.json()
            result = f"# 📦 Dataset: {dataset_name}\n\n"
            downloads = dataset_data.get('downloads', 0)
            likes = dataset_data.get('likes', 0)
            result += f"**📊 Statistics:**\n• **Downloads:** {downloads:,}\n• **Likes:** {likes:,}\n• **Created:** {dataset_data.get('createdAt', 'N/A')[:10]}\n• **Updated:** {dataset_data.get('lastModified', 'N/A')[:10]}\n\n"
            if 'tags' in dataset_data and dataset_data['tags']:
                result += f"**🏷️ Tags:** {', '.join(dataset_data['tags'][:10])}\n\n"
            dataset_url = f"{self.base_url}/datasets/{dataset_name}"
            page_content = self._fetch_with_retry(dataset_url)
            if page_content:
                soup = BeautifulSoup(page_content, 'html.parser')
                readme_content = soup.find('div', class_=re.compile(r'prose|readme|dataset-card'))
                if readme_content:
                    paragraphs = readme_content.find_all('p')[:3]
                    description_parts = []
                    for p in paragraphs:
                        text = p.get_text(strip=True)
                        if len(text) > 30:
                            description_parts.append(text)
                    if description_parts:
                        description = ' '.join(description_parts)
                        result += f"**📝 Description:**\n{description[:800]}{'...' if len(description) > 800 else ''}\n\n"
                code_examples = self._extract_code_examples(soup)
                if code_examples:
                    result += "**💻 Usage Examples:**\n\n"
                    for i, code_block in enumerate(code_examples[:3], 1):
                        lang = code_block.get('language', 'python')
                        result += f"*Example {i}:*\n```{lang}\n{code_block['code']}\n```\n\n"
            result += f"**🚀 Quick Start Template:**\n"
            result += f"```python\nfrom datasets import load_dataset\n\n# Load the dataset\ndataset = load_dataset('{dataset_name}')\n\n# Explore the dataset\nprint(dataset)\nprint(f\"Dataset keys: {{list(dataset.keys())}}\")\n\n# Access first example\nif 'train' in dataset:\n    print(\"First example:\")\n    print(dataset['train'][0])\n```\n\n"
            result += f"**🔗 Dataset Page:** {dataset_url}\n"
            return result
        except requests.exceptions.RequestException as e:
            return f"❌ Network error: {str(e)}"
        except Exception as e:
            logger.error(f"Error in get_dataset_info: {e}")
            return f"❌ Error fetching dataset info: {str(e)}"
```
```python
    def search_models(self, task: str, limit: str = "5") -> str:
        """
        Searches the Hugging Face Hub for models based on a specified task or keyword and returns a list of top models.
        Each result includes statistics and a quick usage example.

        Args:
            task (str): The task to search for, such as 'text-classification', 'image-generation', or 'question-answering'.
            limit (str): The maximum number of models to return. Defaults to '5'.
        """
        try:
            task = task.strip()
            if not task:
                return "Please provide a search task or keyword."
            limit = int(limit) if isinstance(limit, str) and limit.isdigit() else 5
            limit = min(max(limit, 1), 10)
            params = {'search': task, 'limit': limit * 3, 'sort': 'downloads', 'direction': -1}
            response = self.session.get(f"{self.api_url}/models", params=params, timeout=20)
            response.raise_for_status()
            models = response.json()
            if not models:
                return f"❌ No models found for task: '{task}'. Try different keywords."
            filtered_models = []
            for model in models:
                if (model.get('downloads', 0) > 0 or model.get('likes', 0) > 0 or 'pipeline_tag' in model):
                    filtered_models.append(model)
                    if len(filtered_models) >= limit:
                        break
            if not filtered_models:
                filtered_models = models[:limit]
            result = f"# 🔍 Top {len(filtered_models)} Models for '{task}'\n\n"
            for i, model in enumerate(filtered_models, 1):
                model_id = model.get('id', 'Unknown')
                downloads = model.get('downloads', 0)
                likes = model.get('likes', 0)
                task_type = model.get('pipeline_tag', 'N/A')
                library = model.get('library_name', 'N/A')
                quality_score = ""
                if downloads > 10000:
                    quality_score = "⭐ Popular"
                elif downloads > 1000:
                    quality_score = "🔥 Active"
                elif likes > 10:
                    quality_score = "👍 Liked"
                result += f"## {i}. {model_id} {quality_score}\n\n"
                result += f"**📊 Stats:**\n• **Downloads:** {downloads:,}\n• **Likes:** {likes}\n• **Task:** {task_type}\n• **Library:** {library}\n\n"
                if task_type and task_type != 'N/A':
                    result += f"**🚀 Quick Usage:**\n"
                    if library == 'transformers':
                        result += f"```python\nfrom transformers import pipeline\n\n# Load model\nmodel = pipeline('{task_type}', model='{model_id}')\n\n# Use model\nresult = model(\"Your input here\")\nprint(result)\n```\n\n"
                    else:
                        result += f"```python\n# Load and use {model_id}\n# Check model page for specific usage instructions\n```\n\n"
                result += f"**🔗 Model Page:** {self.base_url}/{model_id}\n\n---\n\n"
            return result
        except requests.exceptions.RequestException as e:
            return f"❌ Network error: {str(e)}"
        except Exception as e:
            logger.error(f"Error in search_models: {e}")
            return f"❌ Error searching models: {str(e)}"
```
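`search_models` labels each result from fixed popularity thresholds: more than 10,000 downloads beats more than 1,000 downloads, which beats more than 10 likes. Isolated as a sketch (the `quality_badge` name is illustrative, and the emoji labels are reconstructed from mojibake in this view):

```python
def quality_badge(downloads: int, likes: int) -> str:
    """Sketch of the badge thresholds search_models applies to each result."""
    if downloads > 10000:
        return "⭐ Popular"
    elif downloads > 1000:
        return "🔥 Active"
    elif likes > 10:
        return "👍 Liked"
    return ""  # no badge for low-traffic models

print(quality_badge(50000, 3))  # ⭐ Popular
print(quality_badge(500, 25))   # 👍 Liked
```

Downloads dominate likes here: a model with 2,000 downloads and 500 likes is still only "Active", since the tiers are evaluated top-down.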
```python
    def get_transformers_docs(self, topic: str) -> str:
        """
        Fetches detailed documentation specifically for the Hugging Face Transformers library on a given topic.
        This provides in-depth explanations, code examples, and parameter descriptions for core library components.

        Args:
            topic (str): The Transformers library topic to look up, such as 'pipeline', 'tokenizer', 'trainer', or 'generation'.
        """
        try:
            topic = topic.strip().lower()
            if not topic:
                return "Please provide a topic to search for."
            docs_url = "https://huggingface.co/docs/transformers"
            topic_map = {
                'pipeline': f"{docs_url}/main_classes/pipelines", 'pipelines': f"{docs_url}/main_classes/pipelines",
                'tokenizer': f"{docs_url}/main_classes/tokenizer", 'tokenizers': f"{docs_url}/main_classes/tokenizer",
                'trainer': f"{docs_url}/main_classes/trainer", 'training': f"{docs_url}/training",
                'model': f"{docs_url}/main_classes/model", 'models': f"{docs_url}/main_classes/model",
                'configuration': f"{docs_url}/main_classes/configuration", 'config': f"{docs_url}/main_classes/configuration",
                'quicktour': f"{docs_url}/quicktour", 'quick': f"{docs_url}/quicktour",
                'installation': f"{docs_url}/installation", 'install': f"{docs_url}/installation",
                'tutorial': f"{docs_url}/tutorials", 'tutorials': f"{docs_url}/tutorials",
                'generation': f"{docs_url}/main_classes/text_generation", 'text_generation': f"{docs_url}/main_classes/text_generation",
                'preprocessing': f"{docs_url}/preprocessing", 'preprocess': f"{docs_url}/preprocessing",
                'peft': f"{docs_url}/peft", 'lora': f"{docs_url}/peft",
                'quantization': f"{docs_url}/main_classes/quantization",
                'optimization': f"{docs_url}/perf_train_gpu_one", 'performance': f"{docs_url}/perf_train_gpu_one",
                'deployment': f"{docs_url}/deployment", 'custom': f"{docs_url}/custom_models",
                'fine-tuning': f"{docs_url}/training", 'finetuning': f"{docs_url}/training"
            }
            url = topic_map.get(topic)
            if not url:
                for key, value in topic_map.items():
                    if topic in key or key in topic:
                        url = value
                        topic = key
                        break
            if not url:
                url = f"{docs_url}/quicktour"
                topic = "quicktour"
            content = self._fetch_with_retry(url)
            if not content:
                return f"❌ Could not fetch documentation for '{topic}'. Please try again or visit: {url}"
            soup = BeautifulSoup(content, 'html.parser')
            practical_content = self._extract_practical_content(soup, topic)
            result = f"# 📚 Transformers Documentation: {topic.replace('_', ' ').title()}\n\n"
            if practical_content['overview']:
                result += f"**📝 Overview:**\n{practical_content['overview']}\n\n"
            if practical_content['installation']:
                result += f"**⚙️ Installation:**\n{practical_content['installation']}\n\n"
            if practical_content['code_examples']:
                result += "**💻 Code Examples:**\n\n"
                for i, code_block in enumerate(practical_content['code_examples'][:4], 1):
                    lang = code_block.get('language', 'python')
                    code_type = code_block.get('type', 'example')
                    result += f"### {code_type.title()} {i}:\n```{lang}\n{code_block['code']}\n```\n\n"
            if practical_content['usage_instructions']:
                result += "**🛠️ Step-by-Step Usage:**\n"
                for i, instruction in enumerate(practical_content['usage_instructions'][:6], 1):
                    result += f"{i}. {instruction}\n"
                result += "\n"
            if practical_content['parameters']:
                result += "**⚙️ Key Parameters:**\n"
                for param in practical_content['parameters'][:10]:
                    param_type = f" (`{param['type']}`)" if param.get('type') else ""
                    default_val = f" *Default: `{param['default']}`*" if param.get('default') else ""
                    result += f"• **`{param['name']}`**{param_type}: {param['description']}{default_val}\n"
```
+
result += "\n"
|
453 |
+
related_topics = [k for k in topic_map.keys() if k != topic][:5]
|
454 |
+
if related_topics: result += f"**π Related Topics:** {', '.join(related_topics)}\n\n"
|
455 |
+
result += f"**π Full Documentation:** {url}\n"
|
456 |
+
return result
|
457 |
+
except Exception as e:
|
458 |
+
logger.error(f"Error in get_transformers_docs: {e}")
|
459 |
+
return f"β Error fetching Transformers documentation: {str(e)}"
|
460 |
+
|
461 |
+
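The lookup above resolves a free-form topic in three stages: exact key match, substring match in either direction, then a quicktour fallback. A standalone sketch of that resolution order (the function name and the two-entry map are illustrative, not part of the app; empty input is assumed to be rejected upstream, as it is in the method):

```python
def resolve_topic(topic, topic_map, default_url):
    """Resolve a user-supplied topic to a docs URL: exact key, then substring, then default."""
    topic = topic.strip().lower()
    url = topic_map.get(topic)                  # 1) exact key match
    if not url:
        for key, value in topic_map.items():    # 2) substring match either way
            if topic in key or key in topic:
                return key, value
        return "quicktour", default_url         # 3) catch-all fallback
    return topic, url

# Illustrative two-entry map mirroring the structure used above
demo_map = {"pipeline": "/main_classes/pipelines", "tokenizer": "/main_classes/tokenizer"}
print(resolve_topic("token", demo_map, "/quicktour"))      # substring hit -> ('tokenizer', '/main_classes/tokenizer')
print(resolve_topic("diffusion", demo_map, "/quicktour"))  # no match -> ('quicktour', '/quicktour')
```

Note that the substring pass also canonicalizes the topic (`"token"` becomes `"tokenizer"`), which is why the method overwrites `topic` with the matched key before building the report.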
    def get_trending_models(self, limit: str = "10") -> str:
        """
        Fetches a list of the most downloaded models currently trending on the Hugging Face Hub.
        This is useful for discovering popular and widely used models.

        Args:
            limit (str): The number of trending models to return. Defaults to '10'.
        """
        try:
            # Accept either an int or a numeric string; fall back to 10 and clamp to 1-20.
            limit = int(limit) if str(limit).isdigit() else 10
            limit = min(max(limit, 1), 20)
            params = {'sort': 'downloads', 'direction': -1, 'limit': limit}
            response = self.session.get(f"{self.api_url}/models", params=params, timeout=20)
            response.raise_for_status()
            models = response.json()
            if not models:
                return "❌ Could not fetch trending models."
            result = f"# 🔥 Trending Models (Top {len(models)})\n\n"
            for i, model in enumerate(models, 1):
                model_id = model.get('id', 'Unknown')
                downloads = model.get('downloads', 0)
                likes = model.get('likes', 0)
                task = model.get('pipeline_tag', 'N/A')
                if downloads > 1000000:
                    trend = "🚀 Mega Popular"
                elif downloads > 100000:
                    trend = "🔥 Very Popular"
                elif downloads > 10000:
                    trend = "⭐ Popular"
                else:
                    trend = "📈 Trending"
                result += f"## {i}. {model_id} {trend}\n"
                result += f"• **Downloads:** {downloads:,} | **Likes:** {likes} | **Task:** {task}\n"
                result += f"• **Link:** {self.base_url}/{model_id}\n\n"
            return result
        except Exception as e:
            logger.error(f"Error in get_trending_models: {e}")
            return f"❌ Error fetching trending models: {str(e)}"

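The limit coercion and the download-tier labels used above can be exercised in isolation; the thresholds and bounds below are copied from the method, while the helper names are mine:

```python
def clamp_limit(raw, default=10, lo=1, hi=20):
    """Coerce a numeric string or int to an int, then clamp it into [lo, hi]."""
    n = int(raw) if str(raw).isdigit() else default
    return min(max(n, lo), hi)

def trend_label(downloads: int) -> str:
    """Map a download count onto the popularity tiers used in the trending report."""
    if downloads > 1_000_000:
        return "Mega Popular"
    elif downloads > 100_000:
        return "Very Popular"
    elif downloads > 10_000:
        return "Popular"
    return "Trending"

print(clamp_limit("50"))      # above the cap -> 20
print(clamp_limit("abc"))     # non-numeric -> default 10
print(trend_label(250_000))   # -> Very Popular
```

Clamping on the server side matters here because the function is also exposed as an MCP tool, so the `limit` argument can arrive as arbitrary text rather than only from the bounded UI widget.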
# Initialize the server
hf_server = HuggingFaceInfoServer()

# Create the Gradio interface
with gr.Blocks(
    title="🤗 Hugging Face Information Server",
    theme=gr.themes.Soft(),
    css="""
    .gradio-container {
        font-family: 'Inter', sans-serif;
    }
    .main-header {
        text-align: center;
        padding: 20px;
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        color: white;
        border-radius: 10px;
        margin-bottom: 20px;
    }
    """
) as demo:

    # Header
    with gr.Row():
        gr.HTML("""
        <div class="main-header">
            <h1>🤗 Hugging Face Information Server</h1>
            <p>Get comprehensive documentation with <strong>real code examples</strong>, <strong>usage instructions</strong>, and <strong>practical content</strong></p>
        </div>
        """)

    with gr.Tab("📚 Documentation Search", elem_id="docs"):
        gr.Markdown("### Search for documentation with **comprehensive code examples** and **step-by-step instructions**")

        with gr.Row():
            with gr.Column(scale=3):
                doc_query = gr.Textbox(label="🔍 Search Query", placeholder="e.g., tokenizer, pipeline, fine-tuning, peft, trainer, quantization")
            with gr.Column(scale=1):
                doc_max_results = gr.Number(label="Max Results", value=2, minimum=1, maximum=5)

        doc_output = gr.Textbox(label="📖 Documentation with Examples", lines=25, max_lines=30)

        with gr.Row():
            doc_btn = gr.Button("🔍 Search Documentation", variant="primary", size="lg")
            doc_clear = gr.Button("🗑️ Clear", variant="secondary")

        gr.Markdown("**Quick Examples:**")
        with gr.Row():
            gr.Button("Pipeline", size="sm").click(lambda: "pipeline", outputs=doc_query)
            gr.Button("Tokenizer", size="sm").click(lambda: "tokenizer", outputs=doc_query)
            gr.Button("Fine-tuning", size="sm").click(lambda: "fine-tuning", outputs=doc_query)
            gr.Button("PEFT", size="sm").click(lambda: "peft", outputs=doc_query)

        # gr.Number delivers floats, so coerce before passing the count through.
        doc_btn.click(lambda q, m: hf_server.search_documentation(q, int(m) if m else 2), inputs=[doc_query, doc_max_results], outputs=doc_output)
        doc_clear.click(lambda: "", outputs=doc_output)

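The `gr.Number` inputs deliver their values as floats (or `None` when cleared), so the click handlers coerce before calling the server methods. A defensive version of that guard (my own sketch, not a helper defined in this file) can be tested on its own:

```python
def coerce_count(value, default):
    """Turn a gr.Number value (float, int, numeric str, or None) into a usable int."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return default

print(coerce_count(2.0, 2))    # float from gr.Number -> 2
print(coerce_count(None, 5))   # cleared field -> default 5
print(coerce_count("7", 10))   # numeric string -> 7
```

Routing through `float()` first is what makes the string case work: `int("7.0")` raises `ValueError`, while `int(float("7.0"))` does not.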
    with gr.Tab("🤖 Model Information", elem_id="models"):
        gr.Markdown("### Get detailed model information with **usage examples** and **code snippets**")
        model_name = gr.Textbox(label="🤖 Model Name", placeholder="e.g., bert-base-uncased, gpt2, microsoft/DialoGPT-medium, meta-llama/Llama-2-7b-hf")
        model_output = gr.Textbox(label="📋 Model Information + Usage Examples", lines=25, max_lines=30)
        with gr.Row():
            model_btn = gr.Button("🔍 Get Model Info", variant="primary", size="lg")
            model_clear = gr.Button("🗑️ Clear", variant="secondary")
        gr.Markdown("**Popular Models:**")
        with gr.Row():
            gr.Button("BERT", size="sm").click(lambda: "bert-base-uncased", outputs=model_name)
            gr.Button("GPT-2", size="sm").click(lambda: "gpt2", outputs=model_name)
            gr.Button("T5", size="sm").click(lambda: "t5-small", outputs=model_name)
            gr.Button("DistilBERT", size="sm").click(lambda: "distilbert-base-uncased", outputs=model_name)
        model_btn.click(hf_server.get_model_info, inputs=model_name, outputs=model_output)
        model_clear.click(lambda: "", outputs=model_output)

    with gr.Tab("📊 Dataset Information", elem_id="datasets"):
        gr.Markdown("### Get dataset information with **loading examples** and **usage code**")
        dataset_name = gr.Textbox(label="📊 Dataset Name", placeholder="e.g., squad, imdb, glue, common_voice, wikitext")
        dataset_output = gr.Textbox(label="📋 Dataset Information + Usage Examples", lines=25, max_lines=30)
        with gr.Row():
            dataset_btn = gr.Button("🔍 Get Dataset Info", variant="primary", size="lg")
            dataset_clear = gr.Button("🗑️ Clear", variant="secondary")
        gr.Markdown("**Popular Datasets:**")
        with gr.Row():
            gr.Button("SQuAD", size="sm").click(lambda: "squad", outputs=dataset_name)
            gr.Button("IMDB", size="sm").click(lambda: "imdb", outputs=dataset_name)
            gr.Button("GLUE", size="sm").click(lambda: "glue", outputs=dataset_name)
            gr.Button("Common Voice", size="sm").click(lambda: "common_voice", outputs=dataset_name)
        dataset_btn.click(hf_server.get_dataset_info, inputs=dataset_name, outputs=dataset_output)
        dataset_clear.click(lambda: "", outputs=dataset_output)

    with gr.Tab("🔎 Model Search", elem_id="search"):
        gr.Markdown("### Search models with **quick usage examples** and **quality indicators**")
        with gr.Row():
            with gr.Column(scale=3):
                search_task = gr.Textbox(label="🔍 Task or Keyword", placeholder="e.g., text-classification, image-generation, question-answering, sentiment-analysis")
            with gr.Column(scale=1):
                search_limit = gr.Number(label="Max Results", value=5, minimum=1, maximum=10)
        search_output = gr.Textbox(label="📋 Models with Usage Examples", lines=25, max_lines=30)
        with gr.Row():
            search_btn = gr.Button("🔍 Search Models", variant="primary", size="lg")
            search_clear = gr.Button("🗑️ Clear", variant="secondary")
        gr.Markdown("**Popular Tasks:**")
        with gr.Row():
            gr.Button("Text Classification", size="sm").click(lambda: "text-classification", outputs=search_task)
            gr.Button("Question Answering", size="sm").click(lambda: "question-answering", outputs=search_task)
            gr.Button("Text Generation", size="sm").click(lambda: "text-generation", outputs=search_task)
            gr.Button("Image Classification", size="sm").click(lambda: "image-classification", outputs=search_task)
        search_btn.click(lambda task, limit: hf_server.search_models(task, int(limit) if limit else 5), inputs=[search_task, search_limit], outputs=search_output)
        search_clear.click(lambda: "", outputs=search_output)

    with gr.Tab("⚡ Transformers Docs", elem_id="transformers"):
        gr.Markdown("### Get comprehensive Transformers documentation with **detailed examples** and **parameters**")
        transformers_topic = gr.Textbox(label="📝 Topic", placeholder="e.g., pipeline, tokenizer, trainer, model, peft, generation, quantization")
        transformers_output = gr.Textbox(label="📚 Comprehensive Documentation", lines=25, max_lines=30)
        with gr.Row():
            transformers_btn = gr.Button("📚 Get Documentation", variant="primary", size="lg")
            transformers_clear = gr.Button("🗑️ Clear", variant="secondary")
        gr.Markdown("**Core Topics:**")
        with gr.Row():
            gr.Button("Pipeline", size="sm").click(lambda: "pipeline", outputs=transformers_topic)
            gr.Button("Tokenizer", size="sm").click(lambda: "tokenizer", outputs=transformers_topic)
            gr.Button("Trainer", size="sm").click(lambda: "trainer", outputs=transformers_topic)
            gr.Button("Generation", size="sm").click(lambda: "generation", outputs=transformers_topic)
        transformers_btn.click(hf_server.get_transformers_docs, inputs=transformers_topic, outputs=transformers_output)
        transformers_clear.click(lambda: "", outputs=transformers_output)

    with gr.Tab("🔥 Trending Models", elem_id="trending"):
        gr.Markdown("### Discover the most popular and trending models")
        trending_limit = gr.Number(label="Number of Models", value=10, minimum=1, maximum=20)
        trending_output = gr.Textbox(label="🔥 Trending Models", lines=20, max_lines=25)
        with gr.Row():
            trending_btn = gr.Button("🔥 Get Trending Models", variant="primary", size="lg")
            trending_clear = gr.Button("🗑️ Clear", variant="secondary")
        trending_btn.click(lambda limit: hf_server.get_trending_models(int(limit) if limit else 10), inputs=trending_limit, outputs=trending_output)
        trending_clear.click(lambda: "", outputs=trending_output)

    # Footer
    with gr.Row():
        gr.HTML("""
        <div style="text-align: center; padding: 20px; color: #666;">
            <h3>💡 Features</h3>
            <p><strong>✅ Real code examples</strong> • <strong>✅ Step-by-step instructions</strong> • <strong>✅ Parameter documentation</strong> • <strong>✅ Quality indicators</strong></p>
            <p><em>Get practical, actionable information, directly from the source.</em></p>
        </div>
        """)

if __name__ == "__main__":
    print("🚀 Starting Hugging Face Information Server...")
    print("📚 Features: Code examples, usage instructions, comprehensive documentation")
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        show_error=True,
        mcp_server=True
    )
requirements.txt ADDED
@@ -0,0 +1,6 @@
+gradio
+requests
+beautifulsoup4
+sentence-transformers
+faiss-cpu
+numpy