## Step-by-Step Tutorial: Building "Bagoodex Web Search"

This tutorial provides a structured walkthrough to create "Bagoodex Web Search," an open-source Perplexity-like app built with Python, Gradio, and external APIs. We'll be using the AI/ML API for AI capabilities.

### AI/ML API

AI/ML API is a game-changing platform for developers and SaaS entrepreneurs looking to integrate cutting-edge AI capabilities into their products. It offers a single point of access to over 200 state-of-the-art AI models, covering everything from NLP to computer vision.

Key Features for Developers:

- Extensive Model Library: 200+ pre-trained models for rapid prototyping and deployment.
- Customization Options: Fine-tune models to fit your specific use case.
- Developer-Friendly Integration: RESTful APIs and SDKs for seamless incorporation into your stack.
- Serverless Architecture: Focus on coding, not infrastructure management.

[Get Started for FREE](https://aimlapi.com/?via=ibrohim). [Deep Dive](https://docs.aimlapi.com/) into the AI/ML API documentation (very detailed, can't agree more).

---

### Step 1: Setting Up the Environment

**1.1 Create a Virtual Environment:**

```bash
python -m venv .venv
source .venv/bin/activate
```

**1.2 Install Dependencies:**

Create and populate `requirements.txt` with:

```text
openai
gradio
python-dotenv
requests
pytube
```

Then install them:

```bash
pip install -r requirements.txt
```

**1.3 Environment Variables:**

Create a `.env` file with your API keys:

```text
AIML_API_KEY=your_api_key
GOOGLE_MAPS_API_KEY=your_google_maps_api_key
```

> Here's a brief tutorial: [How to get API Key from AI/ML API. Quick step-by-step tutorial with screenshots for better understanding](https://medium.com/@abdibrokhim/how-to-get-api-key-from-ai-ml-api-225a69d0bb25).

**1.4 Git Ignore:**

Add a `.gitignore`:

```text
.env
.venv
__pycache__
*.pyc
.DS_Store
```

---

### Step 2: Project Structure

Your final project directory should look like:

```bash
Bagoodex_Web_Search/
├── .env
├── .gitignore
├── requirements.txt
├── app.py
├── bagoodex_client.py
├── helpers.py
├── prompts.py
└── r_types.py
```

---

### Step 3: Key Files Explained

#### 3.1 `bagoodex_client.py`

- Implements API interactions with `bagoodex` and GPT services.
- Import the necessary modules:

```py
import os
import requests
from openai import OpenAI
from dotenv import load_dotenv
from r_types import ChatMessage
from prompts import SYSTEM_PROMPT_BASE, SYSTEM_PROMPT_MAP
from typing import List
```

- Load environment variables and set up the API client:

```py
load_dotenv()

API_KEY = os.getenv("AIML_API_KEY")
API_URL = "https://api.aimlapi.com"
```

- Define the `BagoodexClient` class:

```py
class BagoodexClient:
    def __init__(self, api_key=API_KEY, api_url=API_URL):
        self.api_key = api_key
        self.api_url = api_url
        self.client = OpenAI(base_url=self.api_url, api_key=self.api_key)
```

- Includes the following methods.
- `complete_chat()`: Handles general chat interactions.

```py
    # inside the BagoodexClient class
    def complete_chat(self, query):
        """
        Calls the standard chat completion endpoint using the provided query.
        Returns the generated followup ID and the text response.
        """
        response = self.client.chat.completions.create(
            model="bagoodex/bagoodex-search-v1",
            messages=[
                ChatMessage(role="user", content=SYSTEM_PROMPT_BASE),
                ChatMessage(role="user", content=query)
            ],
        )
        followup_id = response.id  # the unique ID for follow-up searches
        answer = response.choices[0].message.content
        return followup_id, answer
```

- `base_qna()`: Handles basic Q&A interactions.
We'll mainly use this for follow-up tasks. It's reusable because the caller passes whichever system prompt fits the use case.

```py
    # inside the BagoodexClient class
    def base_qna(self, messages: List[ChatMessage], system_prompt=SYSTEM_PROMPT_BASE):
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                ChatMessage(role="user", content=system_prompt),
                *messages
            ],
        )
        return response.choices[0].message.content
```

- The remaining methods use the followup ID to fetch follow-up resources (links, images, videos, maps, knowledge):

```py
    # inside the BagoodexClient class
    def get_links(self, followup_id):
        headers = {"Authorization": f"Bearer {self.api_key}"}
        params = {"followup_id": followup_id}
        response = requests.get(
            f"{self.api_url}/v1/bagoodex/links", headers=headers, params=params
        )
        return response.json()

    def get_images(self, followup_id):
        headers = {"Authorization": f"Bearer {self.api_key}"}
        params = {"followup_id": followup_id}
        response = requests.get(
            f"{self.api_url}/v1/bagoodex/images", headers=headers, params=params
        )
        return response.json()

    def get_videos(self, followup_id):
        headers = {"Authorization": f"Bearer {self.api_key}"}
        params = {"followup_id": followup_id}
        response = requests.get(
            f"{self.api_url}/v1/bagoodex/videos", headers=headers, params=params
        )
        return response.json()

    def get_local_map(self, followup_id):
        headers = {"Authorization": f"Bearer {self.api_key}"}
        params = {"followup_id": followup_id}
        response = requests.get(
            f"{self.api_url}/v1/bagoodex/local-map", headers=headers, params=params
        )
        return response.json()

    def get_knowledge(self, followup_id):
        headers = {"Authorization": f"Bearer {self.api_key}"}
        params = {"followup_id": followup_id}
        response = requests.get(
            f"{self.api_url}/v1/bagoodex/knowledge", headers=headers, params=params
        )
        return response.json()
```

Note: you must first call the standard chat completion endpoint `complete_chat()` with your query. It returns an ID, which must then be passed as the sole input parameter `followup_id` to the `bagoodex/links`, `bagoodex/images`, `bagoodex/videos`, `bagoodex/local-map`, and `bagoodex/knowledge` endpoints.
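To make that two-step flow concrete, here is a minimal sketch (illustrative only, not one of the project files) that assumes your `.env` contains a valid `AIML_API_KEY`:

```py
# sketch.py - illustrative usage of BagoodexClient, assuming a valid AIML_API_KEY in .env
from bagoodex_client import BagoodexClient

client = BagoodexClient()

# Step 1: run the search-backed chat completion
followup_id, answer = client.complete_chat("how to make a slingshot?")
print(answer)

# Step 2: reuse the returned ID to fetch follow-up resources
links = client.get_links(followup_id)
images = client.get_images(followup_id)
print(links, images)
```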
""" # complete_chat returns a new followup id and answer followup_id_new, answer = client.complete_chat(message) # Update conversation history (if history is None, use an empty list) if history is None: history = [] updated_history = history + [ChatMessage({"role": "user", "content": message}), ChatMessage({"role": "assistant", "content": answer})] # Retrieve follow-up questions using the updated conversation followup_questions_raw = client.base_qna( messages=updated_history, system_prompt=SYSTEM_PROMPT_FOLLOWUP ) # Format them using the helper followup_md = format_followup_questions(followup_questions_raw) return answer, followup_id_new, updated_history, followup_md def handle_followup_click(question, followup_state, chat_history_state): """ When a follow-up question is clicked, send it as a new message. """ if not question: return chat_history_state, followup_state, "" # Process the follow-up question via complete_chat followup_id_new, answer = client.complete_chat(question) updated_history = chat_history_state + [ChatMessage({"role": "user", "content": question}), ChatMessage({"role": "assistant", "content": answer})] # Get new follow-up questions followup_questions_raw = client.base_qna( messages=updated_history, system_prompt=SYSTEM_PROMPT_FOLLOWUP ) followup_md = format_followup_questions(followup_questions_raw) return updated_history, followup_id_new, followup_md ``` - Next setup `Local map` and `Knowledge base` functions. ```py def handle_local_map_click(followup_state, chat_history_state): """ On local map click, try to get a local map. If issues occur, fall back to using the SYSTEM_PROMPT_MAP. """ if not followup_state: return chat_history_state try: result = client.get_local_map(followup_state) if result: map_url = result.get('link', '') # Use helper to produce an embedded map iframe html = embed_google_map(map_url) # Fall back: use the base_qna call with SYSTEM_PROMPT_MAP result = client.base_qna( messages=chat_history_state, system_prompt=SYSTEM_PROMPT_MAP ) # Assume result contains a 'link' field html = embed_google_map(result.get('link', '')) new_message = ChatMessage({"role": "assistant", "content": html}) return chat_history_state + [new_message] except Exception: return chat_history_state def handle_knowledge_click(followup_state, chat_history_state): """ On knowledge base click, fetch and format knowledge content. """ if not followup_state: return chat_history_state try: print('trying to get knowledge') result = client.get_knowledge(followup_state) knowledge_md = format_knowledge(result) if knowledge_md == 0000: print('falling back to base_qna') # Fall back: use the base_qna call with SYSTEM_PROMPT_KNOWLEDGE_BASE result = client.base_qna( messages=chat_history_state, system_prompt=SYSTEM_PROMPT_KNOWLEDGE_BASE ) knowledge_md = format_knowledge(result) new_message = ChatMessage({"role": "assistant", "content": knowledge_md}) return chat_history_state + [new_message] except Exception: return chat_history_state ``` - Advanced search functions. ```py # ---------------------------- # Advanced Search Functions # ---------------------------- def perform_image_search(followup_state): if not followup_state: return [] result = client.get_images(followup_state) # For images we simply return a list of original URLs return [item.get("original", "") for item in result] def perform_video_search(followup_state): if not followup_state: return "
No followup ID available.
" result = client.get_videos(followup_state) # Use the helper to produce the embed iframes (supports multiple videos) return embed_video(result) def perform_links_search(followup_state): if not followup_state: return gr.Markdown("No followup ID available.") result = client.get_links(followup_state) return format_links(result) ``` - Uses `Gradio` for UI. Settign up CSS. ```py # ---------------------------- # UI Build # ---------------------------- css = """ #chatbot { height: 100%; } h1, h2, h3, h4, h5, h6 { text-align: center; display: block; } """ ``` - Handles chat, follow-up interactions, and advanced search features (images, videos, links). ```py with gr.Blocks(css=css, fill_height=True) as demo: gr.Markdown(""" ## like perplexity, but with less features. #### built by [@abdibrokhim](https://yaps.gg). """) # State variables to hold followup ID and conversation history, plus follow-up questions text followup_state = gr.State(None) chat_history_state = gr.State([]) # holds conversation history as a list of messages followup_md_state = gr.State("") # holds follow-up questions as Markdown text with gr.Row(): with gr.Column(scale=3): with gr.Row(): btn_local_map = gr.Button("Local Map Search (coming soon...)", variant="secondary", size="sm", interactive=False) btn_knowledge = gr.Button("Knowledge Base (coming soon...)", variant="secondary", size="sm", interactive=False) # The ChatInterface now uses additional outputs for both followup_state and conversation history, # plus follow-up questions Markdown. chat = gr.ChatInterface( fn=chat_function, type="messages", additional_inputs=[followup_state, chat_history_state], additional_outputs=[followup_state, chat_history_state, followup_md_state], ) # Button callbacks to append local map and knowledge base results to chat btn_local_map.click( fn=handle_local_map_click, inputs=[followup_state, chat_history_state], outputs=chat.chatbot ) btn_knowledge.click( fn=handle_knowledge_click, inputs=[followup_state, chat_history_state], outputs=chat.chatbot ) # Radio-based follow-up questions followup_radio = gr.Radio( choices=[], label="Follow-up Questions (select one and click 'Send Follow-up')" ) btn_send_followup = gr.Button("Send Follow-up") # When the user clicks "Send Follow-up", the selected question is passed # to handle_followup_click btn_send_followup.click( fn=handle_followup_click, inputs=[followup_radio, followup_state, chat_history_state], outputs=[chat.chatbot, followup_state, followup_md_state] ) # Update the radio choices when followup_md_state changes def update_followup_radio(md_text): """ Parse Markdown lines to extract questions starting with '- '. 
""" lines = md_text.splitlines() questions = [] for line in lines: if line.startswith("- "): questions.append(line[2:]) return gr.update(choices=questions, value=None) followup_md_state.change( fn=update_followup_radio, inputs=[followup_md_state], outputs=[followup_radio] ) with gr.Column(scale=1): gr.Markdown("### Advanced Search Options") with gr.Column(variant="panel"): btn_images = gr.Button("Search Images") btn_videos = gr.Button("Search Videos") btn_links = gr.Button("Search Links") gallery_output = gr.Gallery(label="Image Results", columns=2) video_output = gr.HTML(label="Video Results") # HTML for embedded video iframes links_output = gr.Markdown(label="Links Results") btn_images.click( fn=perform_image_search, inputs=[followup_state], outputs=[gallery_output] ) btn_videos.click( fn=perform_video_search, inputs=[followup_state], outputs=[video_output] ) btn_links.click( fn=perform_links_search, inputs=[followup_state], outputs=[links_output] ) demo.launch() ``` Questions you may consider to ask: ```text how to make slingshot? who created light (e.g., electricity) Tesla or Edison in quick short? ``` #### 3.2 `[helpers.py]` - Utility functions for formatting results: - import the necessary modules: ```py from dotenv import load_dotenv import os import gradio as gr import urllib.parse import re from pytube import YouTube from typing import List, Optional, Dict from r_types import ( SearchVideosResponse, SearchImagesResponse, SearchLinksResponse, LocalMapResponse, KnowledgeBaseResponse ) import json ``` - `embed_video()` for YouTube videos ```py def get_video_id(url: str) -> Optional[str]: """ Safely retrieve the YouTube video_id from a given URL using pytube. Returns None if the URL is invalid or an error occurs. """ if not url: return None try: yt = YouTube(url) return yt.video_id except Exception: # If the URL is invalid or pytube fails, return None return None def embed_video(videos: List[SearchVideosResponse]) -> str: """ Given a list of video data (with 'link' and 'title'), returns an HTML string of embedded YouTube iframes. """ if not videos: return "No videos found.
" # Collect each iframe snippet iframes = [] for video in videos: url = video.get("link", "") video_id = get_video_id(url) if not video_id: # Skip invalid or non-parsable links continue title = video.get("title", "").replace('"', '\\"') # Escape quotes iframe = f""" """ iframes.append(iframe) # If no valid videos after processing, return a fallback message if not iframes: return "No valid YouTube videos found.
" # Join all iframes into one HTML string return "\n".join(iframes) ``` - `format_links()` for links ```py def format_links(links) -> str: """ Convert a list of {'title': str, 'link': str} objects into a bulleted Markdown string with clickable links. """ if not links: return "No links found." links_md = "**Links:**\n" for url in links: title = url.rstrip('/').split('/')[-1] links_md += f"- [{title}]({url})\n" return links_md ``` - `embed_google_map()` for maps ```py def embed_google_map(map_url: str) -> str: """ Extracts a textual location from the given Google Maps URL and returns an embedded Google Map iframe for that location. Assumes you have a valid API key in place of 'YOUR_API_KEY'. """ load_dotenv() GOOGLE_MAPS_API_KEY = os.getenv("GOOGLE_MAPS_API_KEY") if not map_url: return "Invalid Google Maps URL.
" # Attempt to extract "San+Francisco,+CA" from the URL match = re.search(r"/maps/place/([^/]+)", map_url) if not match: return "Invalid Google Maps URL. Could not extract location." location_text = match.group(1) # Remove query params or additional slashes from the captured group location_text = re.split(r"[/?]", location_text)[0] # URL-encode location to avoid issues with special characters encoded_location = urllib.parse.quote(location_text, safe="") embed_html = f""" """ return embed_html ``` - `format_knowledge()` for knowledge base info ```py def format_knowledge(raw_result: str) -> str: """ Given a dictionary of knowledge data (e.g., about a person), produce a Markdown string summarizing that info. """ if not raw_result: return 0000 # Clean up the raw JSON string clean_json_str = cleanup_raw_json(raw_result) print('Knowledge Data: ', clean_json_str) try: # Parse the cleaned JSON string result = json.loads(clean_json_str) title = result.get("title", "...") type_ = result.get("type", "...") born = result.get("born", "...") died = result.get("died", "...") content = f""" **{title}** Type: {type_} Born: {born} Died: {died} """ return content except json.JSONDecodeError: return "Error: Failed to parse knowledge data." ``` - Set up `format_followup_questions()` function that formats follow-up questions ```py def format_followup_questions(raw_questions: str) -> str: """ Extracts and formats follow-up questions from a raw JSON-like string. The input string may contain triple backticks (```json ... ```) which need to be removed before parsing. Expected input format: \```json { "followup_question": [ "What materials are needed to make a slingshot?", "How to make a slingshot more powerful?" ] } \``` Returns a Markdown-formatted string with the follow-up questions. """ if not raw_questions: return "No follow-up questions available." # Clean up the raw JSON string clean_json_str = cleanup_raw_json(raw_questions) try: # Parse the cleaned JSON string questions_dict = json.loads(clean_json_str) # Ensure the expected key exists followup_list = questions_dict.get("followup_question", []) if not isinstance(followup_list, list) or not followup_list: return "No follow-up questions available." # Format the questions into Markdown questions_md = "### Follow-up Questions\n\n" for question in followup_list: questions_md += f"- {question}\n" return questions_md except json.JSONDecodeError: return "Error: Failed to parse follow-up questions." ``` - `cleanup_row_json()` function to clean up raw JSON strings ```py def cleanup_raw_json(raw_json: str) -> str: """ Remove triple backticks and 'json' from the beginning and end of a raw JSON string. """ return re.sub(r"```json|```", "", raw_json).strip() ``` #### 3.3 `[prompts.py]` - Contains system prompts for various tasks: For example: `SYSTEM_PROMPT_BASE` - For general chat interactions. ```py SYSTEM_PROMPT_BASE = """ ######SYSTEM INIATED###### You will be given a conversation chat (e.g., text/ paragraph). Answer the given conversation chat with a relevant response. ######NOTE###### Be nice and polite in your responses! ######SYSTEM SHUTDOWN###### """ ``` `SYSTEM_PROMPT_MAP` - For providing places based on the given content. ```py SYSTEM_PROMPT_MAP = """ ######SYSTEM INIATED###### You will be given a content from conversation chat (e.g., text/ paragraph). Your task is to analyze the given content and provide different types of places as close as possible to the given content. 
#### 3.4 `prompts.py`

- Contains the system prompts for the various tasks. For example:

`SYSTEM_PROMPT_BASE` - For general chat interactions.

```py
SYSTEM_PROMPT_BASE = """
######SYSTEM INITIATED######

You will be given a conversation chat (e.g., text/paragraph).
Answer the given conversation chat with a relevant response.

######NOTE######
Be nice and polite in your responses!

######SYSTEM SHUTDOWN######
"""
```

`SYSTEM_PROMPT_MAP` - For providing places based on the given content.

```py
SYSTEM_PROMPT_MAP = """
######SYSTEM INITIATED######

You will be given content from a conversation chat (e.g., text/paragraph).
Your task is to analyze the given content and provide different types of places
as close as possible to the given content.

For example: If the given content (conversation chat) was about "How to make a slingshot",
you can provide places like "Hardware store", "Woodworking shop", "Outdoor sports store", etc.

Make sure the places you provide are relevant to the given content.
The closer they are to the given content, the better.

Your final output should be a list of places. Here's a JSON format example:

\```json
{
    "places": ["Hardware store", "Woodworking shop", "Outdoor sports store"]
}
\```

######NOTE######
Make sure to return only JSON data! Nothing else!

######SYSTEM SHUTDOWN######
"""
```

`SYSTEM_PROMPT_FOLLOWUP` - For generating follow-up questions based on the given content.

```py
SYSTEM_PROMPT_FOLLOWUP = """
######SYSTEM INITIATED######

You will be given content from a conversation chat (e.g., text/paragraph).
Your task is to analyze the given content and provide follow-up questions based on it.

For example: If the given content (conversation chat) was about "How to make a slingshot",
you can provide a follow-up question like "What materials are needed to make a slingshot?".

Make sure the follow-up questions you provide are relevant to the given content.

Your final output should be a list of follow-up questions. Here's a JSON format example:

\```json
{
    "followup_question": ["What materials are needed to make a slingshot?", "How to make a slingshot more powerful?"]
}
\```

######NOTE######
Make sure to return only JSON data! Nothing else!

######SYSTEM SHUTDOWN######
"""
```

`SYSTEM_PROMPT_KNOWLEDGE_BASE` - For generating knowledge base responses based on the given content.

```py
SYSTEM_PROMPT_KNOWLEDGE_BASE = """
######SYSTEM INITIATED######

You will be given content from a conversation chat (e.g., text/paragraph).
Your task is to analyze the given content and provide a knowledge base response based on it.

For example: If the given content (conversation chat) was about "How to make a slingshot",
you should analyze it and find the exact creator or founder or inventor of the slingshot.
Let's assume you just found out that the slingshot was invented by "Charles Goodyear".
Then return `question` in a JSON format (e.g., {"question": "Who is Charles Goodyear?"}).

Your final output should be JSON data with the knowledge base response. Here's a JSON format example:

\```json
{
    "question": "Who is Charles Goodyear?"
}
\```

######NOTE######
Make sure to return only JSON data! Nothing else!

######SYSTEM SHUTDOWN######
"""
```

#### 3.5 `r_types.py`

- Placeholder for custom types and schemas. At minimum, it defines the `ChatMessage` type used throughout:

```py
from typing import TypedDict


# [ChatMessage]: a single message in the conversation history
class ChatMessage(TypedDict):
    role: str
    content: str
```
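`helpers.py` also imports `SearchVideosResponse`, `SearchImagesResponse`, `SearchLinksResponse`, `LocalMapResponse`, and `KnowledgeBaseResponse` from this module. Their exact definitions aren't shown here, so the sketch below reconstructs plausible ones from how the helpers read their fields; treat the field sets as assumptions rather than the official schema:

```py
from typing import TypedDict

# Reconstructed from usage in helpers.py and app.py; field sets are assumptions.

class SearchVideosResponse(TypedDict, total=False):
    link: str    # read by embed_video()
    title: str   # read by embed_video()

class SearchImagesResponse(TypedDict, total=False):
    original: str  # read by perform_image_search()

class LocalMapResponse(TypedDict, total=False):
    link: str  # read by handle_local_map_click()

class KnowledgeBaseResponse(TypedDict, total=False):
    title: str  # read by format_knowledge()
    type: str
    born: str
    died: str

# format_links() iterates plain URL strings, so a simple alias is the closest fit
SearchLinksResponse = str
```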