Enhance model support, improve documentation, and refactor core components
- Added support for new language models including DeepSeek, Qwen, and expanded model options
- Updated README with comprehensive project structure and usage details
- Refactored core modules (copywriter.py, models.py, nlp_rag.py) with improved type hints and docstrings
- Updated default models and embedding configurations
- Enhanced search agent with more flexible model selection and verbose output options
- Improved requirements.txt with latest library dependencies
- README.md +47 -15
- copywriter.py +4 -9
- models.py +29 -11
- nlp_rag.py +41 -39
- requirements.txt +4 -1
- search_agent.py +42 -44
- web_crawler.py +67 -7
- web_rag.py +53 -20
README.md CHANGED

@@ -37,7 +37,7 @@ To run the script, users need to provide their API keys for the desired language
 
 ## Features
 
-- Supports multiple language model providers (Bedrock, OpenAI, Groq, Cohere, and Ollama)
+- Supports multiple language model providers (HuggingFace, Bedrock, OpenAI, Groq, Cohere, and Ollama)
 - Optimizes search queries using a language model
 - Fetches web pages and extracts main content (HTML and PDF)
 - Vectorizes the content for efficient retrieval

@@ -55,8 +55,9 @@ To run the script, users need to provide their API keys for the desired language
 
 3. Set up API keys:
 
-
-
+- create a `.env` file and add your API keys. Use `dotenv.sample` to create this file.
+- Get an API key from the following sources: https://brave.com/search/api/
+- Optionally you can add API keys from other LLM providers.
 
 ## Usage
 
@@ -68,20 +69,29 @@ python search_agent.py [OPTIONS] SEARCH_QUERY
 
 ### Options:
 
-
-
-
-
-
-
-
-
-
-
+  -h --help                           Show this screen.
+  --version                           Show version.
+  -c --copywrite                      First produce a draft, review it and rewrite for a final text
+  -d domain --domain=domain           Limit search to a specific domain
+  -t temp --temperature=temp          Set the temperature of the LLM [default: 0.0]
+  -m model --model=model              Use a specific model [default: hf:Qwen/Qwen2.5-72B-Instruct]
+  -e model --embedding_model=model    Use an embedding model
+  -n num --max_pages=num              Max number of pages to retrieve [default: 10]
+  -x num --max_extracts=num           Max number of page extract to consider [default: 7]
+  -b --use_browser                    Use browser to fetch content from the web [default: False]
+  -o text --output=text               Output format (choices: text, markdown) [default: markdown]
+  -v --verbose                        Print verbose output [default: False]
+
+The model can be a language model provider and a model name separated by a colon. e.g. `openai:gpt-4o-mini`
+If a embedding model is not specified, spaCy will be used for semantic search.
+
 
 ### Examples
 
+```bash
+python search_agent.py 'What is the radioactive anomaly in the Pacific Ocean?'
+```
+
 ```bash
 python search_agent.py -m openai:gpt-4o-mini "Write a linked post about the current state of M&A for startups. Write in the style of Russ from Silicon Valley TV show."
 ```

@@ -98,4 +108,26 @@ python search_agent.py -m openai:gpt-4o-mini "Write a linked post about the curr
 
 This project is licensed under the Apache License Version 2.0. See the `LICENSE` file for details.
 
-Let me know if you have any other questions! The key components are using a web search API to find relevant information, extracting the key snippets from the search results, passing that as context to a large language model, and having the LLM generate a natural language answer based on the web search context.
+Let me know if you have any other questions! The key components are using a web search API to find relevant information, extracting the key snippets from the search results, passing that as context to a large language model, and having the LLM generate a natural language answer based on the web search context.
+
+## Project Structure
+
+The project consists of several key components:
+
+- `search_agent.py`: The main script that handles the core search agent functionality
+- `search_agent_ui.py`: Streamlit-based user interface for the search agent
+- `web_crawler.py`: Handles web content fetching and processing
+- `web_rag.py`: Implements the Retrieval-Augmented Generation (RAG) functionality
+- `nlp_rag.py`: Natural language processing utilities for RAG
+- `models.py`: Contains model definitions and configurations
+- `copywriter.py`: Implements content rewriting and optimization features
+
+## Additional Tools
+
+The project includes several development and configuration files:
+
+- `requirements.txt`: Lists all Python dependencies
+- `.env`: Configuration file for API keys and settings (use `dotenv.sample` as a template)
+- `.gitignore`: Specifies which files Git should ignore
+- `LICENSE`: Apache License Version 2.0
+- `.devcontainer/`: Contains development container configuration for consistent development environments
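The closing paragraph added to the README summarizes the pipeline: optimize the query, search the web, fetch and split the pages, retrieve the most relevant chunks, and let the LLM answer from that context. A minimal, hypothetical sketch of that flow, reusing the module functions that appear in the diffs below (`get_model`, `get_sources`, `get_links_contents`, `semantic_search`, `query_rag`); the defaults and wiring are assumptions drawn from this commit, not a verbatim excerpt of the repository, and running it requires the API keys described above in `.env`.

```python
# Hypothetical end-to-end sketch of the README's "key components" paragraph,
# assembled from the functions visible in the diffs below (assumed signatures).
import models as md
import nlp_rag as nr
import web_crawler as wc
import web_rag as wr

chat = md.get_model("hf:Qwen/Qwen2.5-72B-Instruct", temperature=0.0)  # default model in this commit
nlp = nr.get_nlp_model()                                              # spaCy fallback when no embedding model is given

query = "What is the radioactive anomaly in the Pacific Ocean?"
search_query = wr.optimize_search_query(chat, query)        # 1. optimize the query with the LLM
sources = wc.get_sources(search_query, max_pages=10)        # 2. web search API (Brave)
contents = wc.get_links_contents(sources)                   # 3. fetch pages and extract main content
chunks = nr.recursive_split_documents(contents)             # 4. split content into chunks
relevant = nr.semantic_search(search_query, chunks, nlp)    # 5. keep the chunks most similar to the query
answer = nr.query_rag(chat, query, relevant)                # 6. LLM answers from the retrieved context
print(answer)
```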
copywriter.py CHANGED

@@ -1,10 +1,4 @@
 from langchain.schema import SystemMessage, HumanMessage
-from langchain.prompts.chat import (
-    HumanMessagePromptTemplate,
-    SystemMessagePromptTemplate,
-    ChatPromptTemplate
-)
-from langchain.prompts.prompt import PromptTemplate
 from langsmith import traceable
 
 
@@ -21,7 +15,6 @@ def get_comments_prompt(query, draft):
 5. Ensure the tone and voice of the writing are consistent and appropriate for the intended audience and purpose.
 6. Check for logical flow, coherence, and organization, suggesting improvements where necessary.
 7. Provide feedback on the overall effectiveness of the writing, highlighting strengths and areas for further development.
-
 Your suggestions should be constructive, insightful, and designed to help the user elevate the quality of their writing.
 You never generate the corrected text by itself. *Only* give the comment.
 """

@@ -35,12 +28,14 @@ def get_comments_prompt(query, draft):
     )
     return [system_message, human_message]
 
+
 @traceable(run_type="llm", name="generate_comments")
 def generate_comments(chat_llm, query, draft, callbacks=[]):
     messages = get_comments_prompt(query, draft)
     response = chat_llm.invoke(messages, config={"callbacks": callbacks})
     return response.content
 
+
 def get_final_text_prompt(query, draft, comments):
     system_message = SystemMessage(
         content="""

@@ -73,7 +68,7 @@ def get_final_text_prompt(query, draft, comments):
 def generate_final_text(chat_llm, query, draft, comments, callbacks=[]):
     messages = get_final_text_prompt(query, draft, comments)
     response = chat_llm.invoke(messages, config={"callbacks": callbacks})
-    return response.content
+    return response.content
 
 
 def get_compare_texts_prompts(query, draft_text, final_text):

@@ -109,4 +104,4 @@ def get_compare_texts_prompts(query, draft_text, final_text):
 def compare_text(chat_llm, query, draft, final, callbacks=[]):
     messages = get_compare_texts_prompts(query, draft_text=draft, final_text=final)
     response = chat_llm.invoke(messages, config={"callbacks": callbacks})
-    return response.content
+    return response.content
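The `--copywrite` option described in the README drives these two helpers: the draft is first critiqued by `generate_comments` and then rewritten by `generate_final_text`, exactly as `search_agent.py` does further down. A hedged usage sketch, with the chat model and draft assumed to come from the rest of the pipeline:

```python
# Hypothetical usage of the copywriter helpers above (signatures taken from the diff).
import copywriter as cw
import models as md

chat = md.get_model("openai:gpt-4o-mini", temperature=0.0)   # any supported provider:model string
query = "Write a short post about RAG search agents."
draft = "RAG combines web search results with an LLM to ground its answers..."  # normally produced by the agent

comments = cw.generate_comments(chat, query, draft)                 # reviewer feedback only, no rewrite
final_text = cw.generate_final_text(chat, query, draft, comments)   # rewrite guided by the comments
print(final_text)
```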
models.py CHANGED

@@ -30,6 +30,10 @@ from langchain.chat_models.base import BaseChatModel
 from langchain.embeddings.base import Embeddings
 
 def split_provider_model(provider_model: str) -> Tuple[str, Optional[str]]:
+    """
+    Split the provider and model name from a string.
+    returns Tuple[str, Optional[str]]
+    """
     parts = provider_model.split(":", 1)
     provider = parts[0]
     if len(parts) > 1:

@@ -48,7 +52,7 @@ def get_model(provider_model: str, temperature: float = 0.7) -> BaseChatModel:
     match provider.lower():
         case 'anthropic':
             if model is None:
-                model = "claude-3-
+                model = "claude-3-5-haiku-20241022"
             chat_llm = ChatAnthropic(model=model, temperature=temperature)
         case 'bedrock':
             if model is None:

@@ -58,22 +62,32 @@ def get_model(provider_model: str, temperature: float = 0.7) -> BaseChatModel:
             if model is None:
                 model = 'command-r-plus'
             chat_llm = ChatCohere(model=model, temperature=temperature)
+
+        case 'deepseek':
+            if model is None:
+                model='deepseek-chat'
+            chat_llm = ChatOpenAI(
+                model=model,
+                openai_api_key=os.getenv("DEEPSEEK_API_KEY"),
+                openai_api_base='https://api.deepseek.com',
+                max_tokens=8192
+            )
         case 'fireworks':
             if model is None:
-                model = 'accounts/fireworks/models/llama-
+                model = 'accounts/fireworks/models/llama-v3p3-70b-instruct'
             chat_llm = ChatFireworks(model_name=model, temperature=temperature, max_tokens=120000)
         case 'googlegenerativeai':
             if model is None:
-                model = "gemini-
+                model = "gemini-2.0-flash-exp"
             chat_llm = ChatGoogleGenerativeAI(model=model, temperature=temperature,
                                               max_tokens=None, timeout=None, max_retries=2,)
         case 'groq':
             if model is None:
-                model = '
+                model = 'qwen-2.5-32b'
             chat_llm = ChatGroq(model_name=model, temperature=temperature)
         case 'huggingface' | 'hf':
             if model is None:
-                model = '
+                model = 'Qwen/Qwen2.5-72B-Instruct'
             llm = HuggingFaceEndpoint(
                 repo_id=model,
                 max_length=8192,

@@ -91,23 +105,23 @@ def get_model(provider_model: str, temperature: float = 0.7) -> BaseChatModel:
             chat_llm = ChatOpenAI(model=model, temperature=temperature)
         case 'openrouter':
             if model is None:
-                model = "
+                model = "cognitivecomputations/dolphin3.0-mistral-24b:free"
             chat_llm = ChatOpenAI(model=model, temperature=temperature, base_url="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))
         case 'mistralai' | 'mistral':
             if model is None:
-                model = "
+                model = "mistral-small-latest"
             chat_llm = ChatMistralAI(model=model, temperature=temperature)
         case 'perplexity':
             if model is None:
-                model = '
+                model = 'sonar'
             chat_llm = ChatPerplexity(model=model, temperature=temperature)
         case 'together':
             if model is None:
-                model = 'meta-llama/
+                model = 'meta-llama/Llama-3.3-70B-Instruct-Turbo-Free'
             chat_llm = ChatTogether(model=model, temperature=temperature)
         case 'xai':
             if model is None:
-                model = 'grok-
+                model = 'grok-2-1212'
             chat_llm = ChatOpenAI(model=model,api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", temperature=temperature)
         case _:
             raise ValueError(f"Unknown LLM provider {provider}")

@@ -118,6 +132,10 @@ def get_model(provider_model: str, temperature: float = 0.7) -> BaseChatModel:
 
 
 def get_embedding_model(provider_model: str) -> Embeddings:
+    """
+    Get an embedding model from a provider and model name.
+    returns Embeddings
+    """
     provider, model = split_provider_model(provider_model)
     match provider.lower():
         case 'bedrock':

@@ -126,7 +144,7 @@ def get_embedding_model(provider_model: str) -> Embeddings:
             embedding_model = BedrockEmbeddings(model_id=model)
         case 'cohere':
             if model is None:
-                model = "embed-multilingual-v3"
+                model = "embed-multilingual-v3.0"
             embedding_model = CohereEmbeddings(model=model)
         case 'fireworks':
             if model is None:
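As the new docstring states, `split_provider_model` separates the `provider:model` string that every `--model` value uses; `get_model` then dispatches on the provider and falls back to the per-provider defaults listed above when no model name is given. A small illustrative sketch; the behaviour for a bare provider name (no colon) is inferred from the callers' `if model is None` checks, so treat the exact return values as assumptions.

```python
# Illustrative only: behaviour inferred from the split_provider_model/get_model diff above.
from models import split_provider_model, get_model

print(split_provider_model("openai:gpt-4o-mini"))             # expected: ("openai", "gpt-4o-mini")
print(split_provider_model("hf:Qwen/Qwen2.5-72B-Instruct"))   # expected: ("hf", "Qwen/Qwen2.5-72B-Instruct")
print(split_provider_model("deepseek"))                        # expected: ("deepseek", None) -> default filled in later

# With no model name, the provider-specific default from this commit is used,
# e.g. 'deepseek' -> 'deepseek-chat' (requires DEEPSEEK_API_KEY in the environment).
chat = get_model("deepseek", temperature=0.0)
```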
nlp_rag.py CHANGED

@@ -6,6 +6,12 @@ from concurrent.futures import ThreadPoolExecutor, as_completed
 import numpy as np
 
 def get_nlp_model():
+    """
+    Load and return the spaCy NLP model. Downloads the model if not already installed.
+
+    Returns:
+        nlp: The loaded spaCy NLP model.
+    """
     if not spacy.util.is_package("en_core_web_md"):
         print("Downloading en_core_web_md model...")
         spacy.cli.download("en_core_web_md")

@@ -15,6 +21,17 @@ def get_nlp_model():
 
 
 def recursive_split_documents(contents, max_chunk_size=1000, overlap=100):
+    """
+    Split documents into smaller chunks using a recursive character text splitter.
+
+    Args:
+        contents (list): List of content dictionaries with 'page_content', 'title', and 'link'.
+        max_chunk_size (int): Maximum size of each chunk.
+        overlap (int): Overlap between chunks.
+
+    Returns:
+        list: List of chunks with text and metadata.
+    """
     from langchain_core.documents.base import Document
     from langchain.text_splitter import RecursiveCharacterTextSplitter
 

@@ -51,6 +68,19 @@ def recursive_split_documents(contents, max_chunk_size=1000, overlap=100):
 
 
 def semantic_search(query, chunks, nlp, similarity_threshold=0.5, top_n=10):
+    """
+    Perform semantic search to find relevant chunks based on similarity to the query.
+
+    Args:
+        query (str): The search query.
+        chunks (list): List of text chunks with vectors.
+        nlp: The spaCy NLP model.
+        similarity_threshold (float): Minimum similarity score to consider a chunk relevant.
+        top_n (int): Number of top relevant chunks to return.
+
+    Returns:
+        list: List of relevant chunks and their similarity scores.
+    """
     # Precompute query vector and its norm
     query_vector = nlp(query).vector
     query_norm = np.linalg.norm(query_vector) + 1e-8  # Add epsilon to avoid division by zero

@@ -84,47 +114,19 @@ def semantic_search(query, chunks, nlp, similarity_threshold=0.5, top_n=10):
     return relevant_chunks[:top_n]
 
 
-# Perform semantic search using spaCy
-def semantic_search(query, chunks, nlp, similarity_threshold=0.5, top_n=10):
-    import numpy as np
-    from concurrent.futures import ThreadPoolExecutor
-
-    # Precompute query vector and its norm with epsilon to prevent division by zero
-    with nlp.disable_pipes(*[pipe for pipe in nlp.pipe_names if pipe != 'tok2vec']):
-        query_vector = nlp(query).vector
-    query_norm = np.linalg.norm(query_vector) + 1e-8  # Add epsilon
-
-    # Prepare texts from chunks
-    texts = [chunk['text'] for chunk in chunks]
-
-    # Function to process each text and compute its vector
-    def compute_vector(text):
-        with nlp.disable_pipes(*[pipe for pipe in nlp.pipe_names if pipe != 'tok2vec']):
-            doc = nlp(text)
-            vector = doc.vector
-        return vector
-
-    # Process texts in parallel using ThreadPoolExecutor
-    with ThreadPoolExecutor() as executor:
-        chunk_vectors = list(executor.map(compute_vector, texts))
-
-    chunk_vectors = np.array(chunk_vectors)
-    chunk_norms = np.linalg.norm(chunk_vectors, axis=1) + 1e-8  # Add epsilon
-
-    # Compute similarities using vectorized operations
-    similarities = np.dot(chunk_vectors, query_vector) / (chunk_norms * query_norm)
-
-    # Filter and sort results
-    relevant_chunks = [
-        (chunk, sim) for chunk, sim in zip(chunks, similarities) if sim > similarity_threshold
-    ]
-    relevant_chunks.sort(key=lambda x: x[1], reverse=True)
-
-    return relevant_chunks[:top_n]
-
-
 @traceable(run_type="llm", name="nlp_rag")
 def query_rag(chat_llm, query, relevant_results):
+    """
+    Generate a response using retrieval-augmented generation (RAG) based on relevant results.
+
+    Args:
+        chat_llm: The chat language model to use.
+        query (str): The user's query.
+        relevant_results (list): List of relevant chunks and their similarity scores.
+
+    Returns:
+        str: The generated response.
+    """
    import web_rag as wr
 
     formatted_chunks = ""
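The retained `semantic_search` scores each chunk by cosine similarity between its spaCy vector and the query vector, adding a small epsilon to the norms to avoid division by zero, then keeps the chunks above the threshold. A standalone NumPy sketch of that scoring step (array shapes are assumed; this is not the module's exact code):

```python
# Minimal sketch of the cosine-similarity scoring used by semantic_search (shapes assumed).
import numpy as np

query_vector = np.random.rand(300)          # e.g. a spaCy en_core_web_md document vector
chunk_vectors = np.random.rand(5, 300)      # one vector per text chunk

query_norm = np.linalg.norm(query_vector) + 1e-8               # epsilon avoids division by zero
chunk_norms = np.linalg.norm(chunk_vectors, axis=1) + 1e-8

similarities = chunk_vectors @ query_vector / (chunk_norms * query_norm)

similarity_threshold, top_n = 0.5, 3
ranked = sorted(
    [(i, s) for i, s in enumerate(similarities) if s > similarity_threshold],
    key=lambda x: x[1], reverse=True,
)[:top_n]
print(ranked)   # indices of the most relevant chunks with their scores
```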
requirements.txt CHANGED

@@ -3,18 +3,21 @@ boto3 >= 1.34.131, < 1.35.0
 bs4
 chromedriver-py >= 128.0.6613.137
 cohere >= 5.9.2
-docopt
+docopt
 faiss-cpu >= 1.8.0
 google-api-python-client >= 2.145.0
 pdfplumber >= 0.11.4
 python-dotenv >= 1.0.1
 langchain >= 0.3.0
+langchain_anthropic
 langchain-aws >= 0.2.0
 langchain-fireworks
 langchain_core >= 0.3.0
 langchain-cohere
 langchain_community
 langchain_experimental
+langchain_huggingface
+langchain_mistralai
 langchain_openai
 langchain-ollama
 langchain_groq
search_agent.py CHANGED

@@ -1,7 +1,8 @@
-"""
+"""
+search_agent.py
 
 Usage:
-    search_agent.py
+    search_agent.py
         [--domain=domain]
         [--provider=provider]
         [--model=model]

@@ -22,7 +23,7 @@ Options:
    -c --copywrite                      First produce a draft, review it and rewrite for a final text
    -d domain --domain=domain           Limit search to a specific domain
    -t temp --temperature=temp          Set the temperature of the LLM [default: 0.0]
-   -m model --model=model              Use a specific model [default:
+   -m model --model=model              Use a specific model [default: hf:Qwen/Qwen2.5-72B-Instruct]
    -e model --embedding_model=model    Use an embedding model
    -n num --max_pages=num              Max number of pages to retrieve [default: 10]
    -x num --max_extracts=num           Max number of page extract to consider [default: 7]

@@ -35,7 +36,6 @@ Options:
 import os
 
 from docopt import docopt
-#from schema import Schema, Use, SchemaError
 import dotenv
 
 from langchain.callbacks import LangChainTracer

@@ -51,10 +51,13 @@ import copywriter as cw
 import models as md
 import nlp_rag as nr
 
+# Initialize console for rich text output
 console = Console()
+# Load environment variables from a .env file
 dotenv.load_dotenv()
 
 def get_selenium_driver():
+    """Initialize and return a headless Selenium WebDriver for Chrome."""
     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
     from selenium.common.exceptions import WebDriverException

@@ -76,72 +79,76 @@ def get_selenium_driver():
         print(f"Error creating Selenium WebDriver: {e}")
         return None
 
+# Initialize callbacks list
 callbacks = []
+# Add LangChainTracer to callbacks if API key is set
 if os.getenv("LANGCHAIN_API_KEY"):
     callbacks.append(
         LangChainTracer(client=Client())
     )
+
 @traceable(run_type="tool", name="search_agent")
 def main(arguments):
+    """Main function to execute the search agent logic."""
     verbose = arguments["--verbose"]
     copywrite_mode = arguments["--copywrite"]
     model = arguments["--model"]
     embedding_model = arguments["--embedding_model"]
     temperature = float(arguments["--temperature"])
-    domain=arguments["--domain"]
-    max_pages=int(arguments["--max_pages"])
-    max_extract=int(arguments["--max_extracts"])
-    output=arguments["--output"]
-    use_selenium=arguments["--use_browser"]
+    domain = arguments["--domain"]
+    max_pages = int(arguments["--max_pages"])
+    max_extract = int(arguments["--max_extracts"])
+    output = arguments["--output"]
+    use_selenium = arguments["--use_browser"]
     query = arguments["SEARCH_QUERY"]
 
+    # Get the language model based on the provided model name and temperature
     chat = md.get_model(model, temperature)
+
+    # If no embedding model is provided, use spacy for semantic search
     if embedding_model is None:
        use_nlp = True
        nlp = nr.get_nlp_model()
     else:
-        use_nlp = False
+        use_nlp = False
         embedding_model = md.get_embedding_model(embedding_model)
 
+    # Log model details if verbose mode is enabled
     if verbose:
         model_name = getattr(chat, 'model_name', None) or getattr(chat, 'model', None) or getattr(chat, 'model_id', None) or str(chat)
-        console.log(f"Using
+        console.log(f"Using model: {model_name}")
         if not use_nlp:
             embedding_model_name = getattr(embedding_model, 'model_name', None) or getattr(embedding_model, 'model', None) or getattr(embedding_model, 'model_id', None) or str(embedding_model)
-            console.log(f"Using model: {embedding_model_name}")
-
+            console.log(f"Using embedding model: {embedding_model_name}")
 
+    # Optimize the search query
     with console.status(f"[bold green]Optimizing query for search: {query}"):
         optimized_search_query = wr.optimize_search_query(chat, query)
         if len(optimized_search_query) < 3:
             optimized_search_query = query
     console.log(f"Optimized search query: [bold blue]{optimized_search_query}")
 
+    # Retrieve sources using the optimized query
     with console.status(
         f"[bold green]Searching sources using the optimized query: {optimized_search_query}"
     ):
         sources = wc.get_sources(optimized_search_query, max_pages=max_pages, domain=domain)
     console.log(f"Found {len(sources)} sources {'on ' + domain if domain else ''}")
 
+    # Fetch content from the retrieved sources
     with console.status(
         f"[bold green]Fetching content for {len(sources)} sources", spinner="growVertical"
     ):
         contents = wc.get_links_contents(sources, get_selenium_driver, use_selenium=use_selenium)
     console.log(f"Managed to extract content from {len(contents)} sources")
 
+    # Process content using spaCy or embedding model
     if use_nlp:
-        with console.status(f"[bold green]Splitting {len(contents)} sources for content", spinner="growVertical"):
+        with console.status(f"[bold green]Splitting {len(contents)} sources for content", spinner="growVertical"):
             chunks = nr.recursive_split_documents(contents)
-            #chunks = nr.chunk_contents(nlp, contents)
         console.log(f"Split {len(contents)} sources into {len(chunks)} chunks")
         with console.status(f"[bold green]Searching relevant chunks", spinner="growVertical"):
-            import time
-
-            start_time = time.time()
             relevant_results = nr.semantic_search(optimized_search_query, chunks, nlp, top_n=max_extract)
-            end_time = time.time()
-            execution_time = end_time - start_time
-            console.log(f"Semantic search took {execution_time:.2f} seconds")
         console.log(f"Found {len(relevant_results)} relevant chunks")
         with console.status(f"[bold green]Writing content", spinner="growVertical"):
             draft = nr.query_rag(chat, query, relevant_results)

@@ -149,38 +156,29 @@ def main(arguments):
         with console.status(f"[bold green]Embedding {len(contents)} sources for content", spinner="growVertical"):
             vector_store = wc.vectorize(contents, embedding_model)
         with console.status("[bold green]Writing content", spinner='dots8Bit'):
-            draft = wr.query_rag(chat, query, optimized_search_query, vector_store, top_k
+            draft = wr.query_rag(chat, query, optimized_search_query, vector_store, top_k=max_extract)
 
-    if output == "text":
-        console.print(draft)
-    else:
-        console.print(Markdown(draft))
-    console.rule("[bold green]")
-
+    # If copywrite mode is enabled, generate comments and final text
     if(copywrite_mode):
         with console.status("[bold green]Getting comments from the reviewer", spinner="dots8Bit"):
             comments = cw.generate_comments(chat, query, draft)
 
-        console.rule("[bold green]Response from reviewer")
-        if output == "text":
-            console.print(comments)
-        else:
-            console.print(Markdown(comments))
-        console.rule("[bold green]")
-
         with console.status("[bold green]Writing the final text", spinner="dots8Bit"):
             final_text = cw.generate_final_text(chat, query, draft, comments)
+    else:
+        final_text = draft
 
-
-        console.
+    # Output the answer
+    console.rule(f"[bold green]Response")
+    if output == "text":
+        console.print(final_text)
+    else:
+        console.print(Markdown(final_text))
+    console.rule("[bold green]")
+
+    return final_text
 
 if __name__ == '__main__':
+    # Parse command-line arguments and execute the main function
     arguments = docopt(__doc__, version='Search Agent 0.1')
     main(arguments)
-
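The script's command line is parsed by docopt straight from the module docstring: the `Usage:` and `Options:` blocks double as the grammar, and `docopt(__doc__)` returns a plain dict whose keys match the option names read in `main`. A small self-contained sketch of that pattern, with a deliberately cut-down docstring rather than the full option list above:

```python
"""Minimal docopt sketch mirroring search_agent.py's argument handling (cut-down option list).

Usage:
    mini_agent.py [--model=model] [--temperature=temp] SEARCH_QUERY

Options:
    -m model --model=model        Use a specific model [default: hf:Qwen/Qwen2.5-72B-Instruct]
    -t temp --temperature=temp    Set the temperature of the LLM [default: 0.0]
"""
from docopt import docopt

if __name__ == "__main__":
    arguments = docopt(__doc__, version="Mini Agent 0.1")
    model = arguments["--model"]                      # e.g. "openai:gpt-4o-mini"
    temperature = float(arguments["--temperature"])   # defaults come from the Options block
    query = arguments["SEARCH_QUERY"]
    print(model, temperature, query)
```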
web_crawler.py CHANGED

@@ -7,20 +7,32 @@ import io
 from trafilatura import extract
 from selenium.common.exceptions import TimeoutException
 from langchain_core.documents.base import Document
+from langchain_community.vectorstores.faiss import FAISS
 from langchain_experimental.text_splitter import SemanticChunker
 from langchain.text_splitter import RecursiveCharacterTextSplitter, TokenTextSplitter
-from langchain_community.vectorstores.faiss import FAISS
 from langsmith import traceable
 import requests
 import pdfplumber
 
 @traceable(run_type="tool", name="get_sources")
-def get_sources(query, max_pages=10, domain=None):
+def get_sources(query, max_pages=10, domain=None):
+    """
+    Fetch search results from the Brave Search API based on the given query.
+
+    Args:
+        query (str): The search query.
+        max_pages (int): Maximum number of pages to retrieve.
+        domain (str, optional): Limit search to a specific domain.
+
+    Returns:
+        list: A list of search results with title, link, snippet, and favicon.
+    """
     search_query = query
     if domain:
         search_query += f" site:{domain}"
 
     url = f"https://api.search.brave.com/res/v1/web/search?q={quote(search_query)}&count={max_pages}"
+
     headers = {
         'Accept': 'application/json',
         'Accept-Encoding': 'gzip',

@@ -52,9 +64,18 @@ def get_sources(query, max_pages=10, domain=None):
         print('Error fetching search results:', error)
         raise
 
-
+def fetch_with_selenium(url, driver, timeout=8):
+    """
+    Fetch the HTML content of a webpage using Selenium.
+
+    Args:
+        url (str): The URL of the webpage.
+        driver: Selenium WebDriver instance.
+        timeout (int): Page load timeout in seconds.
+
+    Returns:
+        str: The HTML content of the page.
+    """
     try:
         driver.set_page_load_timeout(timeout)
         driver.get(url)

@@ -65,10 +86,20 @@ def fetch_with_selenium(url, driver, timeout=8,):
         html = None
     finally:
         driver.quit()
-
+
     return html
 
 def fetch_with_timeout(url, timeout=8):
+    """
+    Fetch a webpage with a specified timeout.
+
+    Args:
+        url (str): The URL of the webpage.
+        timeout (int): Request timeout in seconds.
+
+    Returns:
+        Response: The HTTP response object, or None if an error occurred.
+    """
     try:
         response = requests.get(url, timeout=timeout)
         response.raise_for_status()

@@ -76,8 +107,16 @@ def fetch_with_timeout(url, timeout=8):
     except requests.RequestException as error:
         return None
 
-
 def process_source(source):
+    """
+    Process a single source to extract its content.
+
+    Args:
+        source (dict): A dictionary containing the source's link and other metadata.
+
+    Returns:
+        dict: The source with its extracted page content.
+    """
     url = source['link']
     response = fetch_with_timeout(url, 2)
     if response:

@@ -109,6 +148,17 @@ def process_source(source):
 
 @traceable(run_type="tool", name="get_links_contents")
 def get_links_contents(sources, get_driver_func=None, use_selenium=False):
+    """
+    Retrieve and process the content of multiple sources.
+
+    Args:
+        sources (list): A list of source dictionaries.
+        get_driver_func (callable, optional): Function to get a Selenium WebDriver.
+        use_selenium (bool): Whether to use Selenium for fetching content.
+
+    Returns:
+        list: A list of processed sources with their page content.
+    """
     with ThreadPoolExecutor() as executor:
         results = list(executor.map(process_source, sources))
 

@@ -128,6 +178,16 @@ def get_links_contents(sources, get_driver_func=None, use_selenium=False):
 
 @traceable(run_type="embedding")
 def vectorize(contents, embedding_model):
+    """
+    Vectorize the contents using the specified embedding model.
+
+    Args:
+        contents (list): A list of content dictionaries.
+        embedding_model: The embedding model to use.
+
+    Returns:
+        FAISS: A FAISS vector store containing the vectorized documents.
+    """
     documents = []
     for content in contents:
         try:

@@ -151,7 +211,7 @@ def vectorize(contents, embedding_model):
 
     for i in range(0, len(split_documents), batch_size):
         batch = split_documents[i:i+batch_size]
-
+
         if vector_store is None:
             vector_store = FAISS.from_documents(batch, embedding_model)
         else:

@@ -163,4 +223,4 @@ def vectorize(contents, embedding_model):
         metadatas
     )
 
-    return vector_store
+    return vector_store
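`vectorize` builds the FAISS store incrementally: the documents are split, then indexed in batches, creating the store from the first batch and extending it with the rest. A hedged sketch of that batching pattern with LangChain's FAISS wrapper; `FakeEmbeddings`, the batch size, and the use of `add_documents` for later batches are stand-ins for illustration, and the module's own batching details may differ slightly.

```python
# Sketch of the incremental FAISS indexing pattern used by vectorize()
# (FakeEmbeddings and the batch size are placeholders, not the module's values).
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_core.documents.base import Document

embedding_model = FakeEmbeddings(size=256)   # stand-in for md.get_embedding_model(...)
split_documents = [
    Document(page_content=f"chunk {i}", metadata={"source": "example"}) for i in range(10)
]

vector_store = None
batch_size = 4
for i in range(0, len(split_documents), batch_size):
    batch = split_documents[i:i + batch_size]
    if vector_store is None:
        vector_store = FAISS.from_documents(batch, embedding_model)  # create the store from the first batch
    else:
        vector_store.add_documents(batch)                            # extend it with later batches

print(vector_store.similarity_search("chunk 3", k=2))
```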
web_rag.py CHANGED

@@ -19,6 +19,7 @@ Perform RAG using a single query to retrieve relevant documents.
 """
 import os
 import json
+from docopt import re
 from langchain.schema import SystemMessage, HumanMessage
 from langchain.prompts.chat import (
     HumanMessagePromptTemplate,

@@ -53,13 +54,13 @@ def get_optimized_search_messages(query):
         content="""
 You are a prompt optimizer for web search. Your task is to take a given chat prompt or question and transform it into an optimized search string that will yield the most relevant and useful information from a search engine like Google.
 The goal is to create a search query that will help users find the most accurate and pertinent information related to their original prompt or question. An effective search string should be concise, use relevant keywords, and leverage search engine syntax for better results.
-
+
 To optimize the prompt:
 - Identify the key information being requested
 - Consider any implicit information or context that might be useful for the search.
 - Arrange the keywords into a concise search string
 - Put the most important keywords first
-
+
 Some tips and things to be sure to remove:
 - Remove any conversational or instructional phrases
 - Removed style such as "in the style of", "engaging", "short", "long"

@@ -68,7 +69,7 @@ def get_optimized_search_messages(query):
 - Remove lenght instruction (example: essay, article, letter, etc)
 
 You should answer only with the optimized search query and add "**" to the end of the search string to indicate the end of the query
-
+
 Example:
 Question: How do I bake chocolate chip cookies from scratch?
 chocolate chip cookies recipe from scratch**

@@ -105,9 +106,9 @@ def get_optimized_search_messages(query):
         """
     )
     human_message = HumanMessage(
-        content=f"""
+        content=f"""
 Question: {query}
-
+
 """
     )
     return [system_message, human_message]

@@ -150,14 +151,14 @@ def get_optimized_search_messages2(query):
 3. Adding quotation marks around exact phrases if applicable
 4. Including relevant synonyms or related terms (in parentheses) to broaden the search
 5. Using Boolean operators if needed to refine the search
-
+
 You should answer only with the optimized search query and add "**" to the end of the search string to indicate the end of the optimized search query
 """
     )
     human_message = HumanMessage(
-        content=f"""
+        content=f"""
 Question: {query}
-
+
 """
     )
     return [system_message, human_message]

@@ -165,20 +166,31 @@ def get_optimized_search_messages2(query):
 
 @traceable(run_type="llm", name="optimize_search_query")
 def optimize_search_query(chat_llm, query, callbacks=[]):
+    """
+    Optimize the search query using the chat language model.
+
+    Args:
+        chat_llm: The chat language model to use.
+        query (str): The user's query.
+        callbacks (list): Optional callbacks for tracing.
+
+    Returns:
+        str: The optimized search query.
+    """
     messages = get_optimized_search_messages(query)
     response = chat_llm.invoke(messages)
     optimized_search_query = response.content.strip()
-
+
     # Split by '**' and take the first part, then strip whitespace
     optimized_search_query = optimized_search_query.split("**", 1)[0].strip()
-
+
     # Remove surrounding quotes if present
     optimized_search_query = optimized_search_query.strip('"')
-
+
     # If the result is empty, fall back to the original query
     if not optimized_search_query:
         optimized_search_query = query
-
+
     return optimized_search_query
 
 def get_rag_prompt_template():

@@ -193,7 +205,7 @@ def get_rag_prompt_template():
         input_variables=[],
         template="""
 You are an expert research assistant.
-You are provided with a Context in JSON format and a Question.
+You are provided with a Context in JSON format and a Question.
 Each JSON entry contains: content, title, link
 
 Use RAG to answer the Question, providing references and links to the Context material you retrieve and use in your answer:

@@ -203,7 +215,7 @@ def get_rag_prompt_template():
 - Synthesize the retrieved information into a clear, informative answer to the question
 - Format your answer in Markdown, using heading levels 2-3 as needed
 - Include a "References" section at the end with the full citations and link for each source you used
-
+
 If the provided context is not relevant to the question, say it and answer with your internal knowledge.
 If you cannot answer the question using either the extracts or your internal knowledge, state that you don't have enough information to provide an accurate answer.
 If the information in the provided context is in contradiction with your internal knowledge, answer but warn the user about the contradiction.

@@ -214,7 +226,7 @@ def get_rag_prompt_template():
     prompt=PromptTemplate(
         input_variables=["context", "query"],
         template="""
-Context:
+Context:
 ---------------------
 {context}
 ---------------------

@@ -229,6 +241,15 @@ def get_rag_prompt_template():
     )
 
 def format_docs(docs):
+    """
+    Format the retrieved documents into a JSON string.
+
+    Args:
+        docs (list): A list of documents to format.
+
+    Returns:
+        str: The formatted documents as a JSON string.
+    """
     formatted_docs = []
     for d in docs:
         content = d.page_content

@@ -241,6 +262,19 @@ def format_docs(docs):
 
 
 def multi_query_rag(chat_llm, question, search_query, vectorstore, callbacks = []):
+    """
+    Perform RAG using multiple queries to retrieve relevant documents.
+
+    Args:
+        chat_llm: The chat language model to use.
+        question (str): The user's question.
+        search_query (str): The search query to use.
+        vectorstore: The vector store for document retrieval.
+        callbacks (list): Optional callbacks for tracing.
+
+    Returns:
+        str: The generated answer to the question.
+    """
     retriever_from_llm = MultiQueryRetriever.from_llm(
         retriever=vectorstore.as_retriever(), llm=chat_llm, include_original=True,
     )

@@ -259,7 +293,7 @@ def get_context_size(chat_llm):
     else:
         return 16385
     if isinstance(chat_llm, ChatFireworks):
-        32768
+        return 32768
     if isinstance(chat_llm, ChatGroq):
         return 32768
     if isinstance(chat_llm, ChatOllama):

@@ -278,9 +312,10 @@ def get_context_size(chat_llm):
             return 128000
         return 32000
     return 4096
-
-@traceable(run_type="retriever")
+
+@traceable(run_type="retriever")
 def build_rag_prompt(chat_llm, question, search_query, vectorstore, top_k = 10, callbacks = []):
+    prompt = ""
     done = False
     while not done:
         unique_docs = vectorstore.similarity_search(

@@ -292,14 +327,12 @@ def build_rag_prompt(chat_llm, question, search_query, vectorstore, top_k = 10,
             done = True
         else:
             top_k = int(top_k * 0.75)
-
     return prompt
 
 @traceable(run_type="llm", name="query_rag")
 def query_rag(chat_llm, question, search_query, vectorstore, top_k = 10, callbacks = []):
     prompt = build_rag_prompt(chat_llm, question, search_query, vectorstore, top_k=top_k, callbacks = callbacks)
     response = chat_llm.invoke(prompt, config={"callbacks": callbacks})
-
     # Ensure we're returning a string
     if isinstance(response.content, list):
         # If it's a list, join the elements into a single string
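The post-processing in `optimize_search_query` is deliberately defensive: the prompt tells the model to end the query with `**`, so the code keeps only the part before that marker, strips stray quotes, and falls back to the original question if nothing is left. The string handling in isolation, as a small sketch independent of any LLM call:

```python
# The "**" post-processing from optimize_search_query, isolated as a plain function (sketch only).
def clean_optimized_query(raw: str, original_query: str) -> str:
    cleaned = raw.strip()
    cleaned = cleaned.split("**", 1)[0].strip()   # keep only the text before the end-of-query marker
    cleaned = cleaned.strip('"')                  # remove surrounding quotes if present
    return cleaned or original_query              # fall back to the original question if empty

print(clean_optimized_query('  "chocolate chip cookies recipe from scratch**"  ',
                            "How do I bake chocolate chip cookies from scratch?"))
# -> chocolate chip cookies recipe from scratch
```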
|