mtyrrell committed
Commit 23162a1 · Parent(s): 8170b18

refactored generator

Files changed (4)
  1. README.md +37 -30
  2. app/generator.py +224 -0
  3. app/main.py +9 -8
  4. app/utils.py +3 -135
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Chatfed Generation Service
  emoji: 🤖
  colorFrom: blue
  colorTo: purple
@@ -8,46 +8,53 @@ pinned: false
  license: mit
  ---

- # Generation Module
-
- This is an LLM-based generation service designed to be deployed as a modular component of a broader RAG system. The service runs on a docker container and exposes a gradio UI on port 7860 as well as an MCP endpoint.
-
- ## Configuration
-
- 1. The module requires an API key (set as an environment variable) for an inference provider to run. Multiple inference providers are supported. Make sure to set the appropriate environment variables:
-    - OpenAI: `OPENAI_API_KEY`
-    - Anthropic: `ANTHROPIC_API_KEY`
-    - Cohere: `COHERE_API_KEY`
-    - HuggingFace: `HF_TOKEN`
-
- 2. Inference provider and model settings are accessible via params.cfg

  ## MCP Endpoint

- ## Available Tools
-
- ### `rag_generate`
-
- Generate an answer to a query using provided context through RAG. This function takes a user query and relevant context, then uses a language model to generate a comprehensive answer based on the provided information.
-
- **Input Schema:**
-
- | Parameter | Type | Description |
- |-----------|------|-------------|
- | `query` | string | The user's question or query |
- | `context` | string | The relevant context/documents to use for answering |
-
- **Returns:** The generated answer based on the query and context
-
- **Example Usage:**
-
- ```json
- {
-     "query": "What are the benefits of renewable energy?",
-     "context": "Documents and information about renewable energy sources..."
- }
  ```

- ---
-
- *This tool uses an LLM to generate answers using the most relevant information from the context, along with the input query.*
  ---
+ title: ChatFed Generator
  emoji: 🤖
  colorFrom: blue
  colorTo: purple

  license: mit
  ---

+ # ChatFed Generator - MCP Server

+ A language model-based generation service designed for ChatFed RAG (Retrieval-Augmented Generation) pipelines. This module serves as an **MCP (Model Context Protocol) server** that generates contextual responses using configurable LLM providers, with support for processing retrieval results.

  ## MCP Endpoint

+ The main MCP function is `generate`, which provides context-aware text generation using the configured LLM provider once API credentials are in place.

+ **Parameters**:
+ - `query` (str, required): The question or query to be answered
+ - `context` (str | list, required): Context for answering - can be plain text or a list of retrieval result dictionaries

+ **Returns**: String containing the generated answer based on the provided context and query.

+ **Example usage**:
+ ```python
+ from gradio_client import Client

+ client = Client("ENTER CONTAINER URL / SPACE ID")
+ result = client.predict(
+     query="What are the key findings?",
+     context="Your relevant documents or context here...",
+     api_name="/generate"
+ )
+ print(result)
+ ```

+ ## Configuration

+ ### LLM Provider Configuration
+ 1. Set your preferred inference provider in `params.cfg`
+ 2. Configure the model and generation parameters
+ 3. Set the required API key environment variable
+ 4. [Optional] Adjust the temperature and max_tokens settings
+ 5. Run the app (see the note after the code block for passing the API key):

+ ```bash
+ docker build -t chatfed-generator .
+ docker run -p 7860:7860 chatfed-generator
  ```
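Note that the `docker run` command above does not pass an API key into the container, so model initialization would fail at startup (`get_auth` raises `RuntimeError` when the key is missing). A minimal sketch, assuming OpenAI is the provider configured in `params.cfg`:

```bash
# Forward the provider API key from the host environment (example assumes OpenAI)
docker run -p 7860:7860 -e OPENAI_API_KEY="$OPENAI_API_KEY" chatfed-generator
```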

+ ## Required Environment Variables
+
+ Set the API key environment variable for your chosen provider:
+ - OpenAI: `OPENAI_API_KEY`
+ - Anthropic: `ANTHROPIC_API_KEY`
+ - Cohere: `COHERE_API_KEY`
+ - HuggingFace: `HF_TOKEN`
+
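For reference, the `[generator]` section of `params.cfg` that both the steps above and `app/generator.py` read might look like the following sketch. The four keys match the `config.get("generator", ...)` calls in the new module; the values shown are purely illustrative:

```ini
[generator]
PROVIDER = openai
MODEL = gpt-4o-mini
MAX_TOKENS = 512
TEMPERATURE = 0.2
```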
app/generator.py ADDED
@@ -0,0 +1,224 @@
+ import logging
+ import asyncio
+ import json
+ import ast
+ from typing import List, Dict, Any, Union
+ from dotenv import load_dotenv
+
+ # LangChain imports
+ from langchain_openai import ChatOpenAI
+ from langchain_anthropic import ChatAnthropic
+ from langchain_cohere import ChatCohere
+ from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
+ from langchain_core.messages import SystemMessage, HumanMessage
+
+ # Local imports
+ from .utils import getconfig, get_auth
+
+ # ---------------------------------------------------------------------
+ # Model / client initialization (non-exhaustive list of providers)
+ # ---------------------------------------------------------------------
+ config = getconfig("params.cfg")
+
+ PROVIDER = config.get("generator", "PROVIDER")
+ MODEL = config.get("generator", "MODEL")
+ MAX_TOKENS = int(config.get("generator", "MAX_TOKENS"))
+ TEMPERATURE = float(config.get("generator", "TEMPERATURE"))
+
+ # Set up authentication for the selected provider
+ auth_config = get_auth(PROVIDER)
+
+ def get_chat_model():
+     """Initialize the appropriate LangChain chat model based on provider"""
+     common_params = {
+         "temperature": TEMPERATURE,
+         "max_tokens": MAX_TOKENS,
+     }
+
+     if PROVIDER == "openai":
+         return ChatOpenAI(
+             model=MODEL,
+             openai_api_key=auth_config["api_key"],
+             **common_params
+         )
+     elif PROVIDER == "anthropic":
+         return ChatAnthropic(
+             model=MODEL,
+             anthropic_api_key=auth_config["api_key"],
+             **common_params
+         )
+     elif PROVIDER == "cohere":
+         return ChatCohere(
+             model=MODEL,
+             cohere_api_key=auth_config["api_key"],
+             **common_params
+         )
+     elif PROVIDER == "huggingface":
+         # Initialize HuggingFaceEndpoint with explicit parameters
+         llm = HuggingFaceEndpoint(
+             repo_id=MODEL,
+             huggingfacehub_api_token=auth_config["api_key"],
+             task="text-generation",
+             temperature=TEMPERATURE,
+             max_new_tokens=MAX_TOKENS
+         )
+         return ChatHuggingFace(llm=llm)
+     else:
+         raise ValueError(f"Unsupported provider: {PROVIDER}")
+
+ # Initialize provider-agnostic chat model
+ chat_model = get_chat_model()
+
+ # ---------------------------------------------------------------------
+ # Context processing - may need further refinement (e.g. to handle other data sources)
+ # ---------------------------------------------------------------------
+ def extract_relevant_fields(retrieval_results: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
+     """
+     Extract only the relevant fields from retrieval results.
+
+     Args:
+         retrieval_results: List of JSON objects from the retriever,
+             or the string representation of such a list
+
+     Returns:
+         List of processed objects with only the relevant fields
+     """
+     # The MCP layer may deliver the list as its string representation;
+     # parse only in that case (ast.literal_eval fails on an actual list)
+     if isinstance(retrieval_results, str):
+         retrieval_results = ast.literal_eval(retrieval_results)
+
+     processed_results = []
+
+     for result in retrieval_results:
+         # Extract the answer content
+         answer = result.get('answer', '')
+
+         # Extract document identification from metadata
+         metadata = result.get('answer_metadata', {})
+         doc_info = {
+             'answer': answer,
+             'filename': metadata.get('filename', 'Unknown'),
+             'page': metadata.get('page', 'Unknown'),
+             'year': metadata.get('year', 'Unknown'),
+             'source': metadata.get('source', 'Unknown'),
+             'document_id': metadata.get('_id', 'Unknown')
+         }
+
+         processed_results.append(doc_info)
+
+     return processed_results
+
+ def format_context_from_results(processed_results: List[Dict[str, Any]]) -> str:
+     """
+     Format processed retrieval results into a context string for the LLM.
+
+     Args:
+         processed_results: List of processed objects with relevant fields
+
+     Returns:
+         Formatted context string
+     """
+     if not processed_results:
+         return ""
+
+     context_parts = []
+
+     for i, result in enumerate(processed_results, 1):
+         doc_reference = f"[Document {i}: {result['filename']}"
+         if result['page'] != 'Unknown':
+             doc_reference += f", Page {result['page']}"
+         if result['year'] != 'Unknown':
+             doc_reference += f", Year {result['year']}"
+         doc_reference += "]"
+
+         context_part = f"{doc_reference}\n{result['answer']}\n"
+         context_parts.append(context_part)
+
+     return "\n".join(context_parts)
+
+ # ---------------------------------------------------------------------
+ # Core generation function for both Gradio UI and MCP
+ # ---------------------------------------------------------------------
+ async def _call_llm(messages: list) -> str:
+     """
+     Provider-agnostic LLM call using LangChain.
+
+     Args:
+         messages: List of LangChain message objects
+
+     Returns:
+         Generated response content as string
+     """
+     try:
+         # Use async invoke for better performance
+         response = await chat_model.ainvoke(messages)
+         return response.content.strip()
+     except Exception as e:
+         logging.exception(f"LLM generation failed with provider '{PROVIDER}' and model '{MODEL}': {e}")
+         raise
+
+ def build_messages(question: str, context: str) -> list:
+     """
+     Build messages in LangChain format.
+
+     Args:
+         question: The user's question
+         context: The relevant context for answering
+
+     Returns:
+         List of LangChain message objects
+     """
+     system_content = (
+         "You are an expert assistant. Answer the USER question using only the "
+         "CONTEXT provided. If the context is insufficient say 'I don't know.'"
+     )
+
+     user_content = f"### CONTEXT\n{context}\n\n### USER QUESTION\n{question}"
+
+     return [
+         SystemMessage(content=system_content),
+         HumanMessage(content=user_content)
+     ]
+
+
+ async def generate(query: str, context: Union[str, List[Dict[str, Any]]]) -> str:
+     """
+     Generate an answer to a query using provided context through RAG.
+
+     This function takes a user query and relevant context, then uses a language model
+     to generate a comprehensive answer based on the provided information.
+
+     Args:
+         query (str): User query
+         context (str | list): Plain-text context, or a list of retrieval
+             result objects (dictionaries) from the retriever
+
+     Returns:
+         str: The generated answer based on the query and context
+     """
+     if not query.strip():
+         return "Error: Query cannot be empty"
+
+     # Handle both string context (for Gradio UI) and list context (from retriever)
+     if isinstance(context, list):
+         if not context:
+             return "Error: No retrieval results provided"
+
+         # Process the retrieval results
+         processed_results = extract_relevant_fields(context)
+         formatted_context = format_context_from_results(processed_results)
+
+         if not formatted_context.strip():
+             return "Error: No valid content found in retrieval results"
+
+     elif isinstance(context, str):
+         if not context.strip():
+             return "Error: Context cannot be empty"
+         formatted_context = context
+
+     else:
+         return "Error: Context must be either a string or a list of retrieval results"
+
+     try:
+         messages = build_messages(query, formatted_context)
+         answer = await _call_llm(messages)
+         return answer
+     except Exception as e:
+         logging.exception("Generation failed")
+         return f"Error: {str(e)}"
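Since the new `generate` accepts either a plain string or retriever output, a hypothetical smoke test helps pin down the expected shape. The field names (`answer`, `answer_metadata`, `filename`, `page`, `year`, `source`, `_id`) come straight from `extract_relevant_fields` above; the sample values are invented, and the script assumes `params.cfg` and the provider API key are already in place (the module initializes the chat model at import time):

```python
import asyncio

from app.generator import generate

# Hypothetical retrieval result in the shape extract_relevant_fields expects
sample_results = [
    {
        "answer": "Installed solar capacity grew 24% year over year.",
        "answer_metadata": {
            "filename": "energy_report.pdf",
            "page": 12,
            "year": 2023,
            "source": "retriever",
            "_id": "doc-001",
        },
    }
]

# generate() is async, so drive it with asyncio.run for a one-off call
print(asyncio.run(generate("What are the key findings?", sample_results)))
```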
app/main.py CHANGED
@@ -1,23 +1,23 @@
  import gradio as gr
- from .utils import rag_generate

  # ---------------------------------------------------------------------
  # Gradio Interface with MCP support
  # ---------------------------------------------------------------------
  ui = gr.Interface(
-     fn=rag_generate,
      inputs=[
          gr.Textbox(
              label="Query",
              lines=2,
-             placeholder="What would you like to know?",
-             info="Enter your question here"
          ),
          gr.Textbox(
              label="Context",
              lines=8,
-             placeholder="Paste relevant documents or context here...",
-             info="Provide the context/documents to use for answering"
          ),
      ],
      outputs=gr.Textbox(
@@ -25,8 +25,9 @@ ui = gr.Interface(
          lines=6,
          show_copy_button=True
      ),
-     title="RAG Generation Service",
-     description="Ask questions based on provided context. Intended for use in RAG pipelines (i.e. context supplied by semantic retriever service) as an MCP server.",
  )

  # Launch with MCP server enabled

  import gradio as gr
+ from .generator import generate

  # ---------------------------------------------------------------------
  # Gradio Interface with MCP support
  # ---------------------------------------------------------------------
  ui = gr.Interface(
+     fn=generate,
      inputs=[
          gr.Textbox(
              label="Query",
              lines=2,
+             placeholder="Enter query here",
+             info="The question to answer using the provided context"
          ),
          gr.Textbox(
              label="Context",
              lines=8,
+             placeholder="Paste relevant context here",
+             info="Provide the context/documents to use for answering. The API expects a list of dictionaries, but the UI should accept anything"
          ),
      ],
      outputs=gr.Textbox(

          lines=6,
          show_copy_button=True
      ),
+     title="ChatFed Generation Module",
+     description="Ask questions based on provided context. Intended for use in RAG pipelines as an MCP server with other ChatFed modules (i.e. context supplied by the semantic retriever service).",
+     api_name="generate"
  )

  # Launch with MCP server enabled
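The `ui.launch(...)` call itself falls outside this diff's hunks. For orientation, a hedged sketch of what the unchanged tail of `app/main.py` presumably looks like: `mcp_server=True` is the flag Gradio uses to expose an interface as an MCP endpoint, and port 7860 matches the README; the exact arguments in the repo are unverified.

```python
ui.launch(
    server_name="0.0.0.0",  # reachable from outside the container
    server_port=7860,       # port published in the README's docker run
    mcp_server=True         # serve the interface as an MCP endpoint
)
```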
app/utils.py CHANGED
@@ -1,14 +1,9 @@
- import os, asyncio, logging
  import configparser
  import logging
  from dotenv import load_dotenv

- # LangChain imports
- from langchain_openai import ChatOpenAI
- from langchain_anthropic import ChatAnthropic
- from langchain_cohere import ChatCohere
- from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
- from langchain_core.messages import SystemMessage, HumanMessage

  # Local .env file
  load_dotenv()
@@ -30,7 +25,7 @@ def getconfig(configfile_path: str):
  # ---------------------------------------------------------------------
  # Provider-agnostic authentication and configuration
  # ---------------------------------------------------------------------
- def get_auth_config(provider: str) -> dict:
      """Get authentication configuration for different providers"""
      auth_configs = {
          "openai": {"api_key": os.getenv("OPENAI_API_KEY")},
@@ -49,130 +44,3 @@ def get_auth_config(provider: str) -> dict:
          raise RuntimeError(f"Missing API key for provider '{provider}'. Please set the appropriate environment variable.")

      return auth_config
-
- # ---------------------------------------------------------------------
- # Model / client initialization
- # ---------------------------------------------------------------------
- config = getconfig("params.cfg")
-
- PROVIDER = config.get("generator", "PROVIDER")
- MODEL = config.get("generator", "MODEL")
- MAX_TOKENS = int(config.get("generator", "MAX_TOKENS"))
- TEMPERATURE = float(config.get("generator", "TEMPERATURE"))
-
- # Set up authentication for the selected provider
- auth_config = get_auth_config(PROVIDER)
-
- def get_chat_model():
-     """Initialize the appropriate LangChain chat model based on provider"""
-     common_params = {
-         "temperature": TEMPERATURE,
-         "max_tokens": MAX_TOKENS,
-     }
-
-     if PROVIDER == "openai":
-         return ChatOpenAI(
-             model=MODEL,
-             openai_api_key=auth_config["api_key"],
-             **common_params
-         )
-     elif PROVIDER == "anthropic":
-         return ChatAnthropic(
-             model=MODEL,
-             anthropic_api_key=auth_config["api_key"],
-             **common_params
-         )
-     elif PROVIDER == "cohere":
-         return ChatCohere(
-             model=MODEL,
-             cohere_api_key=auth_config["api_key"],
-             **common_params
-         )
-     elif PROVIDER == "huggingface":
-         # Initialize HuggingFaceEndpoint with explicit parameters
-         llm = HuggingFaceEndpoint(
-             repo_id=MODEL,
-             huggingfacehub_api_token=auth_config["api_key"],
-             task="text-generation",
-             temperature=TEMPERATURE,
-             max_new_tokens=MAX_TOKENS
-         )
-         return ChatHuggingFace(llm=llm)
-     else:
-         raise ValueError(f"Unsupported provider: {PROVIDER}")
-
- # Initialize provider-agnostic chat model
- chat_model = get_chat_model()
-
- # ---------------------------------------------------------------------
- # Core generation function for both Gradio UI and MCP
- # ---------------------------------------------------------------------
- async def _call_llm(messages: list) -> str:
-     """
-     Provider-agnostic LLM call using LangChain.
-
-     Args:
-         messages: List of LangChain message objects
-
-     Returns:
-         Generated response content as string
-     """
-     try:
-         # Use async invoke for better performance
-         response = await chat_model.ainvoke(messages)
-         return response.content.strip()
-     except Exception as e:
-         logging.exception(f"LLM generation failed with provider '{PROVIDER}' and model '{MODEL}': {e}")
-         raise
-
- def build_messages(question: str, context: str) -> list:
-     """
-     Build messages in LangChain format.
-
-     Args:
-         question: The user's question
-         context: The relevant context for answering
-
-     Returns:
-         List of LangChain message objects
-     """
-     system_content = (
-         "You are an expert assistant. Answer the USER question using only the "
-         "CONTEXT provided. If the context is insufficient say 'I don't know.'"
-     )
-
-     user_content = f"### CONTEXT\n{context}\n\n### USER QUESTION\n{question}"
-
-     return [
-         SystemMessage(content=system_content),
-         HumanMessage(content=user_content)
-     ]
-
-
- async def rag_generate(query: str, context: str) -> str:
-     """
-     Generate an answer to a query using provided context through RAG.
-
-     This function takes a user query and relevant context, then uses a language model
-     to generate a comprehensive answer based on the provided information.
-
-     Args:
-         query (str): The user's question or query
-         context (str): The relevant context/documents to use for answering
-
-     Returns:
-         str: The generated answer based on the query and context
-     """
-     if not query.strip():
-         return "Error: Query cannot be empty"
-
-     if not context.strip():
-         return "Error: Context cannot be empty"
-
-     try:
-         messages = build_messages(query, context)
-         answer = await _call_llm(messages)
-         return answer
-     except Exception as e:
-         logging.exception("Generation failed")
-         return f"Error: {str(e)}"
 
+ import os
  import configparser
  import logging
  from dotenv import load_dotenv

+

  # Local .env file
  load_dotenv()

  # ---------------------------------------------------------------------
  # Provider-agnostic authentication and configuration
  # ---------------------------------------------------------------------
+ def get_auth(provider: str) -> dict:
      """Get authentication configuration for different providers"""
      auth_configs = {
          "openai": {"api_key": os.getenv("OPENAI_API_KEY")},

          raise RuntimeError(f"Missing API key for provider '{provider}'. Please set the appropriate environment variable.")

      return auth_config
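A small hypothetical smoke test for the renamed helper; it assumes only the environment variable names listed in the README and that the module is imported from the repo root (`app/utils.py` has no import-time config reads after this refactor, so the import itself is side-effect free apart from `load_dotenv`):

```python
# Hypothetical smoke test for the renamed get_auth helper
import os

from app.utils import get_auth

os.environ.setdefault("OPENAI_API_KEY", "placeholder-key")  # illustrative only
print(get_auth("openai"))  # raises RuntimeError if no key is configured
```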