
Advanced RAG Tool for smolagents

This repository contains an improved Retrieval-Augmented Generation (RAG) tool built for the smolagents library from Hugging Face. This tool allows you to:

  • Create vector stores from various document types (PDF, TXT, HTML, etc.)
  • Choose different embedding models for better semantic understanding
  • Configure chunk sizes and overlaps for optimal text splitting
  • Select between different vector stores (FAISS or Chroma)
  • Share your tool on the Hugging Face Hub

Installation

pip install smolagents langchain-community langchain-text-splitters faiss-cpu chromadb sentence-transformers pypdf2 gradio
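
To sanity-check the installation, make sure the core packages import cleanly:

python -c "import smolagents, langchain_community, faiss, chromadb, sentence_transformers, gradio; print('OK')"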

Basic Usage

from rag_tool import RAGTool

# Initialize the RAG tool
rag_tool = RAGTool()

# Configure with custom settings
rag_tool.configure(
    documents_path="./my_document.pdf",  
    embedding_model="BAAI/bge-small-en-v1.5",
    vector_store_type="faiss",
    chunk_size=1000,
    chunk_overlap=200,
    persist_directory="./vector_store",
    device="cpu"  # Use "cuda" for GPU acceleration
)

# Query the documents
result = rag_tool("What is attention in transformer architecture?", top_k=3)
print(result)
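
The top_k argument controls how many of the most similar chunks are returned; raising it gives broader context at the cost of a longer result.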

Using with an Agent

import warnings
# Suppress deprecation warnings (e.g., from LangChain) to keep agent output readable
warnings.filterwarnings("ignore", category=DeprecationWarning)

from smolagents import CodeAgent, InferenceClientModel
from rag_tool import RAGTool

# Initialize and configure the RAG tool
rag_tool = RAGTool()
rag_tool.configure(documents_path="./my_document.pdf")

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our RAG tool
agent = CodeAgent(tools=[rag_tool], model=model, add_base_tools=True)

# Run the agent
result = agent.run("Explain the key components of the transformer architecture")
print(result)
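
Note that add_base_tools=True gives the agent smolagents' built-in default tools (such as web search) alongside the RAG tool, so it can fall back on them when retrieval alone isn't enough.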

Gradio Interface

For an interactive experience, run the Gradio app:

python gradio_app.py

This provides a web interface where you can:

  • Upload documents
  • Configure embedding models and chunk settings
  • Query your documents with semantic search
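
If you prefer to wire up your own interface, the sketch below shows one way to connect RAGTool to Gradio. It assumes the RAGTool API from the examples above; the helper names (build_index, ask) are illustrative, not the actual contents of gradio_app.py.

import gradio as gr
from rag_tool import RAGTool

rag_tool = RAGTool()

def build_index(path, chunk_size, chunk_overlap):
    # (Re)build the vector store for the uploaded document
    rag_tool.configure(
        documents_path=path,
        chunk_size=int(chunk_size),
        chunk_overlap=int(chunk_overlap),
    )
    return f"Indexed {path}"

def ask(question, top_k):
    # Semantic search against the configured store
    return rag_tool(question, top_k=int(top_k))

with gr.Blocks() as demo:
    doc = gr.File(label="Document", type="filepath")
    chunk_size = gr.Number(value=1000, label="Chunk size")
    chunk_overlap = gr.Number(value=200, label="Chunk overlap")
    status = gr.Textbox(label="Status")
    gr.Button("Build index").click(build_index, [doc, chunk_size, chunk_overlap], status)

    question = gr.Textbox(label="Question")
    top_k = gr.Number(value=3, label="Top k")
    answer = gr.Textbox(label="Answer")
    gr.Button("Ask").click(ask, [question, top_k], answer)

demo.launch()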

Customization Options

Embedding Models

You can choose from various embedding models:

  • sentence-transformers/all-MiniLM-L6-v2 (fast, smaller model)
  • BAAI/bge-small-en-v1.5 (good balance of performance and speed)
  • BAAI/bge-base-en-v1.5 (better performance, slower)
  • thenlper/gte-small (good for general text embeddings)
  • thenlper/gte-base (larger GTE model)
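
Switching models is just another configure() call. A minimal sketch, reusing the options from Basic Usage:

# Rebuild the index with a different embedding model
rag_tool.configure(
    documents_path="./my_document.pdf",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # fast, smaller model
)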

Vector Store Types

  • faiss: Fast, in-memory vector index (a good fit for smaller collections)
  • chroma: Persistent vector database with metadata filtering capabilities
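
For example, to keep the index on disk between runs, a sketch using the same configure() options shown earlier:

# Chroma persists the collection to persist_directory
rag_tool.configure(
    documents_path="./my_document.pdf",
    vector_store_type="chroma",
    persist_directory="./vector_store",
)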

Document Types

The tool supports multiple document types:

  • PDF documents
  • Text files (.txt)
  • Markdown files (.md)
  • HTML files (.html)
  • Entire directories of mixed document types
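
To index a whole folder, point documents_path at the directory; every supported file in it is loaded, split, and embedded. A sketch (the ./docs/ path is just an example):

rag_tool.configure(documents_path="./docs/")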

Sharing Your Tool

You can share your tool on the Hugging Face Hub:

rag_tool.push_to_hub("your-username/rag-retrieval-tool", token="your_huggingface_token")
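
Anyone can then load it back with smolagents' load_tool; trust_remote_code=True is required because the tool ships its own code:

from smolagents import load_tool

rag_tool = load_tool("your-username/rag-retrieval-tool", trust_remote_code=True)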

Limitations

  • The tool currently doesn't extract image content from PDFs; only text is indexed
  • Very large document collections may need substantial memory, since the FAISS index is held in RAM
  • Some embedding models can be slow in CPU-only environments

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

MIT