# Advanced RAG Tool for smolagents

This repository contains an improved Retrieval-Augmented Generation (RAG) tool built for the `smolagents` library from Hugging Face. This tool allows you to:

- Create vector stores from various document types (PDF, TXT, HTML, etc.)
- Choose different embedding models for better semantic understanding
- Configure chunk sizes and overlaps for optimal text splitting
- Select between different vector stores (FAISS or Chroma)
- Share your tool on the Hugging Face Hub

## Installation

```bash
pip install smolagents langchain-community langchain-text-splitters faiss-cpu chromadb sentence-transformers pypdf2 gradio
```
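
After installing, it can help to confirm that each package is importable before running the examples. The sketch below is a sanity check under stated assumptions, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names are the ones these packages normally expose (for example, `faiss-cpu` is imported as `faiss` and `pypdf2` as `PyPDF2`).

```python
from importlib.util import find_spec

# Import names that the pip packages above normally provide
# (assumed here, not taken from the repository).
REQUIRED_MODULES = [
    "smolagents",
    "langchain_community",
    "langchain_text_splitters",
    "faiss",
    "chromadb",
    "sentence_transformers",
    "PyPDF2",
    "gradio",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```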

## Basic Usage

```python
from rag_tool import RAGTool

# Initialize the RAG tool
rag_tool = RAGTool()

# Configure with custom settings
rag_tool.configure(
    documents_path="./my_document.pdf",
    embedding_model="BAAI/bge-small-en-v1.5",
    vector_store_type="faiss",
    chunk_size=1000,
    chunk_overlap=200,
    persist_directory="./vector_store",
    device="cpu"  # Use "cuda" for GPU acceleration
)

# Query the documents
result = rag_tool("What is attention in transformer architecture?", top_k=3)
print(result)
```
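
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: `chunk_text` is a hypothetical helper that counts characters, whereas the tool's actual splitter (presumably from `langchain-text-splitters`) may also respect separators such as paragraph and sentence boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(2500))
    pieces = chunk_text(sample, chunk_size=1000, chunk_overlap=200)
    print([len(p) for p in pieces])  # [1000, 1000, 900]
```

A larger overlap lowers the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of storing and embedding more redundant text.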

## Using with an Agent

```python
import warnings
# Suppress LangChain deprecation warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from smolagents import CodeAgent, InferenceClientModel
from rag_tool import RAGTool

# Initialize and configure the RAG tool
rag_tool = RAGTool()
rag_tool.configure(documents_path="./my_document.pdf")

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our RAG tool
agent = CodeAgent(tools=[rag_tool], model=model, add_base_tools=True)

# Run the agent
result = agent.run("Explain the key components of the transformer architecture")
print(result)
```
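
Hard-coding a token in source files makes it easy to leak. One hedged alternative is to read it from an environment variable. The helper name `get_hf_token` is hypothetical; `HF_TOKEN` is the variable name that Hugging Face libraries conventionally read, but any name would work in this sketch.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Read a Hugging Face token from the environment, failing loudly if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

The returned value can then be passed as `token=get_hf_token()` wherever the examples use the literal `"your_huggingface_token"` placeholder.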

## Gradio Interface

For an interactive experience, run the Gradio app:

```bash
python gradio_app.py
```

This provides a web interface where you can:
- Upload documents
- Configure embedding models and chunk settings
- Query your documents with semantic search

## Customization Options

### Embedding Models

You can choose from various embedding models:
- `sentence-transformers/all-MiniLM-L6-v2` (fast, smaller model)
- `BAAI/bge-small-en-v1.5` (good balance of performance and speed)
- `BAAI/bge-base-en-v1.5` (better performance, slower)
- `thenlper/gte-small` (good for general text embeddings)
- `thenlper/gte-base` (larger GTE model)

### Vector Store Types

- `faiss`: Fast, in-memory vector database (better for smaller collections)
- `chroma`: Persistent vector database with metadata filtering capabilities
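
Whichever store is selected, the core operation is the same: rank the stored chunk embeddings by closeness to the query embedding and return the best `top_k`. The following is a minimal brute-force sketch in plain Python with toy 3-dimensional vectors. `cosine_similarity` and `top_k_chunks` are hypothetical names, the vectors are made up for illustration, and real stores use optimized indexes, model-generated embeddings, and possibly a different distance metric (such as L2).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, top_k=3):
    """Return the top_k (score, chunk) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy data: hand-written vectors standing in for real embeddings.
    chunks = ["attention weights", "tokenization", "cooking pasta"]
    vectors = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.0, 0.1, 0.9]]
    query = [1.0, 0.0, 0.0]
    for score, text in top_k_chunks(query, vectors, chunks, top_k=2):
        print(f"{score:.3f}  {text}")
```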

### Document Types

The tool supports multiple document types:
- PDF documents
- Text files (.txt)
- Markdown files (.md)
- HTML files (.html)
- Entire directories of mixed document types
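
When pointing the tool at a directory, it can be useful to preview which files match the types listed above. The sketch below is an assumption about how such discovery could work, not the tool's actual loader: `find_supported_files` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the extension set simply mirrors the list above.

```python
from pathlib import Path

# Extensions mirroring the document types listed above (assumed mapping).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".html"}


def find_supported_files(path):
    """Return supported files for a single file path or, recursively, for a directory."""
    root = Path(path)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED_EXTENSIONS else []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for file_path in find_supported_files("."):
        print(file_path)
```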

## Sharing Your Tool

You can share your tool on the Hugging Face Hub:

```python
rag_tool.push_to_hub("your-username/rag-retrieval-tool", token="your_huggingface_token")
```

## Limitations

- The tool currently doesn't support image content from PDFs
- Very large documents may require additional memory
- Some embedding models may be slow on CPU-only environments

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

## License

MIT