# Advanced RAG Tool for smolagents
This repository contains an improved Retrieval-Augmented Generation (RAG) tool built for the `smolagents` library from Hugging Face. This tool allows you to:
- Create vector stores from various document types (PDF, TXT, HTML, etc.)
- Choose different embedding models for better semantic understanding
- Configure chunk sizes and overlaps for optimal text splitting
- Select between different vector stores (FAISS or Chroma)
- Share your tool on the Hugging Face Hub
## Installation
```bash
pip install smolagents langchain-community langchain-text-splitters faiss-cpu chromadb sentence-transformers pypdf2 gradio
```
## Basic Usage
```python
from rag_tool import RAGTool

# Initialize the RAG tool
rag_tool = RAGTool()

# Configure with custom settings
rag_tool.configure(
    documents_path="./my_document.pdf",
    embedding_model="BAAI/bge-small-en-v1.5",
    vector_store_type="faiss",
    chunk_size=1000,
    chunk_overlap=200,
    persist_directory="./vector_store",
    device="cpu"  # Use "cuda" for GPU acceleration
)

# Query the documents
result = rag_tool("What is attention in transformer architecture?", top_k=3)
print(result)
```
## Using with an Agent
```python
import warnings

# Suppress LangChain deprecation warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from smolagents import CodeAgent, InferenceClientModel
from rag_tool import RAGTool

# Initialize and configure the RAG tool
rag_tool = RAGTool()
rag_tool.configure(documents_path="./my_document.pdf")

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our RAG tool
agent = CodeAgent(tools=[rag_tool], model=model, add_base_tools=True)

# Run the agent
result = agent.run("Explain the key components of the transformer architecture")
print(result)
```
## Gradio Interface
For an interactive experience, run the Gradio app:
```bash
python gradio_app.py
```
This provides a web interface where you can:
- Upload documents
- Configure embedding models and chunk settings
- Query your documents with semantic search
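
The repository's `gradio_app.py` implements this. For orientation, the sketch below shows one way such an interface can be wired up with Gradio's `Blocks` API; component names and layout here are illustrative, not the exact app:

```python
import gradio as gr
from rag_tool import RAGTool

rag_tool = RAGTool()

def build_index(file_path, model_name, chunk_size, chunk_overlap):
    # Configure the tool against the uploaded document
    rag_tool.configure(
        documents_path=file_path,
        embedding_model=model_name,
        chunk_size=int(chunk_size),
        chunk_overlap=int(chunk_overlap),
    )
    return "Index built."

def ask(question, top_k):
    # Run a semantic search over the indexed document
    return rag_tool(question, top_k=int(top_k))

with gr.Blocks() as demo:
    doc = gr.File(label="Document", type="filepath")
    model_name = gr.Textbox(value="BAAI/bge-small-en-v1.5", label="Embedding model")
    chunk_size = gr.Number(value=1000, label="Chunk size")
    chunk_overlap = gr.Number(value=200, label="Chunk overlap")
    status = gr.Textbox(label="Status", interactive=False)
    gr.Button("Build index").click(
        build_index, [doc, model_name, chunk_size, chunk_overlap], status
    )

    question = gr.Textbox(label="Question")
    top_k = gr.Number(value=3, label="Top k")
    answer = gr.Textbox(label="Retrieved passages", interactive=False)
    gr.Button("Search").click(ask, [question, top_k], answer)

demo.launch()
```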
## Customization Options
### Embedding Models
You can choose from various embedding models:
- `sentence-transformers/all-MiniLM-L6-v2` (fast, smaller model)
- `BAAI/bge-small-en-v1.5` (good balance of performance and speed)
- `BAAI/bge-base-en-v1.5` (better performance, slower)
- `thenlper/gte-small` (good for general text embeddings)
- `thenlper/gte-base` (larger GTE model)
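
Pass any of these to `configure()` via the `embedding_model` argument. For example, to trade a little retrieval quality for speed:

```python
# Re-index with the smaller, faster MiniLM model
rag_tool.configure(
    documents_path="./my_document.pdf",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
```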
### Vector Store Types
- `faiss`: Fast, in-memory vector index; well suited to smaller collections
- `chroma`: Persistent vector store with metadata filtering capabilities
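
For example, to build a Chroma store that persists to disk and can be reused across sessions (using the same `configure()` options shown under Basic Usage):

```python
# Persist the Chroma index so it survives restarts
rag_tool.configure(
    documents_path="./my_document.pdf",
    vector_store_type="chroma",
    persist_directory="./vector_store",
)
```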
### Document Types
The tool supports multiple document types:
- PDF documents
- Text files (.txt)
- Markdown files (.md)
- HTML files (.html)
- Entire directories of mixed document types
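
Pointing `documents_path` at a directory indexes every supported file inside it:

```python
# Index all supported files under ./docs in one pass
rag_tool.configure(documents_path="./docs")
result = rag_tool("What does the documentation say about attention?", top_k=5)
```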
## Sharing Your Tool
You can share your tool on the Hugging Face Hub:
```python
rag_tool.push_to_hub("your-username/rag-retrieval-tool", token="your_huggingface_token")
```
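
Once pushed, others can load it with smolagents' `load_tool` helper; `trust_remote_code=True` is required because the tool ships custom code:

```python
from smolagents import load_tool

# Pull the shared tool from the Hub and use it like a local one
rag_tool = load_tool("your-username/rag-retrieval-tool", trust_remote_code=True)
```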
## Limitations
- The tool does not yet extract image content from PDFs (text only)
- Very large document collections may require substantial memory, particularly with the in-memory FAISS store
- Embedding models can be slow in CPU-only environments; prefer the smaller models listed above or set `device="cuda"`
## Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request.
## License
MIT