# Advanced RAG Tool for smolagents

This repository contains an improved Retrieval-Augmented Generation (RAG) tool for Hugging Face's `smolagents` library. The tool lets you:

- Create vector stores from various document types (PDF, TXT, HTML, etc.)
- Choose different embedding models for better semantic understanding
- Configure chunk sizes and overlaps for optimal text splitting
- Pick FAISS or Chroma as the vector store backend
- Share your tool on the Hugging Face Hub

## Installation

```bash
pip install smolagents langchain-community langchain-text-splitters faiss-cpu chromadb sentence-transformers pypdf2 gradio
```

## Basic Usage

```python
from rag_tool import RAGTool

# Initialize the RAG tool
rag_tool = RAGTool()

# Configure with custom settings
rag_tool.configure(
    documents_path="./my_document.pdf",  
    embedding_model="BAAI/bge-small-en-v1.5",
    vector_store_type="faiss",
    chunk_size=1000,
    chunk_overlap=200,
    persist_directory="./vector_store",
    device="cpu"  # Use "cuda" for GPU acceleration
)

# Query the documents
result = rag_tool("What is attention in transformer architecture?", top_k=3)
print(result)
```

## Using with an Agent

```python
import warnings
# Suppress LangChain deprecation warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from smolagents import CodeAgent, InferenceClientModel
from rag_tool import RAGTool

# Initialize and configure the RAG tool
rag_tool = RAGTool()
rag_tool.configure(documents_path="./my_document.pdf")

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our RAG tool
agent = CodeAgent(tools=[rag_tool], model=model, add_base_tools=True)

# Run the agent
result = agent.run("Explain the key components of the transformer architecture")
print(result)
```

## Gradio Interface

For an interactive experience, run the Gradio app:

```bash
python gradio_app.py
```

This provides a web interface where you can:
- Upload documents
- Configure embedding models and chunk settings
- Query your documents with semantic search
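
The actual interface lives in `gradio_app.py`; as a rough sketch of how such an app can be wired to `RAGTool` (the widget names and layout below are illustrative assumptions, not the real app):

```python
import gradio as gr
from rag_tool import RAGTool

rag_tool = RAGTool()

def build_index(file_path, chunk_size, chunk_overlap):
    # Re-configure the tool each time a new document is uploaded
    rag_tool.configure(
        documents_path=file_path,
        chunk_size=int(chunk_size),
        chunk_overlap=int(chunk_overlap),
    )
    return f"Indexed {file_path}"

def ask(question, top_k):
    return rag_tool(question, top_k=int(top_k))

with gr.Blocks() as demo:
    doc = gr.File(label="Document", type="filepath")
    chunk_size = gr.Slider(200, 2000, value=1000, step=100, label="Chunk size")
    chunk_overlap = gr.Slider(0, 500, value=200, step=50, label="Chunk overlap")
    status = gr.Textbox(label="Status")
    doc.upload(build_index, [doc, chunk_size, chunk_overlap], status)

    question = gr.Textbox(label="Question")
    top_k = gr.Slider(1, 10, value=3, step=1, label="Top-k")
    answer = gr.Textbox(label="Retrieved context")
    question.submit(ask, [question, top_k], answer)

demo.launch()
```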

## Customization Options

### Embedding Models

You can choose from various embedding models:
- `sentence-transformers/all-MiniLM-L6-v2` (fast, smaller model)
- `BAAI/bge-small-en-v1.5` (good balance of performance and speed)
- `BAAI/bge-base-en-v1.5` (better performance, slower)
- `thenlper/gte-small` (good for general text embeddings)
- `thenlper/gte-base` (larger GTE model)
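
Whichever model you pick is passed through `configure(embedding_model=...)`. Internally this presumably maps to a sentence-transformers backed LangChain wrapper; a minimal sketch of that wiring (an assumption about the internals, not the tool's verbatim code):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# normalize_embeddings=True is the usual recommendation for BGE-style models
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "cpu"},  # "cuda" for GPU acceleration
    encode_kwargs={"normalize_embeddings": True},
)

vec = embeddings.embed_query("What is attention?")
print(len(vec))  # 384 dimensions for bge-small-en-v1.5
```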

### Vector Store Types

- `faiss`: a fast, in-memory vector index; well suited to small and medium collections
- `chroma`: a persistent vector database with metadata filtering capabilities
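
Both backends are exposed through LangChain; a hedged sketch of how the two differ in construction and persistence (using a toy in-line `Document` rather than real chunks):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.documents import Document

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
chunks = [Document(page_content="Attention lets each token weigh the others.")]

# FAISS: built in memory; persisting to disk is an explicit, separate step
faiss_store = FAISS.from_documents(chunks, embeddings)
faiss_store.save_local("./vector_store")

# Chroma: writes to persist_directory and supports metadata filtering
chroma_store = Chroma.from_documents(
    chunks, embeddings, persist_directory="./vector_store"
)

print(faiss_store.similarity_search("attention", k=1)[0].page_content)
```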

### Document Types

The tool supports multiple document types:
- PDF documents
- Text files (.txt)
- Markdown files (.md)
- HTML files (.html)
- Entire directories of mixed document types
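
A sketch of how these formats are commonly loaded and split with LangChain's loaders and text splitters; the tool's own loader selection may differ:

```python
from langchain_community.document_loaders import (
    DirectoryLoader,
    PyPDFLoader,
    TextLoader,
)
from langchain_text_splitters import RecursiveCharacterTextSplitter

# A single PDF (note: PyPDFLoader depends on the `pypdf` package)
docs = PyPDFLoader("./my_document.pdf").load()

# Or an entire directory, e.g. every Markdown file under ./docs
# docs = DirectoryLoader("./docs", glob="**/*.md", loader_cls=TextLoader).load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
print(f"{len(docs)} pages -> {len(chunks)} chunks")
```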

## Sharing Your Tool

You can share your tool on the Hugging Face Hub:

```python
rag_tool.push_to_hub("your-username/rag-retrieval-tool", token="your_huggingface_token")
```
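
Once pushed, the tool should be loadable again with smolagents' Hub loading; loading executes the tool's code, hence the explicit trust flag:

```python
from smolagents import load_tool

rag_tool = load_tool("your-username/rag-retrieval-tool", trust_remote_code=True)
```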

## Limitations

- The tool currently doesn't support image content from PDFs
- Very large documents may require additional memory
- Some embedding models may be slow on CPU-only environments

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

## License

MIT