---
language: en
library_name: FlexRAG
tags:
- FlexRAG
- retrieval
- search
- lexical
- RAG
---
# The BM25SRetriever for the wiki2021 corpus
The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project and the index was built using the [FlexRAG](https://github.com/ictnlp/flexrag) library.
| Corpus Attribute | Value |
| ---------------- | --------------------------------------------------------------- |
| Language | English |
| Domain | Wikipedia |
| Size | 37.5M (33.1M text, 4.3M infobox) |
| Dump Date | Dec 2021 |
| Provider | [Atlas](https://github.com/facebookresearch/atlas) |
| License | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |

| Index Attribute | Value |
| --------------- | --------------------------------------------------------------- |
| Index Type | BM25S |
| Index Method | Lucene |
| Preprocessing | LengthFilter(min_char=10, max_char=4096) |
| Provider | [FlexRAG](https://github.com/ictnlp/flexrag) |
| License | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
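
The `Preprocessing` row means that passages outside the given character-length range were dropped before indexing. A minimal sketch of that idea in plain Python follows. The `length_filter` helper is hypothetical (it is not FlexRAG's implementation), and treating both bounds as inclusive is an assumption.

```python
from typing import Iterable, Iterator


def length_filter(
    passages: Iterable[str], min_char: int = 10, max_char: int = 4096
) -> Iterator[str]:
    """Yield only passages whose character count lies in [min_char, max_char].

    Hypothetical helper for illustration; inclusive bounds are an assumption.
    """
    for text in passages:
        if min_char <= len(text) <= max_char:
            yield text


# One passage is too short, one is in range, one is too long.
passages = ["too short", "Gotham City is a fictional city.", "x" * 5000]
kept = list(length_filter(passages))
print(kept)
```

Only the middle passage survives here, since the first has fewer than 10 characters and the last has more than 4096.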
## Installation
You can install the `FlexRAG` library with `pip`:
```bash
pip install flexrag
```
## Loading a `FlexRAG` retriever
You can use this retriever for information retrieval tasks. Here is an example:
```python
from flexrag.retriever import LocalRetriever
# Load the retriever from the HuggingFace Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")
# Run a query against the index
results = retriever.search("Who is Bruce Wayne?")
```
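
For intuition about how a lexical index ranks passages for a query like the one above, here is a small self-contained sketch of BM25 scoring with a Lucene-style IDF. It is not FlexRAG's or BM25S's code: the whitespace tokenizer, the `k1=1.5` and `b=0.75` defaults, the toy documents, and the `bm25_lucene_scores` helper are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_lucene_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a Lucene-style BM25 variant.

    idf(t)   = ln(1 + (N - df + 0.5) / (df + 0.5))
    score(d) = sum over query terms of
               idf(t) * tf / (tf + k1 * (1 - b + b * |d| / avgdl))
    """
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # Terms absent from the document contribute nothing.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "bruce wayne is the secret identity of batman",
    "gotham city is a fictional city",
    "clark kent is the secret identity of superman",
]
scores = bm25_lucene_scores("who is bruce wayne", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

The first document ranks highest because it matches the rare terms `bruce` and `wayne`, while the other two only share the common term `is`, which carries a small IDF weight.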
## Running the RAG application with the retriever
You can run the RAG assistant's **GUI application** with this retriever. The example below assumes the `OPENAI_KEY` environment variable holds your OpenAI API key:
```bash
python -m flexrag.entrypoints.run_interactive \
assistant_type=modular \
modular_config.used_fields=[title,text] \
modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
modular_config.response_type=original \
modular_config.generator_type=openai \
modular_config.openai_config.model_name='gpt-4o-mini' \
modular_config.openai_config.api_key=$OPENAI_KEY \
modular_config.do_sample=False
```
You can also run **FlexRAG's RAG evaluation pipeline** with this retriever. The following example evaluates the **ModularAssistant** with this retriever on the *Natural Questions* test split:
```bash
OUTPUT_PATH=<path_to_output>
DB_PATH=<path_to_database>
OPENAI_KEY=<your_openai_key>
python -m flexrag.entrypoints.run_assistant \
name=nq \
split=test \
output_path=${OUTPUT_PATH} \
assistant_type=modular \
modular_config.used_fields=[title,text] \
modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
modular_config.generator_type=openai \
modular_config.openai_config.model_name='gpt-4o-mini' \
modular_config.openai_config.api_key=$OPENAI_KEY \
modular_config.do_sample=False \
eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
eval_config.retrieval_success_rate_config.eval_field=text \
eval_config.response_preprocess.processor_type=[simplify_answer]
```
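
To make the `generation_em` and `generation_f1` metrics in the command above more concrete, the sketch below shows the usual exact-match and token-level F1 computations over normalized answers. It assumes SQuAD-style definitions and normalization (lowercasing, stripping punctuation and English articles). FlexRAG's own metrics and its `simplify_answer` processor may differ in detail, and the helper names here are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Batman.", "batman"))
print(token_f1("Bruce Wayne of Gotham", "Bruce Wayne"))
```

In the second call, two of the four predicted tokens match both gold tokens, so precision is 0.5 and recall is 1.0.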
## License
Because the corpus is released under the [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license, this retriever is distributed under the same license.
## Related Links
FlexRAG Related Links:
* 📚[Documentation](https://flexrag.readthedocs.io/en/latest/)
* 💻[GitHub Repository](https://github.com/ictnlp/flexrag)