---
language: en
library_name: FlexRAG
tags:
- FlexRAG
- retrieval
- search
- lexical
- RAG
---

# The BM25SRetriever for the wiki2021 corpus

The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project and the index was built using the [FlexRAG](https://github.com/ictnlp/flexrag) library.

| Corpus Attribute | Value                                                           |
| ---------------- | --------------------------------------------------------------- |
| Language         | English                                                         |
| Domain           | Wikipedia                                                       |
| Size             | 37.5M (33.1M text, 4.3M infobox)                                |
| Dump Date        | Dec 2021                                                        |
| Provider         | [Atlas](https://github.com/facebookresearch/atlas)              |
| License          | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |


| Index Attribute | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Index Type      | BM25S                                                           |
| Index Method    | Lucene                                                          |
| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
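
The *Index Method* and *Preprocessing* rows can be illustrated with a small pure-Python sketch. This is not FlexRAG's code. It assumes that `LengthFilter` drops passages whose character length falls outside `[min_char, max_char]` (inclusive bounds are an assumption). It also assumes the Lucene variant of BM25 as described for the `bm25s` package: IDF = ln(1 + (N - df + 0.5) / (df + 0.5)) and a saturated term-frequency component tf / (tf + k1 * (1 - b + b * dl / avgdl)). The regex tokenizer, the `k1 = 1.5` and `b = 0.75` values, and the helper names `length_filter` and `LuceneBM25` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def length_filter(passages, min_char=10, max_char=4096):
    """Keep passages whose character count lies in [min_char, max_char].

    Hypothetical stand-in for the LengthFilter preprocessing step;
    treating both bounds as inclusive is an assumption.
    """
    return [p for p in passages if min_char <= len(p) <= max_char]


def tokenize(text):
    # Assumed tokenizer: lowercase word characters only, no stemming or stopwords.
    return re.findall(r"\w+", text.lower())


class LuceneBM25:
    """Minimal in-memory BM25 scorer using Lucene-style IDF and TF terms."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(doc)) for doc in corpus]
        self.doc_lens = [sum(counts.values()) for counts in self.docs]
        self.avg_len = sum(self.doc_lens) / len(self.docs)
        n_docs = len(self.docs)
        # Document frequency: iterating a Counter yields each term once per doc.
        df = Counter(term for counts in self.docs for term in counts)
        # Lucene IDF: ln(1 + (N - df + 0.5) / (df + 0.5)), always positive.
        self.idf = {
            term: math.log(1 + (n_docs - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, doc_id):
        """Sum IDF * saturated TF over the query terms present in the doc."""
        tf = self.docs[doc_id]
        # Length normalization: longer-than-average docs get a larger denominator.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[doc_id] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                total += self.idf[term] * tf[term] / (tf[term] + norm)
        return total

    def search(self, query, top_k=3):
        """Return the top_k (doc_id, score) pairs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


passages = [
    "Bruce Wayne is a fictional superhero who fights crime as Batman.",
    "short",  # 5 characters: removed by the length filter
    "Gotham City is the home of Batman.",
]
retriever = LuceneBM25(length_filter(passages))
print(retriever.search("Who is Bruce Wayne?", top_k=2))
```

One property of the Lucene-style IDF is that it stays positive even for terms that occur in more than half of the documents, whereas the original Robertson IDF, ln((N - df + 0.5) / (df + 0.5)), becomes negative in that case.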


## Installation

You can install the `FlexRAG` library with `pip`:

```bash
pip install flexrag
```

## Loading a `FlexRAG` retriever

You can load this retriever from the Hugging Face Hub and use it for information retrieval tasks. Here is an example:

```python
from flexrag.retriever import LocalRetriever

# Load the retriever from the Hugging Face Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")

# Search the index with a natural-language query
results = retriever.search("Who is Bruce Wayne?")
```

## Running the RAG application with the retriever

You can run the **GUI application** of the RAG assistant with this retriever. Here is an example:

```bash
OPENAI_KEY=<your_openai_key>

python -m flexrag.entrypoints.run_interactive \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.response_type=original \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False
```
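
If you prefer to launch the same entrypoint from Python, one option is to pass each `key=value` override as a separate argument to `subprocess`. This avoids relying on the shell to pass bracketed values such as `[title,text]` through unchanged (zsh, for example, treats them as glob patterns). This is a minimal sketch that only wraps the command above: the helper names `build_command` and `launch` are hypothetical, and the overrides are copied from the shell example with the shell quotes removed.

```python
import subprocess
import sys


def build_command(api_key, model_name="gpt-4o-mini"):
    """Assemble the run_interactive invocation as an argument list.

    Each override is a single argv entry, so no shell quoting is involved.
    """
    overrides = {
        "assistant_type": "modular",
        "modular_config.used_fields": "[title,text]",
        "modular_config.retriever_type": "FlexRAG/wiki2021_atlas_bm25s",
        "modular_config.response_type": "original",
        "modular_config.generator_type": "openai",
        "modular_config.openai_config.model_name": model_name,
        "modular_config.openai_config.api_key": api_key,
        "modular_config.do_sample": "False",
    }
    # Use the current interpreter so the same environment (with FlexRAG) is used.
    return [sys.executable, "-m", "flexrag.entrypoints.run_interactive"] + [
        f"{key}={value}" for key, value in overrides.items()
    ]


def launch(api_key):
    """Start the GUI application; requires FlexRAG to be installed."""
    subprocess.run(build_command(api_key), check=True)
```

With FlexRAG installed and the key exported as `OPENAI_KEY`, calling `launch(os.environ["OPENAI_KEY"])` should be equivalent to running the shell command.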

You can also run **FlexRAG's RAG evaluation pipeline** with this retriever. Here is an example that evaluates the **ModularAssistant** with this retriever on the test split of *Natural Questions*:

```bash
OUTPUT_PATH=<path_to_output>
OPENAI_KEY=<your_openai_key>

python -m flexrag.entrypoints.run_assistant \
    name=nq \
    split=test \
    output_path=${OUTPUT_PATH} \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False \
    eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
    eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
    eval_config.retrieval_success_rate_config.eval_field=text \
    eval_config.response_preprocess.processor_type=[simplify_answer]
```
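
This card does not define the three metrics requested through `eval_config.metrics_type`. As a rough guide, the sketch below assumes the common open-domain QA definitions. Under that assumption, `generation_em` and `generation_f1` are SQuAD-style exact match and token-level F1 against the best-matching gold answer. `retrieval_success_rate` is the fraction of questions for which at least one gold answer occurs in the `text` field of some retrieved passage. The `normalize` function is a hypothetical stand-in for the `simplify_answer` preprocessor (lowercasing, removing punctuation and English articles, collapsing whitespace). FlexRAG's actual implementations may differ.

```python
import re
import string
from collections import Counter

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Hypothetical stand-in for `simplify_answer`: SQuAD-style normalization."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, golds):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in golds))


def token_f1(prediction, golds):
    """Best token-level F1 between the prediction and any gold answer."""
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in golds:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def retrieval_success_rate(retrieved_texts, golds_per_question):
    """Fraction of questions with a gold answer inside some retrieved text.

    Simplification: a plain substring test on normalized strings.
    """
    hits = 0
    for texts, golds in zip(retrieved_texts, golds_per_question):
        contexts = [normalize(t) for t in texts]
        answers = [normalize(g) for g in golds]
        if any(a and a in c for a in answers for c in contexts):
            hits += 1
    return hits / len(golds_per_question) if golds_per_question else 0.0
```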

## License
As the corpus is licensed under [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/), this retriever is released under the same license.

## Related Links

FlexRAG Related Links:
* 📚[Documentation](https://flexrag.readthedocs.io/en/latest/)
* 💻[GitHub Repository](https://github.com/ictnlp/flexrag)