---
language: en
library_name: FlexRAG
tags:
- FlexRAG
- retrieval
- search
- lexical
- RAG
---

# The BM25SRetriever for the wiki2021 corpus

The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project, and the index was built using the [FlexRAG](https://github.com/ictnlp/flexrag) library.

| Corpus Attribute | Value                                                           |
| ---------------- | --------------------------------------------------------------- |
| Language         | English                                                         |
| Domain           | Wikipedia                                                       |
| Size             | 37.5M (33.1M text, 4.3M infobox)                                |
| Dump Date        | Dec 2021                                                        |
| Provider         | [Atlas](https://github.com/facebookresearch/atlas)              |
| License          | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |

| Index Attribute | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Index Type      | BM25S                                                           |
| Index Method    | Lucene                                                          |
| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |

## Installation

You can install the `FlexRAG` library with `pip`:

```bash
pip install flexrag
```

## Loading a `FlexRAG` retriever

You can use this retriever for information retrieval tasks. Here is an example:

```python
from flexrag.retriever import LocalRetriever

# Load the retriever from the HuggingFace Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")

# Retrieve passages for a query
results = retriever.search("Who is Bruce Wayne?")
```

## Running the RAG application with the retriever

You can run the **GUI application** of the RAG assistant with this retriever.
Here is an example:

```bash
# Assumes OPENAI_KEY is already set in your shell
python -m flexrag.entrypoints.run_interactive \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.response_type=original \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False
```

You can also run **FlexRAG's RAG evaluation pipeline** with this retriever. Here is an example that evaluates the **ModularAssistant** with the retriever on the *Natural Questions* test split:

```bash
# Fill in these values before running
OUTPUT_PATH=
DB_PATH=
OPENAI_KEY=

python -m flexrag.entrypoints.run_assistant \
    name=nq \
    split=test \
    output_path=${OUTPUT_PATH} \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False \
    eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
    eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
    eval_config.retrieval_success_rate_config.eval_field=text \
    eval_config.response_preprocess.processor_type=[simplify_answer]
```

## License

Because the corpus is released under the [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license, the retriever is distributed under the same license.

## Related Links

FlexRAG Related Links:

* 📚[Documentation](https://flexrag.readthedocs.io/en/latest/)
* 💻[GitHub Repository](https://github.com/ictnlp/flexrag)
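
As a note on the `Preprocessing` and `Index Method` rows of the index table above, the following pure-Python sketch illustrates the general idea behind a character-length filter and a Lucene-style BM25 score. It is not FlexRAG's or `bm25s`'s implementation. The helper names `length_filter` and `bm25_lucene_scores`, the whitespace tokenizer, the inclusive length bounds, and the `k1`/`b` values are all assumptions chosen for illustration and may differ from the settings used to build this index.

```python
import math
from collections import Counter


def length_filter(texts, min_char=10, max_char=4096):
    """Keep texts whose length is within [min_char, max_char] (inclusive bounds assumed)."""
    return [t for t in texts if min_char <= len(t) <= max_char]


def bm25_lucene_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a Lucene-style BM25 formula."""
    # Naive lowercase whitespace tokenizer (an assumption, not the index's tokenizer)
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy passages (hypothetical, not taken from the corpus)
passages = length_filter([
    "Batman",  # shorter than min_char, so it is dropped
    "Bruce Wayne is the secret identity of Batman.",
    "Gotham City is a fictional city in DC Comics.",
])
scores = bm25_lucene_scores("bruce wayne", passages)
best = passages[scores.index(max(scores))]
```

In this toy run only the second input passage contains the query terms, so it receives the only non-zero score. The real retriever works on the full 37.5M-entry corpus and should be queried through `retriever.search` as shown earlier.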