	---
	language: en
	library_name: FlexRAG
	tags:
	- FlexRAG
	- retrieval
	- search
	- lexical
	- RAG
	---

	# The BM25SRetriever for the wiki2021 corpus

	The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project and the index was built using the [FlexRAG](https://github.com/ictnlp/flexrag) library.

	| Corpus Attribute | Value                                                           |
	| ---------------- | --------------------------------------------------------------- |
	| Language         | English                                                         |
	| Domain           | Wikipedia                                                       |
	| Size             | 37.5M (33.1M text, 4.3M infobox)                                |
	| Dump Date        | Dec 2021                                                        |
	| Provider         | [Atlas](https://github.com/facebookresearch/atlas)              |
	| License          | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |


	| Index Attribute | Value                                                           |
	| --------------- | --------------------------------------------------------------- |
	| Index Type      | BM25S                                                           |
	| Index Method    | Lucene                                                          |
	| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
	| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
	| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
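
To make the index attributes above more concrete, here is a minimal sketch of what a character-length filter and Lucene-style BM25 scoring compute. It is not FlexRAG's or BM25S's implementation: the helper names `length_filter` and `bm25_lucene_scores`, the inclusive bounds, the whitespace tokenizer, and the parameters `k1=1.5` and `b=0.75` are all assumptions made for illustration, since this card does not state them.

```python
import math
from collections import Counter


def length_filter(texts, min_char=10, max_char=4096):
    """Keep texts whose character count lies in [min_char, max_char].

    Inclusive bounds are an assumption of this sketch.
    """
    return [t for t in texts if min_char <= len(t) <= max_char]


def bm25_lucene_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query.

    Uses the Lucene-style variant: idf = ln(1 + (N - df + 0.5) / (df + 0.5))
    and a term-frequency part tf / (tf + k1 * (1 - b + b * dl / avgdl)).
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


passages = [
    "Bruce Wayne is the secret identity of Batman.",
    "Gotham City is a fictional city.",
    "short",  # fewer than 10 characters, so the filter drops it
]
kept = length_filter(passages)
# Toy tokenizer (an assumption): lowercase, strip the final period, split on spaces.
tokenized = [p.lower().rstrip(".").split() for p in kept]
scores = bm25_lucene_scores("bruce wayne".split(), tokenized)
```

In this toy run, only the first passage contains the query terms, so it is the only one with a non-zero score.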


	## Installation

	You can install the `FlexRAG` library with `pip`:

	```bash
	pip install flexrag
	```

	## Loading a `FlexRAG` retriever

	You can use this retriever for information retrieval tasks. Here is an example:

	```python
	from flexrag.retriever import LocalRetriever

	# Load the retriever from the HuggingFace Hub
	retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")

	# Query the retriever
	results = retriever.search("Who is Bruce Wayne?")
	```

	## Running the RAG application with the retriever

	You can run the GUI application of the RAG assistant with this retriever. Here is an example:

	```bash
	python -m flexrag.entrypoints.run_interactive \
	assistant_type=modular \
	modular_config.used_fields=[title,text] \
	modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
	modular_config.response_type=original \
	modular_config.generator_type=openai \
	modular_config.openai_config.model_name='gpt-4o-mini' \
	modular_config.openai_config.api_key=$OPENAI_KEY \
	modular_config.do_sample=False
	```

	You can also run FlexRAG's RAG evaluation pipeline with this retriever. The following example evaluates the ModularAssistant with this retriever on the Natural Questions test split:

	```bash
	OUTPUT_PATH=<path_to_output>
	DB_PATH=<path_to_database>
	OPENAI_KEY=<your_openai_key>

	python -m flexrag.entrypoints.run_assistant \
	name=nq \
	split=test \
	output_path=${OUTPUT_PATH} \
	assistant_type=modular \
	modular_config.used_fields=[title,text] \
	modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
	modular_config.generator_type=openai \
	modular_config.openai_config.model_name='gpt-4o-mini' \
	modular_config.openai_config.api_key=$OPENAI_KEY \
	modular_config.do_sample=False \
	eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
	eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
	eval_config.retrieval_success_rate_config.eval_field=text \
	eval_config.response_preprocess.processor_type=[simplify_answer]
	```
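
The metric names in the command above (`retrieval_success_rate`, `generation_f1`, `generation_em`) correspond to standard open-domain QA measures. The sketch below shows how such measures are commonly defined, following the SQuAD-style convention. It is an assumption that FlexRAG's metrics and its `simplify_answer` processor behave similarly; the functions `normalize`, `exact_match`, `token_f1`, and `retrieval_success` are hypothetical helpers, not part of the FlexRAG API.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, golds):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in golds))


def token_f1(prediction, golds):
    """Best token-level F1 between the prediction and any gold answer."""
    pred = normalize(prediction).split()
    best = 0.0
    for g in golds:
        gold = normalize(g).split()
        # Multiset intersection counts tokens shared by both sides.
        overlap = sum((Counter(pred) & Counter(gold)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def retrieval_success(contexts, golds):
    """1.0 if any retrieved context contains any gold answer as a substring."""
    return float(
        any(normalize(g) in normalize(c) for c in contexts for g in golds)
    )


em = exact_match("The Batman.", ["batman"])
f1 = token_f1("Bruce Wayne of Gotham", ["Bruce Wayne"])
hit = retrieval_success(["Bruce Wayne is Batman."], ["batman"])
```

Averaging each function over all questions in a split gives the corresponding corpus-level score.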

	## License

	Because the corpus is licensed under [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/), the retriever is released under the same license.

	## Related Links

	FlexRAG Related Links:
	* 📚[Documentation](https://flexrag.readthedocs.io/en/latest/)
	* 💻[GitHub Repository](https://github.com/ictnlp/flexrag)