# CoRAG-Llama3.1-8B-MultihopQA

This is the CoRAG-8B model fine-tuned on the MultihopQA dataset, as described in the paper *Chain-of-Retrieval Augmented Generation*.

## Model Evaluation

| Model | 2WikiQA EM | 2WikiQA F1 | HotpotQA EM | HotpotQA F1 | Bamboogle EM | Bamboogle F1 | MuSiQue EM | MuSiQue F1 |
|---|---|---|---|---|---|---|---|---|
| 3-shot Llama-3.1-8B-Inst. | 30.7 | 39.9 | 34.1 | 46.6 | 28.0 | 37.3 | 7.7 | 15.4 |
| 3-shot GPT-4o | 49.0 | 56.2 | 45.8 | 59.4 | 53.6 | 63.8 | 15.7 | 25.8 |
| Fine-tuned Llama-8B w/ E5-large | 55.1 | 60.7 | 50.3 | 63.5 | 40.8 | 53.7 | 17.4 | 28.1 |
| **CoRAG-8B (Ours)** | | | | | | | | |
| L=1, greedy | 56.5 | 62.3 | 50.1 | 63.2 | 37.6 | 51.4 | 18.6 | 29.3 |
| L=6, greedy | 70.6 | 75.5 | 54.4 | 67.5 | 48.0 | 63.5 | 27.7 | 38.5 |
| L=6, best-of-4 | 71.7 | 76.5 | 55.3 | 68.5 | 51.2 | 63.1 | 28.1 | 39.7 |
| L=6, tree search | 71.7 | 76.4 | 55.8 | 69.0 | 48.8 | 64.4 | 29.0 | 40.3 |
| L=10, best-of-8 | 72.5 | 77.3 | 56.3 | 69.8 | 54.4 | 68.3 | 30.9 | 42.4 |
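The decoding settings in the table (chain length L, greedy vs. best-of-N) refer to chain-of-retrieval decoding: the model alternates between generating a sub-query and retrieving passages for it, up to L steps, before producing the final answer. The loop below is a minimal, hypothetical sketch of the greedy variant; `retrieve`, `generate_subquery`, and `generate_answer` are stand-ins for a real retriever (e.g. E5-large) and the fine-tuned LLM, not the actual CoRAG implementation.

```python
def retrieve(query):
    # Stand-in retriever: would query an index (e.g. E5-large embeddings)
    # and return the top passages for the sub-query.
    return [f"passage for: {query}"]

def generate_subquery(question, chain):
    # Stand-in LLM call: proposes the next sub-query given the
    # original question and the retrieval chain so far.
    return f"sub-query {len(chain) + 1} for: {question}"

def generate_answer(question, chain):
    # Stand-in LLM call: produces the final answer conditioned on
    # the full retrieval chain.
    return f"answer({question}, steps={len(chain)})"

def corag_decode(question, max_steps=6):
    """Greedy chain-of-retrieval: alternate sub-query generation and
    retrieval for up to `max_steps` steps, then answer."""
    chain = []
    for _ in range(max_steps):
        sub_query = generate_subquery(question, chain)
        passages = retrieve(sub_query)
        chain.append((sub_query, passages))
    return generate_answer(question, chain)
```

Best-of-N decoding runs this loop N times with sampling instead of greedy generation and keeps the highest-scoring chain; tree search explores alternative sub-queries at each step. See the repository linked below for the actual implementation.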

Please refer to https://github.com/microsoft/LMOps/tree/main/corag for evaluation instructions.

Model predictions are available in the `predictions` field of the dataset at https://huggingface.co/datasets/corag/multihopqa.

## Disclaimer

This model has been trained specifically for multi-hop question answering; it may not perform well on other tasks.

## References

```bibtex
@article{wang2025chain,
  title={Chain-of-Retrieval Augmented Generation},
  author={Wang, Liang and Chen, Haonan and Yang, Nan and Huang, Xiaolong and Dou, Zhicheng and Wei, Furu},
  journal={arXiv preprint arXiv:2501.14342},
  year={2025}
}
```